This is probably a feature, but I ran into something rather natural for data analysis that slows down RDataFrame performance. I attach a test program based on an existing tutorial.
With the default program, the timing is good and behaves as expected:
Histo1D photon_eta 0.174297094345 [sec]
Histo1D photon_pt 0.00482892990112 [sec]
Histo1D photon_E 0.00204801559448 [sec]
Histo1D photon_ptcone30 0.0020010471344 [sec]
Draw photon_eta 26.5902540684 [sec]
Draw photon_pt 2.00271606445e-05 [sec]
Draw photon_E 5.00679016113e-06 [sec]
Draw photon_ptcone30 3.09944152832e-06 [sec]
However, if I turn on the line
just after defining the histogram model, RDataFrame seems to loop over the tree once per histogram:
Histo1D photon_eta 9.44745612144 [sec]
Histo1D photon_pt 9.23581314087 [sec]
Histo1D photon_E 11.1579310894 [sec]
Histo1D photon_ptcone30 13.5817921162 [sec]
Draw photon_eta 0.0102381706238 [sec]
Draw photon_pt 1.09672546387e-05 [sec]
Draw photon_E 2.86102294922e-06 [sec]
Draw photon_ptcone30 3.09944152832e-06 [sec]
Naively, I find it surprising that setting a title in the histogram model has such an effect.
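RDataFrame actions such as Histo1D are lazy: booking them does not read the tree, and the event loop runs the first time a result is accessed, filling every action booked so far in one pass. That matches the first timing table (cheap Histo1D calls, one expensive first Draw). One plausible mechanism for the second table, and it is only an assumption since the enabled line is not shown, is that the line touches the histogram result rather than only the model, forcing one loop per histogram. The pure-Python toy below is not ROOT's API; ToyDataFrame, ToyResult, histo1d, get_value and book_histograms are hypothetical names used only to count event loops in both patterns.

```python
class ToyDataFrame:
    """Hypothetical stand-in for RDataFrame: actions are booked lazily and
    all pending actions are filled together in a single pass over the rows."""

    def __init__(self, rows):
        self.rows = rows
        self.loops = 0        # number of event loops run so far
        self._pending = []    # booked results not yet filled

    def histo1d(self, column):
        result = ToyResult(self, column)
        self._pending.append(result)
        return result         # booking does not touch the data

    def _run_event_loop(self):
        self.loops += 1
        for row in self.rows:             # one pass fills every pending result
            for result in self._pending:
                result._values.append(row[result.column])
        for result in self._pending:
            result._ready = True
        self._pending = []


class ToyResult:
    """Hypothetical stand-in for a lazy result: the first access runs the loop."""

    def __init__(self, df, column):
        self.df = df
        self.column = column
        self._values = []
        self._ready = False

    def get_value(self):
        if not self._ready:
            self.df._run_event_loop()
        return self._values


def book_histograms(df, columns, touch_each_result):
    results = []
    for column in columns:
        h = df.histo1d(column)
        if touch_each_result:
            h.get_value()  # accessing the result right after booking forces a loop
        results.append(h)
    return results


columns = ["photon_eta", "photon_pt", "photon_E", "photon_ptcone30"]
rows = [{c: float(i) for c in columns} for i in range(1000)]  # fake data

lazy = ToyDataFrame(rows)
for h in book_histograms(lazy, columns, touch_each_result=False):
    h.get_value()
print("book all, then access:", lazy.loops)          # 1

eager = ToyDataFrame(rows)
hists = book_histograms(eager, columns, touch_each_result=True)
for h in hists:
    h.get_value()
print("access after each booking:", eager.loops)     # 4
```

In this toy model the practical advice is to book every histogram first and only then touch the results (Draw, GetValue, or any method call on them), so that a single pass fills all of them.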
This may be a useful warning for others.
test4.py (919 Bytes)
ROOT Version: /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.02/x86_64-centos7-gcc48-opt/bin/root