SetTitle slows down RDataFrame performance


This is probably a feature, but I ran into something rather natural for data analysis that slows down RDataFrame performance. I attach a test program based on an existing tutorial.

The timing performance is good and behaves as expected with the default program:

Histo1D photon_eta   0.174297094345  [sec]
Histo1D photon_pt   0.00482892990112  [sec]
Histo1D photon_E   0.00204801559448  [sec]
Histo1D photon_ptcone30   0.0020010471344  [sec]
Draw photon_eta   26.5902540684  [sec]
Draw photon_pt   2.00271606445e-05  [sec]
Draw photon_E   5.00679016113e-06  [sec]
Draw photon_ptcone30   3.09944152832e-06  [sec]

However, if I turn on the SetTitle line just after defining the histogram model, RDataFrame seems to loop over the tree once for each histogram:

Histo1D photon_eta   9.44745612144  [sec]
Histo1D photon_pt   9.23581314087  [sec]
Histo1D photon_E   11.1579310894  [sec]
Histo1D photon_ptcone30   13.5817921162  [sec]
Draw photon_eta   0.0102381706238  [sec]
Draw photon_pt   1.09672546387e-05  [sec]
Draw photon_E   2.86102294922e-06  [sec]
Draw photon_ptcone30   3.09944152832e-06  [sec]

Naively, I find it surprising that setting a title on the histogram model has such an effect.

Maybe this can serve as a useful warning. The test program is attached (919 bytes).



ROOT Version: /cvmfs/
Platform: lxplus6

RDF produces its results lazily, the first time you access them. Calling a method such as SetTitle on a result counts as an access. If you access each result as soon as you register its computation with RDF, it has to run several event loops rather than a single one that produces all results at the same time.

As a general rule of thumb, register all the RDF results you need first, and only then use them, access them, or call methods on them.
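
To make the rule of thumb concrete, here is a minimal pure-Python sketch of lazy booking. It is a toy model, not RDataFrame's implementation: the names `ToyLazyFrame`, `ToyResult`, `book_sum` and `get` are invented for illustration, and a "result" is just a column sum. It only shows why accessing each result right after booking costs one loop per result, while booking everything first costs a single loop.

```python
class ToyLazyFrame:
    """Toy stand-in for a lazily evaluated data frame (NOT the ROOT API)."""

    def __init__(self, rows):
        self.rows = rows
        self.n_loops = 0     # number of passes made over the data
        self._pending = []   # results that are booked but not yet computed

    def book_sum(self, column):
        """Register a computation; no data are read at this point."""
        result = ToyResult(self, column)
        self._pending.append(result)
        return result

    def _run(self):
        """One pass over the data that fills every pending result."""
        pending, self._pending = self._pending, []
        self.n_loops += 1
        for result in pending:
            result._value = 0.0
        for row in self.rows:
            for result in pending:
                result._value += row[result.column]


class ToyResult:
    """Handle to a booked result; the first access triggers the loop."""

    def __init__(self, frame, column):
        self._frame = frame
        self.column = column
        self._value = None

    def get(self):
        if self._value is None:
            self._frame._run()
        return self._value


rows = [{"pt": 1.0, "eta": 0.5}, {"pt": 2.0, "eta": -0.5}]

# Pattern A: access each result right after booking it.
eager = ToyLazyFrame(rows)
for col in ("pt", "eta"):
    eager.book_sum(col).get()

# Pattern B: book everything first, then access.
lazy = ToyLazyFrame(rows)
booked = [lazy.book_sum(col) for col in ("pt", "eta")]
values = [r.get() for r in booked]

print("loops with immediate access:", eager.n_loops)
print("loops with booking first:   ", lazy.n_loops)
```

In this toy, pattern A makes one pass per booked result, while pattern B fills both results in a single pass. This mirrors the two sets of timings in the question.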

EDIT: note that in this case you can construct the model with the correct title in the first place.
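
A hedged sketch of what that could look like in PyROOT, assuming the test program is a Python script: the helper name `book_histograms`, the `h_` name prefix and the spec layout are hypothetical. In PyROOT a tuple of (name, title, nbins, xmin, xmax) can be passed to `Histo1D` in place of a histogram model, so the title is set when the histogram is booked and no method has to be called on the result before all histograms are registered.

```python
def book_histograms(df, specs):
    """Book one histogram per (column, title, nbins, xmin, xmax) spec.

    `df` is expected to behave like a ROOT.RDataFrame. Nothing is accessed
    on the returned results here, so this function alone should not
    trigger an event loop.
    """
    booked = {}
    for column, title, nbins, xmin, xmax in specs:
        # The title is part of the model, so no SetTitle call is needed later.
        # The "h_" prefix is an arbitrary naming choice for this sketch.
        model = ("h_" + column, title, nbins, xmin, xmax)
        booked[column] = df.Histo1D(model, column)
    return booked
```

With a real data frame, one would call `book_histograms(df, specs)` once for all columns (for example `photon_eta`, `photon_pt`, `photon_E`, `photon_ptcone30`, with binning chosen by the user) and only afterwards loop over the returned results to draw them. The first access then runs the single event loop that fills every histogram.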