RDataFrame syntax more like TTree::Draw

eguiraud · November 27, 2018, 9:04am

Hi @mwilkins,
thanks for the feedback!
Indeed, we have wanted to implement this since before RDataFrame was even part of ROOT!

Then, when you get to the details of designing the feature, things get messier. No blockers, just annoyances:

- if you implement this for 1D histograms, it will be hard to justify why the functionality is not also there for 2D and 3D histograms and for all other actions, e.g. df.Max("x.size() - y.size()"), df.Snapshot(..., {"x*x","y*y", "x*y*z"})
- the proper way to implement this for histograms under the hood is with a function that calculates the quantity on the fly and fills the histogram with it directly, avoiding the cost of the copy and indirection that Define brings with it
- the feature is easy to abuse: you don’t want to encourage users to write "myexpensivefunc(x)" in-place every time they use it, since Defining it once avoids the extra computation
- the performant way to do this is with lambda functions rather than just-in-time-compiled strings: df.Histo1D([](int x, int y) { return x*y; }, {"x", "y"}), but this is so verbose that just using a Define does not seem so bad now…
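
To make the "easy to abuse" point concrete, here is a minimal sketch in plain C++. It is not RDataFrame code. It only mimics the two styles with a per-entry loop and two "actions" (a sum and a max). The names Result, inPlace and definedOnce are made up for illustration, and the squared value stands in for a hypothetical myexpensivefunc.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Outcome of two "actions" (a sum and a max) plus the number of times
// the expensive function was evaluated. Illustrative type, not a ROOT one.
struct Result {
    double sum;
    double max;
    std::size_t calls;
};

// In-place style: each action evaluates the expression itself,
// so the expensive function runs once per action per entry.
Result inPlace(const std::vector<double>& xs) {
    std::size_t calls = 0;
    // Stand-in for a hypothetical myexpensivefunc(x); counts its own calls.
    auto f = [&calls](double x) { ++calls; return x * x; };
    double sum = 0.0;
    double max = std::numeric_limits<double>::lowest();
    for (double x : xs) {
        sum += f(x);               // action 1 evaluates the expression
        max = std::max(max, f(x)); // action 2 evaluates it again
    }
    return {sum, max, calls};
}

// Define-once style: the value is computed once per entry
// and shared by all actions that need it.
Result definedOnce(const std::vector<double>& xs) {
    std::size_t calls = 0;
    auto f = [&calls](double x) { ++calls; return x * x; };
    double sum = 0.0;
    double max = std::numeric_limits<double>::lowest();
    for (double x : xs) {
        const double y = f(x); // the "defined column" for this entry
        sum += y;
        max = std::max(max, y);
    }
    return {sum, max, calls};
}
```

Both functions return the same sum and max for the same input, but inPlace evaluates the function twice per entry while definedOnce evaluates it once. The gap grows with every additional action that reuses the same expression.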

So…since the functionality is there with just a few more keystrokes, we never got around to implementing this. It’s on the bucket list though! And now we know users also feel this would be nice to have.

Cheers,
Enrico