RDataFrame and suitability for rapid iterative passes through datasets

dastudillo · February 19, 2026, 1:56pm

Vincenzo will give a clearer and more complete view, but here are a few quick points from my limited RDataFrame knowledge (I may be wrong on some of this! Hopefully Vincenzo or another expert will correct me if so):

RDataFrame has some fixed overhead, so you’ll probably need many more than 50k events before any difference shows up.
Also, RDataFrame can run in parallel (with ImplicitMT enabled), but again, you need a lot of data to make it worth it (see, e.g., the thread "RDataFrame seems too conservative about spawning new threads").
Your example calls Take.GetValue and later sum.GetValue, which I think triggers 2 event loops. That is not the best for RDataFrame performance: lazy and instant actions should be planned carefully, and as far as I understand all lazy results should be booked before the first GetValue so that they are filled in the same loop.
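
To illustrate the last point, here is a toy model in plain Python. It is only a sketch of the idea, not ROOT code and not how RDataFrame is actually implemented: ToyFrame, LazyResult, take, sum, get_value and n_loops are all made-up names that loosely mirror Take, Sum and GetValue. It shows why asking for one result before booking the other costs a second pass over the data.

```python
class LazyResult:
    """Placeholder for a booked result; filled by the next loop (hypothetical)."""

    def __init__(self, frame, initial, update):
        self._frame = frame
        self._acc = initial
        self._update = update
        self.ready = False

    def feed(self, value):
        self._acc = self._update(self._acc, value)

    def get_value(self):
        # Asking for a value that is not ready triggers a full pass over the data.
        if not self.ready:
            self._frame.run_loop()
        return self._acc


def _append(acc, value):
    acc.append(value)
    return acc


class ToyFrame:
    """Toy stand-in for a lazy dataframe. NOT the ROOT API."""

    def __init__(self, values):
        self._values = list(values)
        self._pending = []  # booked results that have not been filled yet
        self.n_loops = 0    # how many passes over the data were run

    def _book(self, initial, update):
        result = LazyResult(self, initial, update)
        self._pending.append(result)
        return result

    def take(self):
        return self._book([], _append)

    def sum(self):
        return self._book(0, lambda acc, value: acc + value)

    def run_loop(self):
        # One pass fills every result booked so far.
        self.n_loops += 1
        for value in self._values:
            for result in self._pending:
                result.feed(value)
        for result in self._pending:
            result.ready = True
        self._pending = []


def count_loops(book_first):
    frame = ToyFrame(range(5))
    if book_first:
        # Book both results, then read them: a single pass fills both.
        taken, total = frame.take(), frame.sum()
        taken.get_value()
        total.get_value()
    else:
        # Read the first result before booking the second: two passes.
        taken = frame.take()
        taken.get_value()
        total = frame.sum()
        total.get_value()
    return frame.n_loops


if __name__ == "__main__":
    print("booked first:", count_loops(True))
    print("interleaved: ", count_loops(False))
```

In real RDataFrame code the equivalent change would be to create both the Take and the Sum results first and only then call GetValue on either of them. I believe the dataframe also has a GetNRuns() method that reports how many event loops have run, which would be a quick way to check this on your example, but please verify that against the documentation.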

You can also check out the performance tips in the documentation: ROOT: ROOT::RDataFrame Class Reference