Vincenzo will give a clearer and more complete view, but here are a few quick points from my limited RDataFrame knowledge (I may be wrong about some of this! Hopefully Vincenzo or another expert will correct me if so):
- RDataFrame comes with some overhead, so you’ll probably need many more than 50k events to notice a difference.
- RDataFrame can also run in parallel (with ImplicitMT enabled), but again, you need a lot of data to make it worthwhile (see, e.g., "RDataFrame seems too conservative about spawning new threads").
- Your example calls Take.GetValue and later sum.GetValue, which I think triggers 2 event loops. That is not the best for RDataFrame performance: lazy and instant actions should be planned carefully, ideally booking all the lazy actions before asking for any of the results.
You can also check out the performance tips in the documentation: ROOT: ROOT::RDataFrame Class Reference