But looking at the example above, it seems to loop over all entries.
I have a big RDataFrame with ~1e7 entries.
Thus, I am wondering if there is any faster way to do the same.
Thanks for the interesting post.
I think a way in which you can efficiently “pre-filter” your large dataset is to use the Range transformation. Have you tried that?
I consistently find the Filter method to be faster.
Oddly, if I test Filter alone with multithreading enabled, it gets slower:
- Range method: around 6.19 sec
- Filter method: around 4.69 sec
- Filter method with ROOT.EnableImplicitMT() on a machine with 20 cores: around 6.7-7 sec
- Filter method with ROOT.EnableImplicitMT(4): the same, around 6.7-7 sec
I have run each test 3-5 times, and the fluctuations seem to be within ±0.1 sec.
So my conclusion so far is to use Filter and disable multithreading. Again, it puzzles me that this turns out to be the fastest option.
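For context, the timings above came from something along these lines (the tree name, file name, and cut are placeholders, not my actual dataset; I time a simple Count over a cut):

```python
import time
import ROOT

# Toggling one of these lines is the only thing that changes between runs:
# ROOT.EnableImplicitMT()     # use all available cores
# ROOT.EnableImplicitMT(4)    # cap the thread pool at 4

df = ROOT.RDataFrame("events", "big.root")  # placeholder names

start = time.time()
# The event loop only runs when the result is requested via GetValue()
n = df.Filter("pt > 20").Count().GetValue()
print(n, "entries passed, took", time.time() - start, "sec")
```

One thing worth keeping in mind when timing RDataFrame this way: the first result triggers just-in-time compilation of the string cuts, so repeating the measurement in a fresh process (as I did) gives more comparable numbers than rerunning in the same session.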
I think the reason is that with Range you read much less data: it jumps directly to the region of the dataset you need, avoiding decompression and other overheads for the entries it skips.