Accessing individual entries with RDataFrame

Hi all,

Is there a way to check a single event/entry/row with RDataFrame?
I want something like this:

import ROOT

df = ROOT.RDataFrame("tree", "file.root")
my_lovely_entry = df.Filter("rdfentry_ == 42").AsNumpy()

But looking at the example above, it seems to loop over all entries.
I have a big RDataFrame with ~1e7 entries.
Thus, I am wondering if there is any faster way to do the same.

Hi Bohdan,

Thanks for the interesting post.
I think a way in which you can efficiently “pre-filter” your large dataset is to use the Range transformation. Have you tried that?
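
For reference, here is a minimal sketch of what reading one entry with Range could look like. The helper name `read_single_entry` is made up for illustration, and `df` is assumed to be a ROOT.RDataFrame used without implicit multithreading, since Range is not available once ROOT.EnableImplicitMT() has been called.

```python
def read_single_entry(df, entry, columns=None):
    """Return one entry of an RDataFrame as a dict of NumPy arrays.

    Hypothetical helper: `df` is assumed to be a ROOT.RDataFrame
    created without ROOT.EnableImplicitMT().
    """
    # Range(begin, end) keeps the half-open interval [begin, end),
    # so [entry, entry + 1) selects exactly one entry.
    node = df.Range(entry, entry + 1)
    # AsNumpy triggers the event loop; columns=None means "all columns".
    return node.AsNumpy(columns)


# Usage (needs ROOT and a real file, so it is left commented out):
# import ROOT
# df = ROOT.RDataFrame("tree", "file.root")
# row = read_single_entry(df, 42)
```

Whether this is actually faster than a Filter on rdfentry_ depends on the dataset, so it is worth timing both.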

I hope that helps.



Hi @Danilo,

Now, I have tried that.

import time

# ROOT.EnableImplicitMT()

# Time Range: select only entry 42 via the half-open range [42, 43)
start_time = time.time()
shower = df.Range(42, 42 + 1).AsNumpy(myColumns)
print(f"Range method: {time.time() - start_time:.2f} sec")

# Time Filter: keep only the entry whose index is 42
start_time = time.time()
shower = df.Filter("rdfentry_ == 42").AsNumpy(myColumns)
print(f"Filter method: {time.time() - start_time:.2f} sec")

The Filter method is consistently faster :)
Funnily enough, if I test Filter alone with multithreading on, it gets slower:

Range method: around 6.19 sec
Filter method: around 4.69 sec
Filter method with ROOT.EnableImplicitMT() on a machine with 20 cores: around 6.7 - 7 sec
Filter method with ROOT.EnableImplicitMT(4) gives the same: around 6.7 - 7 sec

I have run the test 3-5 times, and fluctuations seem to be ±0.1 sec
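
A small sketch of how the repeated measurements could be automated. The helper name `time_call` is hypothetical, and it only measures wall time of whatever callable it is given. Note that the first call may include one-off costs (for example just-in-time compilation of the filter string or a cold file cache), so the order of the tests can matter.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Call func() `repeats` times; return (mean, stdev) of wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples; report 0.0 for a single run.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


# Usage with the two methods from above (needs ROOT, df and myColumns):
# print(time_call(lambda: df.Range(42, 43).AsNumpy(myColumns)))
# print(time_call(lambda: df.Filter("rdfentry_ == 42").AsNumpy(myColumns)))
```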

So, my conclusion so far is to use Filter and disable multithreading. It still puzzles me that this turns out to be the fastest option.


I think the reason is that with Range, you read much less data and jump directly to the region of the dataset you need, avoiding decompressing and other overheads.

Happy the solution worked for you!


To ensure we are on the same page:

Using Range turned out to be about 30% slower than Filter (6.19 sec vs. 4.69 sec)…

It is acceptable for me to wait 4 or 6 seconds anyway.
But I was wondering if there is any faster way.

Using Range only slows things down
