Accessing the source TChain/TTree of an RDataFrame

pad · April 29, 2025, 12:24pm

Dear all,

I’m using a set of scripts/functions to perform an analysis with RDataFrame.
The RDataFrame always reads data from TTrees stored in lists of ROOT files.
At various points in the code I need to know how many events the code is going to loop over, basically tree->GetEntries().
I didn’t find a way to get this information from the RDataFrame instance itself (or from whatever object I get after a series of Define() or Filter() calls): is it possible? It would avoid having to pass this information alongside the RDF through various layers of code…
More generally, is it possible to get back to the source TTree/TChain from a given RDF object?

Thanks!

ROOT Version: 6.34.08

StephanH · April 29, 2025, 2:49pm

Hello @pad,

When you create an RDataFrame, it doesn’t open any files until you run the event loop, to be gentle on distributed filesystems. So, before the event loop runs, the number of events in the files isn’t known.

However, while the event loop is running, you can get information about the sample that each thread is currently processing.
For example, you can define per-sample quantities such as event weights with DefinePerSample. The weights example looks like this:

ROOT::RDataFrame df{"mytree", {"sample1.root","sample2.root"}};
// Keep the returned node: the new column only exists on it.
auto dfw = df.DefinePerSample("weightbysample",
                              [](unsigned int slot, const ROOT::RDF::RSampleInfo &id)
                              { return id.Contains("sample1") ? 1.0f : 2.0f; });

As you can see, the lambda receives an RSampleInfo object, from which you can get the sample identifier (e.g. via AsString() or Contains()) and the entry range currently being processed (EntryRange(), NEntries()).

system · May 13, 2025, 2:49pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.