I’m using an ensemble of scripts/functions to perform an analysis using RDataFrame.
The RDataFrame is always reading data from TTrees saved in lists of root files.
At various points in the code I need to know how many events the code is going to loop over, basically tree->GetEntries().
I didn’t find a way to get this information from the RDataFrame instance itself (or from whatever object I get after a series of Define() or Filter() calls). Is it possible? It would avoid having to pass the number alongside the RDF through various layers of code.
More generally, is it possible to get back to the source TTree/TChain from a given RDF object?
When you create an RDataFrame and don’t run the event loop, it doesn’t open any files, to be nice to distributed filesystems. So at that point the number of events in the files is not known.
However, while the event loop is running, you can get information about the sample currently being processed by each thread.
You can, for example, define per-sample quantities such as event weights; see DefinePerSample. The example with weights would look like this: