How to open two or more trees using ROOT::RDF::Experimental::FromSpec?


ROOT Version: 6.30/04
Platform: AlmaLinux 9.2 x86_64
Compiler: not sure (using the conda build)


Hello,
I am using the ROOT::RDF::Experimental::FromSpec method to read multiple samples from a JSON file. It is useful because I define a different luminosity and cross section for each sample. All my files have the same TTree structure, and I would like to load two trees from them at once (“Events” and “Runs”). However, the example in the documentation seems to show only how to load N different trees from N different files, which is not what I need. Is there a way to use this method to open two or more trees, with the same names in every file, from multiple files?

I guess @vpadulan can help you.

Hi @kyoon,

it is just an example; you can use the same tree name for the different files of your different samples.
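As a minimal sketch of what this means, the standard-library Python below writes a spec in which every sample lists a single tree name, “Events”, which is then reused for all of that sample's files. The sample names, file names and metadata values are placeholders, and the helper `make_spec` is hypothetical; it only builds the JSON and does not call ROOT.

```python
import json

def make_spec(samples, tree_name="Events"):
    """Build a FromSpec-style dict that uses one tree name for all files of each sample.

    `samples` maps a sample name to a dict with "files" and "metadata" keys
    (a hypothetical input layout chosen for this sketch).
    """
    spec = {"samples": {}}
    for name, info in samples.items():
        spec["samples"][name] = {
            # A single entry here is applied to every file listed in "files".
            "trees": [tree_name],
            "files": list(info["files"]),
            "metadata": dict(info["metadata"]),
        }
    return spec

# Placeholder samples; real file lists and metadata would come from your setup.
samples = {
    "ttbar": {
        "files": ["ttbar_1.root", "ttbar_2.root"],
        "metadata": {"xsec": 1.234, "lumi": 1.234, "sample_category": "MC_BKG"},
    },
    "wjets": {
        "files": ["wjets_1.root"],
        "metadata": {"xsec": 5.678, "lumi": 1.234, "sample_category": "MC_BKG"},
    },
}

spec = make_spec(samples)
with open("spec.json", "w") as f:
    json.dump(spec, f, indent=2)
```

The resulting file could then be passed to `ROOT.RDF.Experimental.FromSpec("spec.json")` in PyROOT.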

Cheers,
Marta

Hi @mczurylo,

If I do that, for example by providing two entries in the “trees” list of the JSON file, I get the following error:

cppyy.gbl.std.logic_error: ROOT::RDataFrame ROOT::RDF::Experimental::FromSpec(const string& jsonFile) =>
logic_error: Mismatch between number of trees and file globs.

Hi @kyoon,

Could you share what your JSON file looks like?

Cheers,
Marta

It’s a very long file because of the number of ROOT files, so I will share only its basic structure instead.

{
  "samples": {
    "name_of_sample": {
      "trees": ["Events", "Runs"],
      "files": [
        "file1.root",
        "file2.root",
        "file3.root"
      ],
      "metadata": {
        "xsec": 1.234,
        "lumi": 1.234,
        "sample_category": "MC_BKG"
      }
    }
  }
}
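The error quoted above is consistent with the rule that a sample's “trees” list must hold either one name (reused for every file) or exactly one name per file entry. The hypothetical helper `check_trees_vs_files` below re-implements that check in plain Python, so it only approximates what ROOT does internally. Applied to the structure posted here, two tree names against three files triggers the mismatch; and with as many names as files, the names are paired with files one by one (a vertical chain), not opened side by side.

```python
def check_trees_vs_files(sample):
    """Approximate the rule behind 'Mismatch between number of trees and file globs.'

    Returns the (file, tree) pairs that would be read, or raises ValueError.
    """
    n_trees = len(sample["trees"])
    n_files = len(sample["files"])
    if n_trees != 1 and n_trees != n_files:
        raise ValueError(
            f"Mismatch: {n_trees} tree names for {n_files} file entries; "
            "expected 1 or one per file"
        )
    if n_trees == 1:
        # One name is reused for every file.
        return [(f, sample["trees"][0]) for f in sample["files"]]
    # One name per file: paired positionally.
    return list(zip(sample["files"], sample["trees"]))

posted = {
    "trees": ["Events", "Runs"],
    "files": ["file1.root", "file2.root", "file3.root"],
}

try:
    check_trees_vs_files(posted)
except ValueError as err:
    print(err)  # Mismatch: 2 tree names for 3 file entries; expected 1 or one per file

# With one name per file, "Runs" would be read only from file2.root.
print(check_trees_vs_files({"trees": ["Events", "Runs"], "files": ["file1.root", "file2.root"]}))
# [('file1.root', 'Events'), ('file2.root', 'Runs')]
```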

Hi @kyoon,

at this stage, such processing is indeed not supported. If you had just one tree that is present in all three files, this would work. The problem here is the horizontal concatenation of the two trees “Events” and “Runs”. To help you further, I would need to know what exactly the “Events” and “Runs” trees contain: are they aligned, i.e. do they have the same rows? (I doubt it, but maybe I am wrong.)
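Since a single tree name per sample does work, one possible workaround (a suggestion, not something confirmed in this thread) is to generate two specs from the same sample description, one reading “Events” and one reading “Runs”, and to open each with its own FromSpec call. This keeps the per-sample metadata in both, but it does not join the two trees row by row. The helper `split_by_tree` and the output file names are hypothetical.

```python
import json

def split_by_tree(spec, tree_names):
    """Return one spec per tree name, each using that single name for all files."""
    out = {}
    for tree in tree_names:
        new_spec = {"samples": {}}
        for name, sample in spec["samples"].items():
            s = dict(sample)     # shallow copy: files/metadata are shared, not mutated
            s["trees"] = [tree]  # one name, applied to every file
            new_spec["samples"][name] = s
        out[tree] = new_spec
    return out

original = {
    "samples": {
        "name_of_sample": {
            "trees": ["Events", "Runs"],
            "files": ["file1.root", "file2.root", "file3.root"],
            "metadata": {"xsec": 1.234, "lumi": 1.234, "sample_category": "MC_BKG"},
        }
    }
}

for tree, tree_spec in split_by_tree(original, ["Events", "Runs"]).items():
    # Writes spec_Events.json and spec_Runs.json.
    with open(f"spec_{tree}.json", "w") as f:
        json.dump(tree_spec, f, indent=2)
```

Each file could then be opened separately with `ROOT.RDF.Experimental.FromSpec("spec_Events.json")` and `ROOT.RDF.Experimental.FromSpec("spec_Runs.json")`, giving two independent data frames.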

Cheers,
Marta

Hi @mczurylo,

The “Events” tree contains all the physics information, as the name indicates, while the “Runs” tree contains summary information such as the sum of MC weights. Indeed, they have different numbers of rows.
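Because the “Runs” tree only carries summary numbers, a common pattern (an assumption here, not something stated in this thread) is to sum the MC weights from “Runs” separately and fold them into a per-sample scale factor, w = xsec × lumi / Σw, which is then applied to the event weights. The sketch below shows only that arithmetic; the function `sample_scale` is hypothetical, the weight sums are placeholder values, and the unit convention (xsec in pb, lumi in pb⁻¹) is assumed.

```python
def sample_scale(xsec, lumi, sumw_per_file):
    """Per-event scale factor xsec * lumi / (sum of MC weights over all files).

    Assumes xsec and lumi use reciprocal units (e.g. pb and pb^-1) and that
    sumw_per_file holds one summed-weight value per entry of the Runs tree.
    """
    total_sumw = sum(sumw_per_file)
    if total_sumw == 0:
        raise ValueError("sum of MC weights is zero")
    return xsec * lumi / total_sumw

# Metadata values from the posted JSON; the weight sums are placeholders.
scale = sample_scale(xsec=1.234, lumi=1.234, sumw_per_file=[100.0, 200.0, 300.0])
```

Each event's weight would then be multiplied by `scale`; the value could also be precomputed and stored in the sample's “metadata” block.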

Hi @kyoon,

thanks for your reply. In this case things are a bit harder, but not impossible: see this other forum post for details on how to use a TTreeIndex to build a combined dataset: Modification of a RDataFrame with an extracted column - #3 by zazbone
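The key idea behind the TTreeIndex approach is to match rows by a key (such as a run number) rather than by entry position. The pure-Python sketch below illustrates only that idea with a dictionary lookup; it does not use ROOT, and the function `attach_by_key` and the column names `run`, `sumw` and `pt` are hypothetical. In ROOT, the equivalent matching is done by building an index (for example with `TTree::BuildIndex`) on the friend tree; see the linked post for the details.

```python
def attach_by_key(events, runs, key="run", field="sumw"):
    """Attach runs[field] to each event whose key matches, like an index lookup.

    events: list of dicts (many rows per key); runs: list of dicts (one row per key).
    """
    index = {}
    for row in runs:
        if row[key] in index:
            raise ValueError(f"duplicate {key}={row[key]} in runs")
        index[row[key]] = row[field]  # build the lookup table once
    joined = []
    for ev in events:
        if ev[key] not in index:
            raise KeyError(f"no {field} entry for {key}={ev[key]}")
        joined.append({**ev, field: index[ev[key]]})
    return joined

# Placeholder rows: two runs, three events.
runs = [{"run": 1, "sumw": 10.0}, {"run": 2, "sumw": 20.0}]
events = [{"run": 1, "pt": 35.0}, {"run": 1, "pt": 42.0}, {"run": 2, "pt": 50.0}]
joined = attach_by_key(events, runs)
```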

Cheers,
Marta