Merging ROOT Files with an applied cut

Hello,

I am wondering if there is a way to merge my ROOT files while applying a cut, so that the merged output is slimmed down; the combined size of the files is ~70 GB. Each ROOT file contains a single tree, and I want to cut on a few of its stored variables. Usually I would just use TChain to merge, but I’m not sure how to apply the cut as part of that process.

Thanks,
Andre

ROOT Version: 6.24/06
Platform: x86_64-redhat-linux
Compiler: gnu 4.8.5

Hi Andre,

I don’t think this is possible out of the box with the hadd utility. However, you can do that easily with RDataFrame and its Snapshot feature.
This tutorial does exactly that: it defines a ROOT data frame from a file and a tree (you can of course use as many files as you want), applies a cut to the dataset, defines new columns (skip this part if you don’t need it) and “snapshots” (writes to disk) the content of the filtered/augmented data frame.

I hope this helps.

Cheers,
D

This has been helpful, but when I tried to run it I got the following error, which seems to point to a problem with the branch names.

terminate called after throwing an instance of 'std::runtime_error'
  what():  GetBranchNames: error in opening the tree variables

My ROOT files are in the form of an ntuple with a single tree containing all my variables as leaves.

The code snippet:

int MC_mergesel() {
	auto sig_lepmc = "lep/*.root";
	auto treeName = "variables";

	auto sig_lepout = "siglep_merge.root";

	ROOT::RDataFrame sig_lep(treeName, sig_lepmc);


	sig_lep.Filter("Ranking == 1 && p > 1");

	sig_lep.Snapshot(treeName, sig_lepout);

	return 0;

}

Is there something I’m missing?

Thanks.

Hello Andre,

The only thing I see is that the operations should be chained. Filter does not modify the node it is called on; it returns a new node, and Snapshot has to be called on that new node:

	auto sig_lepmc = "lep/*.root";
	auto treeName = "variables";
	auto sig_lepout = "siglep_merge.root";
	ROOT::RDataFrame sig_lep(treeName, sig_lepmc);
	sig_lep.Filter("Ranking == 1 && p > 1").Snapshot(treeName, sig_lepout);

Assuming the variable names are correct, that should work. If not, could you share the file with us?

Best,
D
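
The point about chaining can be illustrated with a small toy model. This is only a sketch and not ROOT code: ToyNode, countWithoutChaining and countWithChaining are hypothetical names made up for the illustration, and the toy filters eagerly, whereas RDataFrame is lazy. The only thing it is meant to show is that a Filter-style call hands back a new node and leaves the original one untouched, so an action called on the original still sees every row.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a data-frame node. NOT ROOT code, only an illustration.
class ToyNode {
public:
    explicit ToyNode(std::vector<int> rows) : rows_(std::move(rows)) {}

    // Returns a new node holding only the rows that pass `keep`.
    // The node this is called on is left unchanged.
    ToyNode Filter(const std::function<bool(int)>& keep) const {
        std::vector<int> passed;
        for (int r : rows_) {
            if (keep(r)) passed.push_back(r);
        }
        return ToyNode(std::move(passed));
    }

    // Number of rows visible from this node.
    std::size_t Count() const { return rows_.size(); }

private:
    std::vector<int> rows_;
};

// Same shape as the first macro in the thread: the filtered node is discarded.
std::size_t countWithoutChaining() {
    ToyNode df({1, 2, 3, 4, 5});
    df.Filter([](int x) { return x > 3; });  // result is thrown away
    return df.Count();                       // counts the unfiltered rows
}

// Same shape as the corrected macro: the action runs on the filtered node.
std::size_t countWithChaining() {
    ToyNode df({1, 2, 3, 4, 5});
    return df.Filter([](int x) { return x > 3; }).Count();
}
```

In RDataFrame terms, an equivalent to chaining is to keep the returned node in a variable, for example auto filtered = sig_lep.Filter("Ranking == 1 && p > 1"); followed by filtered.Snapshot(treeName, sig_lepout);.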

Thanks, this has helped. I also realised there was a problem with the path to my input files, which may have caused the error.

Thanks,

Andre
