Hi @pcanal and @eguiraud, thank you so much for your response. My file is currently 6.4 GB and contains many additional branches that I will certainly not need for my analysis. So I want to remove some branches so that the file is smaller and the macros run much faster.
If you select the branches at run-time (RDataFrame does that automatically for you), there will be no noticeable performance difference between the large and the small file, because only the selected branches will be read from the file.
“Deleting” branches from the file will not shrink the file; at best it will mark the space previously occupied by the branches’ data as available for later use.
The way to reduce the file size is to clone the TTree after selecting the subset you want to keep into a new file:
TFile oldfile(filename);
TTree *oldtree;
oldfile.GetObject("T", oldtree);
// Deactivate all branches
oldtree->SetBranchStatus("*", 0);
// Activate only four of them
for (auto activeBranchName : {"event", "fNtrack", "fNseg", "fH"})
   oldtree->SetBranchStatus(activeBranchName, 1);
// Create a new file + a clone of old tree in new file
TFile newfile("small.root", "recreate");
auto newtree = oldtree->CloneTree(-1, "fast");
newtree->Print();
newfile.Write();
Thank you very much for the explanation and the code @pcanal. I have tried this and it is available on the website too. All I wanted to know was if there was a more efficient way to include or delete many branches at once, by not having to write their names manually.
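One possible way to avoid typing every name, sketched here rather than taken from the replies above: TTree::SetBranchStatus accepts wildcard patterns such as "fN*", so a short list of patterns can stand in for a long list of names. The helper name KeepOnly and the RecordingTree stand-in are hypothetical and exist only for illustration; the helper is a template, so the same code should work with a real TTree or with the stand-in.

```cpp
#include <string>
#include <vector>

// Deactivate every branch, then re-activate those matching the given patterns.
// Works with any type offering SetBranchStatus(const char *, bool), which is
// the shape of TTree::SetBranchStatus in ROOT. KeepOnly is a hypothetical
// helper name, not part of ROOT.
template <class Tree>
void KeepOnly(Tree &tree, const std::vector<std::string> &patterns)
{
   tree.SetBranchStatus("*", false); // switch everything off first
   for (const auto &pattern : patterns)
      tree.SetBranchStatus(pattern.c_str(), true); // switch matches back on
}

// Hypothetical stand-in for a TTree that only records the calls it receives,
// so the sketch can be compiled and checked without ROOT.
struct RecordingTree {
   std::vector<std::string> calls;

   void SetBranchStatus(const char *pattern, bool on)
   {
      calls.push_back(std::string(pattern) + (on ? "=1" : "=0"));
   }
};
```

With the tree from the snippet above, this would be called as KeepOnly(*oldtree, {"event", "fN*", "fH"}); before CloneTree. Which branches a pattern actually matches depends on how the tree was written, so it is worth checking the result with oldtree->Print() before cloning.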
I think you are implying that, if there are fewer entries in the branches, then the macros will run faster irrespective of the number of branches. However, if the entries in the branches are large, then the macros will take time even if there is just one branch. I have experienced this, but realized it only after you pointed it out.
Please let me know if I am understanding it correctly.
I think you are implying that, if there are fewer entries in the branches, then the macros will run faster irrespective of the number of branches.
That is (of course) true but not what I was pointing out.
For a well-written macro, reading/using the data from a given set of branches will always take the same time irrespective of how many more branches the TTree has. (Also, the fewer branches you read, the faster it will be when comparing with the same number of entries.)
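To make the argument concrete, here is a deliberately simplified toy model, not ROOT's actual I/O: it assumes each branch stores a fixed number of bytes per entry and ignores compression, baskets and caching. The names ToyTree and BytesRead are hypothetical. The point it illustrates is that the amount of data read depends only on the active branches and the number of entries, not on how many other branches sit in the file.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model of a tree: branch name -> bytes stored per entry (assumed fixed).
using ToyTree = std::map<std::string, std::size_t>;

// Bytes that have to be read to process nEntries entries when only the
// branches listed in `active` are enabled. Inactive branches cost nothing.
std::size_t BytesRead(const ToyTree &tree, const std::vector<std::string> &active,
                      std::size_t nEntries)
{
   std::size_t bytesPerEntry = 0;
   for (const auto &name : active) {
      const auto it = tree.find(name);
      if (it != tree.end())
         bytesPerEntry += it->second; // only active branches contribute
   }
   return bytesPerEntry * nEntries;
}
```

In this model, adding unused branches to the tree leaves the result of BytesRead unchanged, while activating fewer branches or processing fewer entries reduces it.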