Delete multiple branches from a ROOT file at once

sanjeeda · October 11, 2020, 2:32pm

Dear Experts,

I want to remove some (about 200) branches from a ROOT file. I tried removing branches in the following way, and it works.

   TFile f("myfile.root","update");
   TTree *T = (TTree*)f.Get("treename");
   TBranch *b = T->GetBranch("name of branch to delete");
   T->GetListOfBranches()->Remove(b);
   T->Write();

However, since I need to remove a lot of branches, doing it one branch at a time like this doesn't seem like a good idea.

Can you please help me?

Gratefully,
Sanjeeda

ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided

eguiraud · October 12, 2020, 2:18pm

Hi @sanjeeda,
I think we need @pcanal’s help to suggest a more performant solution.

Cheers,
Enrico

pcanal · October 12, 2020, 3:46pm

What is your end goal? (Why do you want to remove the branches?)

sanjeeda · October 12, 2020, 3:50pm

Hi @pcanal and @eguiraud. Thank you so much for your response. My file is currently 6.4 GB and contains many additional branches that I definitely will not need for my analysis. So I want to remove some branches so that the file is smaller and my macros run much faster.

pcanal · October 12, 2020, 4:34pm

If you select the branches at run time (RDataFrame does that automatically for you), there will be no noticeable performance difference between the large and the small file, because only the selected branches are read from the file.

“Deleting” branches from the file will not shrink the file; at best it will mark the space previously occupied by the branches' data as available for later use.

The way to reduce the file size is to select the subset of branches you want to keep and then clone the TTree into a new file:

#include <TFile.h>
#include <TTree.h>

void copytree(const char *filename)
{
   TFile oldfile(filename);
   TTree *oldtree = nullptr;
   oldfile.GetObject("T", oldtree);
   if (!oldtree)
      return; // no tree named "T" in the input file

   // Deactivate all branches
   oldtree->SetBranchStatus("*", 0);

   // Activate only four of them
   for (auto activeBranchName : {"event", "fNtrack", "fNseg", "fH"})
      oldtree->SetBranchStatus(activeBranchName, 1);

   // Create a new file + a clone of old tree in new file
   TFile newfile("small.root", "recreate");
   auto newtree = oldtree->CloneTree(-1, "fast");

   newtree->Print();
   newfile.Write();
}

sanjeeda · October 12, 2020, 5:12pm

Thank you very much for the explanation and the code, @pcanal. I have tried this, and it is also available on the website. All I wanted to know was whether there is a more efficient way to include or delete many branches at once, without having to write their names manually.

I think you are implying that if the branches have fewer entries, the macros will run faster irrespective of the number of branches. However, if the branches have many entries, the macros will take time even if there is just one branch. I had experienced this but only realized it after you pointed it out.

Please let me know if I am understanding it correctly.

Gratefully,
Sanjeeda

pcanal · October 12, 2020, 5:30pm

> I think you are implying that if the branches have fewer entries, the macros will run faster irrespective of the number of branches.

That is (of course) true but not what I was pointing out.

For a well-written macro, reading/using the data from a given set of branches will always take the same time, irrespective of how many other branches the TTree has. (Also, for the same number of entries, the fewer branches you read, the faster it will be.)

sanjeeda · October 12, 2020, 5:51pm

Thank you