Unwanted objects in my output TFile when using large input TFiles (>1GB)

mahler · November 17, 2015, 11:42am

I’m currently doing an analysis involving some large TTrees. My scheme is to first copy each tree into a new tree containing only the events which pass some cuts. I can then use these copied trees to do further analysis.

During this analysis, I write a number of objects to the output file, but never the TTrees. However, when my input data file is particularly large (>=~1GB), I find TTree objects named “tree” making their way into my output files! Inspecting these trees reveals that they are indeed some fraction of the events I read in.

In addition, when a TTree gets clandestinely copied to my output file, I also find a ‘TProcessID’ object sitting in my TFile. These have names like “TProcessID0” or “TProcessID5”, etc.

Here is some pseudocode to describe the process:

TFile* output_file = TFile::Open(filepath, "RECREATE");

//Make folder to store output
output_file->mkdir(folder_name);
output_file->cd(folder_name);

TChain* pchain = new TChain("tree", "tree");
pchain->Add(data_file);

TTree* ptree_cuta = pchain->CopyTree(cuta);
TTree* ptree_cutb = pchain->CopyTree(cutb);

//Stuff involving ptree_cuta and ptree_cutb which
//creates histograms canvases etc... for instance:
TCanvas* c = new TCanvas("var_canvas", "var_canvas");
c->cd();
ptree_cuta->Draw("var");
ptree_cutb->Draw("var", "", "same"); //second argument is the selection, third is the draw option
c->Write();
delete c;

//Delete the data trees. I do not want to save them to the output file!
delete ptree_cuta;
delete ptree_cutb;
delete pchain;

//Save and close
output_file->Save();
output_file->Close();

Now, I know I’m not accidentally causing this with my code, because when the input file is small enough (<~1GB), these TProcessID and TTree objects don’t make their way into the output.

In case it’s helpful, I’m running this code on lxplus, with root 5.34.09.

So, does anybody know why this is occurring and how I can stop it?

Any suggestions are greatly appreciated.

Axel · November 18, 2015, 10:12am

Hi,

When creating a new TTree (through CopyTree in your case), its baskets will end up in the “current” TFile. If you don’t want that, call gROOT->cd() before CopyTree().

Ideally you shouldn’t create clones of the trees, but fill a TEntryList (root.cern.ch/doc/master/classTEntryList.html) for each cut and iterate over those. Even more ideally, you read each event only once, decide whether it passes the cuts or not, and then analyze it further before going on to the next event. This reduces the time spent in I/O: you read each TTree entry only once.

Cheers, Axel.
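
The single-pass idea above can be sketched without ROOT, purely to show the control flow: every entry is visited once, all cuts are evaluated on it, and the indices of the passing entries are stored per cut (roughly the role a TEntryList would play). This is a minimal C++11 sketch under stated assumptions: the `Event` struct and its fields, the `Cut` alias, and the `select_entries` helper are hypothetical stand-ins, not ROOT API.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one TTree entry; real branches would differ.
struct Event {
    double var;
    double pt;
};

// A named cut is just a predicate on a single event.
using Cut = std::function<bool(const Event&)>;

// Visit every event exactly once, test all cuts on it, and record the
// indices of the passing entries separately for each cut.
std::map<std::string, std::vector<std::size_t>>
select_entries(const std::vector<Event>& events,
               const std::map<std::string, Cut>& cuts) {
    std::map<std::string, std::vector<std::size_t>> passed;
    for (const auto& named_cut : cuts) {
        passed[named_cut.first];  // ensure an (empty) list exists per cut
    }
    for (std::size_t i = 0; i < events.size(); ++i) {
        for (const auto& named_cut : cuts) {
            if (named_cut.second(events[i])) {
                passed[named_cut.first].push_back(i);
            }
        }
    }
    return passed;
}
```

In a real analysis the inner loop would presumably fill one TEntryList per cut instead of a vector of indices, so that no copy of the tree is ever created in the output file.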

mahler · November 18, 2015, 12:14pm

Thanks for your reply.

Thanks very much for the TEntryList suggestion and the gROOT->cd() suggestion. I’ll look at them and see if they help me.

While I would very much like to look at each event only once, the problem is that I use these TTrees to create RooDataSets and RooKeysPdfs. I then use these Pdfs to perform some multi-pdf fitting using RooFit. My hope in copying the trees once is to avoid having to apply these event-level cuts every time the pdf value is calculated during the Minuit fitting procedure, which will happen many, many times.

Do you know whether RooFit caches the Pdf values? If it does then I can stop worrying about it.

Axel · November 18, 2015, 1:56pm

Hi,

I don’t know - I’ve asked Lorenzo to answer your question.

Cheers, Axel.

mahler · November 18, 2015, 2:41pm

I have another couple of questions about TTrees and TEntryLists etc…

1.) To populate the TEntryLists as painlessly as possible, I would very much like to use the TCut / selection strings that work so nicely with Draw, Scan, CopyTree, etc. However, I also need to apply several cuts to a tree, and each of these functions seems to operate on the entire tree at once. I haven’t found anything that can apply multiple cuts in a single pass, so I was wondering whether there is a way to call ptree->GetEntry(i) and then something like ptree->ApplyCut(TCut or const char*), which would return a bool indicating whether the last event retrieved with GetEntry passes the cut. That way I could iterate over all events and check each one individually.

I can of course do it by loading each branch into a local variable with SetBranchAddress and then testing the values in the local variables. But this is rather cumbersome and doesn’t allow me to quickly test different cuts. I’d much prefer the string version.

2.) Let’s say I have several TEntryLists containing events passing different cuts. If I do this:

ptree->SetEntryList(list_a);
TTree* ptree_a = ptree->CopyTree();
ptree->SetEntryList(list_b);
TTree* ptree_b = ptree->CopyTree();

RooDataSet dataset_a("A", "A", argset, RooFit::Import(*ptree_a));
RooDataSet dataset_b("B", "B", argset, RooFit::Import(*ptree_b));

Will dataset_a and dataset_b use the events from list_a and list_b respectively? Or does the fact that I changed the EntryList of the original TTree cause the EntryList in the CopyTrees to change as well?

moneta · November 18, 2015, 7:09pm

RooFit in general is very good at caching the PDF values to avoid recomputing them when it is not needed.
However, by doing this it might use quite a substantial amount of memory.
There are different levels of optimisation, which can be tried using RooMinimizer::optimizeConst(value).
value=0 means less caching, and 1 or 2 means more. I think the default is 2.

Best Regards

Lorenzo