Merge Trees -> Create Index of Original Tree as Friend

bleu65 · September 6, 2022, 12:17pm

Hello,

I’m combining my skimmed Monte Carlo output, which is stored in ROOT files. I have a tree named “Event” in a bunch of files and I combine them like this:

TChain* eventsMerged = new TChain("Event");
for (const auto& filename : inputFiles)
  {eventsMerged->Add(filename.c_str());}
Long64_t operationCode = eventsMerged->Merge(outputFile.c_str());

So far, so good.

However, I’d like to preserve which file or tree each event came from (even if it’s just an integer index that counts up) so I can trace events back. I keep an event ID inside each event, so that’s fine, but the IDs can be duplicated across the different ‘runs’ in the different skimmed files I’m merging. Is there a way to do this?

I imagined creating a friend TTree with a single leaf, something like int originalTTreeID.

E.g. (fictional) data would look like (for some skimmed data):

EventID     originalTTreeID
12              1
26              1
11              2
12              2
14              2
4               3
.... etc

Many thanks,
Laurie

etejedor · September 6, 2022, 2:23pm

Hello,

@pcanal can perhaps comment on how one would usually do this.

pcanal · September 6, 2022, 5:50pm

If you do need to keep track of which TTree came from which file, what is the benefit/advantage of merging the files in your case?

bleu65 · September 6, 2022, 5:55pm

So, in the meantime, I looped over each of the files I’m merging and built up a vector of the number of entries in each. I then created a friend tree in the new output file and, for the n events from each input file, called Fill n times with that file’s index. A bit kludgy, but it worked. Is there a better way?

std::vector<unsigned long long int> nEventsPerTree;
for (const auto& filename : inputFiles)
  {
    TFile* f = new TFile(filename.c_str(), "READ");
    TTree* eventTree = dynamic_cast<TTree*>(f->Get("Event"));
    nEventsPerTree.push_back(eventTree->GetEntries());
    f->Close(); delete f;
  }

// In the output file, after merging. "output" is the merged file reopened for update, e.g.:
//TFile* output = new TFile("output.root", "UPDATE");
TTree* eventCombineInfoTree = new TTree("EventCombineInfo", "EventCombineInfo");
UInt_t originalID = 0;
eventCombineInfoTree->Branch("combinedFileIndex", &originalID);
// Fill one entry per merged event, holding the index of its input file.
for (int fileIndex = 0; fileIndex < (int)nEventsPerTree.size(); fileIndex++)
  {
    unsigned long long int v = nEventsPerTree[fileIndex];
    originalID = (UInt_t)fileIndex;
    for (unsigned long long int j = 0; j < v; j++)
      {
        eventCombineInfoTree->Fill();
      }
  }
TTree* eventTree = dynamic_cast<TTree*>(output->Get("Event"));
eventTree->AddFriend(eventCombineInfoTree);
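
A possible lighter-weight variant of the same idea, sketched in plain C++ (no ROOT calls): instead of filling one friend entry per event, keep only the per-file entry counts and look up the file index from the entry number in the merged tree. This assumes the merged tree keeps entries in the order the files were added to the chain, which is worth checking on your own output. The function names `cumulativeEnds` and `fileIndexForEntry` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Turn per-file entry counts into cumulative end offsets,
// e.g. counts {2, 3, 1} become ends {2, 5, 6}.
std::vector<std::uint64_t> cumulativeEnds(const std::vector<std::uint64_t>& nEventsPerTree)
{
  std::vector<std::uint64_t> ends;
  ends.reserve(nEventsPerTree.size());
  std::uint64_t total = 0;
  for (std::uint64_t n : nEventsPerTree)
    {
      total += n;
      ends.push_back(total);
    }
  return ends;
}

// Return the 0-based index of the input file that merged entry "entry" came from.
// The first end offset strictly greater than the entry number belongs to that file,
// so input files with zero entries are skipped automatically.
std::size_t fileIndexForEntry(const std::vector<std::uint64_t>& ends, std::uint64_t entry)
{
  assert(!ends.empty() && entry < ends.back());
  auto it = std::upper_bound(ends.begin(), ends.end(), entry);
  return static_cast<std::size_t>(it - ends.begin());
}
```

With entry counts {2, 3, 1}, merged entries 0 and 1 map to file index 0, entries 2 to 4 map to file index 1, and entry 5 maps to file index 2. The counts still have to be stored somewhere alongside the merged file, so this trades the friend tree for a small lookup table.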

bleu65 · September 6, 2022, 5:58pm

I skim my original Monte Carlo down to ~1% of its original size. I combine files to avoid having thousands of tiny files. But I still need to be able to trace an event back to a run, and therefore to its seed value.

The skimming is used to prepare a handoff file for a detector simulation, but the original Monte Carlo contains much more detailed information.

I guess there just isn’t such a feature?

pcanal · September 6, 2022, 6:11pm

Usually when creating the original trees (the ones that are eventually skimmed), there is enough information to uniquely identify the entries. Very often a “run number” is used in addition to the “event number”.

An alternative could be that, at the time you are doing the skimming, you add a new column/branch and fill it with the unique information (i.e. your file number).
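
Both suggestions amount to identifying an entry by the pair (run or file number, event number) rather than by the event number alone. A minimal sketch in plain C++ (no ROOT needed) of why the pair is enough; the struct `EventKey` and both function names are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// One skimmed event: the run (or file) number plus the per-run event ID.
struct EventKey
{
  unsigned int run;
  unsigned int eventID;
};

// Number of distinct keys when only the event ID is used.
std::size_t countDistinctEventIDs(const std::vector<EventKey>& events)
{
  std::set<unsigned int> seen;
  for (const auto& e : events)
    {
      seen.insert(e.eventID);
    }
  return seen.size();
}

// Number of distinct keys when (run, eventID) is used together.
std::size_t countDistinctRunEventPairs(const std::vector<EventKey>& events)
{
  std::set<std::pair<unsigned int, unsigned int>> seen;
  for (const auto& e : events)
    {
      seen.insert(std::make_pair(e.run, e.eventID));
    }
  return seen.size();
}
```

Applied to the six fictional rows from the first post, where event ID 12 appears in both tree 1 and tree 2, the first function counts 5 distinct IDs and the second counts 6 distinct pairs, so every entry is uniquely identified once the run or file number is stored next to the event ID.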
