I’m combining my skimmed Monte Carlo in ROOT files. I have a tree named “Event” in a bunch of files, and I combine them like this:
TChain* eventsMerged = new TChain("Event");
for (const auto& filename : inputFiles)
{
    eventsMerged->Add(filename.c_str());
}
Long64_t operationCode = eventsMerged->Merge(outputFile.c_str());
So far, so good.
However, I’d like to preserve which file or tree each event came from (even if it’s just an integer index that counts up) so I can trace events back. I keep an event ID inside each event, so that part is fine, but these IDs can be degenerate across different ‘runs’ in the different skimmed files I’m merging. Is there a way to do this?
I imagined creating a friend TTree with a single leaf, something like int originalTTreeID.
E.g. (fictional) data would look like (for some skimmed data):
In the meantime, I looped over the files I’m merging and recorded the number of entries in each. I then created a friend tree in the new output file and, for each input file with n events, called Fill() n times with that file’s index. A bit kludgy, but it worked. Is there a better way?
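#include <memory>
#include <stdexcept>
#include <vector>
#include "TFile.h"
#include "TTree.h"

std::vector<Long64_t> nEventsPerTree;
for (const auto& filename : inputFiles)
{
    // Count each file's entries in the same order as the TChain::Add() calls
    std::unique_ptr<TFile> f(TFile::Open(filename.c_str(), "READ"));
    if (!f || f->IsZombie())
        throw std::runtime_error("cannot open " + filename);
    TTree* eventTree = dynamic_cast<TTree*>(f->Get("Event"));
    if (!eventTree)
        throw std::runtime_error("no Event tree in " + filename);
    nEventsPerTree.push_back(eventTree->GetEntries());
} // the unique_ptr closes and deletes each file here

// In the output file, after merging:
TFile* output = TFile::Open(outputFile.c_str(), "UPDATE");
if (!output || output->IsZombie())
    throw std::runtime_error("cannot reopen " + outputFile);
output->cd(); // create (and later write) the new tree in the output file
TTree* eventCombineInfoTree = new TTree("EventCombineInfo", "EventCombineInfo");
UInt_t originalID = 0;
eventCombineInfoTree->Branch("combinedFileIndex", &originalID);
for (std::size_t fileIndex = 0; fileIndex < nEventsPerTree.size(); fileIndex++)
{
    originalID = static_cast<UInt_t>(fileIndex); // same index for every entry of this file
    for (Long64_t j = 0; j < nEventsPerTree[fileIndex]; j++)
        eventCombineInfoTree->Fill();
}
eventCombineInfoTree->Write();

TTree* eventTree = dynamic_cast<TTree*>(output->Get("Event"));
if (!eventTree)
    throw std::runtime_error("no Event tree in " + outputFile);
eventTree->AddFriend(eventCombineInfoTree);
eventTree->Write("", TObject::kOverwrite); // persist the friend link with the Event tree
output->Close(); // also deletes the in-memory objects owned by the file
delete output;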
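If storing one index per event feels heavy, another option is to keep only the cumulative entry offsets (one number per input file) and look the file index up on demand. The sketch below is plain C++ with no ROOT dependency; it assumes the merged tree keeps each input file's entries contiguous and in the order the files were added to the TChain, and the function names are hypothetical. TChain keeps similar bookkeeping (see GetTreeOffset() and GetTreeNumber()), but check their behaviour in your ROOT version before relying on them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// offsets[i] is the first merged entry that came from input file i;
// offsets.back() is the total number of merged entries.
std::vector<std::uint64_t> buildOffsets(const std::vector<std::uint64_t>& nEventsPerTree)
{
    std::vector<std::uint64_t> offsets(1, 0); // file 0 starts at entry 0
    offsets.reserve(nEventsPerTree.size() + 1);
    for (std::uint64_t n : nEventsPerTree)
        offsets.push_back(offsets.back() + n);
    return offsets;
}

// Map an entry of the merged tree back to the index of the input file it came from.
std::size_t fileIndexForEntry(const std::vector<std::uint64_t>& offsets, std::uint64_t entry)
{
    assert(entry < offsets.back() && "entry is beyond the end of the merged tree");
    // upper_bound returns the first offset strictly greater than entry;
    // the file containing entry starts one position earlier.
    const auto it = std::upper_bound(offsets.begin(), offsets.end(), entry);
    return static_cast<std::size_t>(it - offsets.begin()) - 1;
}
```

For example, per-file counts {3, 0, 2} give offsets {0, 3, 3, 5}, and `fileIndexForEntry(offsets, 3)` returns 2: the empty second file is skipped because `std::upper_bound` searches for the first offset strictly greater than the entry. The trade-off is that the offsets are only a handful of numbers you have to store somewhere alongside the file, whereas the friend tree costs one value per event but can be used directly in TTree::Draw selections.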
I skim my original Monte Carlo down to ~1% of its original size, and I combine the skimmed files to avoid having thousands of tiny files. But I still need to be able to trace an event back to its run and therefore its seed value.
The skimming is used to prepare a handoff file for a detector simulation, but the original Monte Carlo contains much more detailed information.
Usually, when the original trees (the ones that are eventually skimmed) are created, they contain enough information to uniquely identify each entry. Very often a “run number” is stored in addition to the “event number”.
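When both a run number and an event number are available, they can be combined into a single unique key. A minimal sketch, assuming both values fit in 32 bits (in the skimmed files, the file index could play the role of the run number); the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Pack a run number and an event number into one 64-bit key.
// Assumes both fit in 32 bits; the run occupies the high half.
std::uint64_t makeEventKey(std::uint64_t run, std::uint64_t event)
{
    assert(run <= 0xFFFFFFFFull && event <= 0xFFFFFFFFull && "run and event must fit in 32 bits");
    return (run << 32) | event;
}

// Recover the two halves of a key built by makeEventKey.
std::uint32_t runFromKey(std::uint64_t key)
{
    return static_cast<std::uint32_t>(key >> 32);
}

std::uint32_t eventFromKey(std::uint64_t key)
{
    return static_cast<std::uint32_t>(key & 0xFFFFFFFFull);
}
```

Because the run sits in the high 32 bits, sorting by the key orders events by run first and then by event number, and the same event number from two different runs always yields two different keys.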
An alternative is to add a new column/branch at the time you do the skimming and fill it with the unique information (i.e. your file number).
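The skim-time tagging can be modelled in plain C++ as below. This is a sketch with hypothetical `Event` and `SkimmedEvent` types standing in for the tree contents and a hypothetical `energy` cut variable; in ROOT, the equivalent step would be to add an extra branch (for example with TTree::Branch) to the skim's output tree and set it to the current file index before each Fill().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical minimal event record; a real skim reads these from the "Event" tree.
struct Event {
    std::uint64_t eventID;
    double energy; // hypothetical quantity the skim cuts on
};

// A kept event together with the index of the input file it came from.
struct SkimmedEvent {
    Event event;
    std::uint32_t sourceFileIndex;
};

// Apply the selection to every input file and stamp each kept event with the
// index of its file, so the tag travels with the event through later merges.
std::vector<SkimmedEvent> skimAndTag(const std::vector<std::vector<Event>>& inputFiles,
                                     const std::function<bool(const Event&)>& passesCut)
{
    assert(passesCut && "a selection function is required");
    std::vector<SkimmedEvent> kept;
    for (std::uint32_t fileIndex = 0; fileIndex < inputFiles.size(); ++fileIndex)
        for (const Event& ev : inputFiles[fileIndex])
            if (passesCut(ev))
                kept.push_back({ev, fileIndex});
    return kept;
}
```

Since the tag is stored with each event at skim time, it survives any later merge (hadd or TChain::Merge) without the separate entry-count bookkeeping.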