I have a problem when merging trees from several root files. When using TTree::MergeTrees(list), the trees are merged correctly, containing the correct number of events.
But in addition to the correct trees, the old input tree is also written to the output file, strangely with some events missing.
Here is a skimmed version of the code:
TString fileListStr = fileList;
TObjArray* listArr = fileListStr.Tokenize(" ");
TList* treeList = new TList;
for(Int_t i=0; i<listArr->GetEntries(); i++)
{
  // Load input tree
  TString fileStr = (static_cast<TObjString*>(listArr->At(i)))->GetString();
  TFile* inputFile = new TFile(fileStr.Data(), "READ");
  TList* tmpList = static_cast<TList*>(inputFile->FindObjectAny(listName));
  TTree* inTree = dynamic_cast<TTree*>(tmpList->FindObject(inTreeName));
  // Add tree to list if max sample threshold is not reached
  treeList->Add(inTree);
  delete inputFile;
}
// Write merged tree to output file
TFile* outFile = new TFile(outputFile,"UPDATE");
TTree* outTree = TTree::MergeTrees(treeList);
outTree->SetName(outTreeName);
outTree->Write(0, TObject::kOverwrite);
delete outFile;
delete treeList;
I thought opening the output file after loading the trees would prevent them from being written to the file.
I already tried adding inTree->SetDirectory(0) for each tree, but without success.
I think I found a (very slow) work-around, although I don’t know exactly why the original code does not work out of the box.
I am still interested in hearing from anybody who knows the ROOT-native solution.
// Write merged tree to output file
TFile* outFile = new TFile(outputFile,"UPDATE");
gROOT->cd(); // new
TTree* outTree = TTree::MergeTrees(treeList);
outFile->cd(); // new
outTree->SetName(outTreeName);
outTree->Write(0, TObject::kOverwrite);
delete outFile;
delete treeList;
From my naive perspective, it looks like TTree::MergeTrees() somehow writes the tree it is reading from into the current directory.
Besides being slow, I guess the current solution has the drawback that the merging is done in memory and not in a file. Maybe I could load a cache file instead of using gROOT->cd().
Deleting the input file (delete inputFile;) results in the TTree object being deleted. (You can avoid this by calling SetDirectory(0), but then the TTree no longer has access to its data. You can work around that by calling LoadBaskets, but then you might run out of memory.)
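To make the ownership point above concrete, here is a minimal plain C++ sketch. It is only a toy model and does not use ROOT: ToyDirectory, ToyTree and both helper functions are hypothetical names invented for this illustration, and the sketch assumes, as stated above, that the file owns the tree it handed out. It uses std::weak_ptr so that the loss of the object can be observed safely.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a tree; NOT ROOT's TTree.
struct ToyTree { std::string name; };

// Toy stand-in for a file/directory that owns every object it hands out.
// This only models the ownership rule described above; it is NOT ROOT's TFile.
class ToyDirectory {
public:
    // Hand out a non-owning handle; the directory keeps ownership.
    std::weak_ptr<ToyTree> get(const std::string& name) {
        owned_.push_back(std::make_shared<ToyTree>(ToyTree{name}));
        return owned_.back();
    }
private:
    std::vector<std::shared_ptr<ToyTree>> owned_;
};

// Mirrors the posted loop: the file is destroyed before the tree is used.
bool treeSurvivesFileDeletion() {
    std::weak_ptr<ToyTree> handle;
    {
        ToyDirectory file;
        handle = file.get("inTree");
    } // "delete inputFile": the directory destroys the objects it owns
    return !handle.expired();
}

// Alternative order: the file stays alive while the tree is still needed.
bool treeAliveWhileFileOpen() {
    ToyDirectory file;
    std::weak_ptr<ToyTree> handle = file.get("inTree");
    return !handle.expired();
}
```

In this model the first function reports that the tree is gone and the second that it is still alive. With raw pointers, as in the posted code, there is no such check: using a pointer to a deleted object is undefined behavior in C++, so the program may appear to run normally and still lose or corrupt data.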
This is the expected behavior of your code per se. However, note that what is ‘duplicated’ is not the data but the much smaller meta-data (the TTree object itself). This is due to both the ROOT file support for ‘backups’ (or cycles) of objects and to the two statements in your code that merge and then write the tree.
The first statement (the merge) periodically takes a snapshot of the TTree meta-data (to allow for file recovery if the process crashes), and it has no choice but to store it using the old name. The second statement (the write, after the rename) then stores the meta-data one last time (with complete information) under the new name.
What you can do instead is:
// Add tree to list if max sample threshold is not reached
inTree->SetName(outTreeName);
treeList->Add(inTree);
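The explanation and the suggestion above can be illustrated with a small plain C++ sketch. It is a simplified toy model, not ROOT code: ToyFile, ToyTree and the two functions are hypothetical names, the names "inTree" and "outTree" are placeholders, and the real key and cycle handling in ROOT is more involved. The sketch only assumes what is described above, namely that a snapshot is stored under whatever name the tree has at that moment.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for the list of keys in a file; NOT ROOT's TFile/TKey.
struct ToyFile {
    std::vector<std::string> keys;

    // Store meta-data under `name`, replacing an older entry of the same name.
    void writeOverwrite(const std::string& name) {
        keys.erase(std::remove(keys.begin(), keys.end(), name), keys.end());
        keys.push_back(name);
    }
};

// Toy stand-in for a tree that only carries a name.
struct ToyTree {
    std::string name;
    // A snapshot can only be stored under the current name.
    void autoSave(ToyFile& file) const { file.writeOverwrite(name); }
};

// Order used in the original post: merge first, rename afterwards.
std::vector<std::string> renameAfterMerge() {
    ToyFile file;
    ToyTree merged{"inTree"};          // merged tree starts with the input name
    merged.autoSave(file);             // snapshot taken during the merge
    merged.name = "outTree";           // rename comes too late for the snapshot
    file.writeOverwrite(merged.name);  // final write under the new name
    return file.keys;
}

// Suggested order: rename the input before it is merged.
std::vector<std::string> renameBeforeMerge() {
    ToyFile file;
    ToyTree input{"inTree"};
    input.name = "outTree";            // rename before merging
    ToyTree merged{input.name};        // merged tree inherits the new name
    merged.autoSave(file);             // snapshot already uses the new name
    file.writeOverwrite(merged.name);  // final write replaces the snapshot
    return file.keys;
}
```

In this model, renaming after the merge leaves two entries in the file (one under the old name and one under the new name), while renaming before the merge leaves a single entry.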
Many thanks for the explanation. Your suggestion worked: the backup data is not visible anymore.
Concerning the deletion of the input files, I don’t understand why the code still runs even though the TTree object is deleted.