Merging trees with TTree::MergeTrees()

Dear experts,

I have a problem when merging trees from several ROOT files. When using TTree::MergeTrees(list), the trees are merged correctly and the result contains the correct number of events.

However, in addition to the correctly merged tree, the old input tree is also written to the output file, strangely with some events missing.

Here is a stripped-down version of the code:

  TString fileListStr = fileList;
  TObjArray* listArr = fileListStr.Tokenize(" ");
  TList* treeList = new TList;
  for(Int_t i=0; i<listArr->GetEntries(); i++)
  {
    // Load input tree
    TString fileStr = (static_cast<TObjString*>(listArr->At(i)))->GetString();
    TFile* inputFile = new TFile(fileStr.Data(), "READ");
    TList* tmpList   = static_cast<TList*>(inputFile->FindObjectAny(listName));
    TTree* inTree    = dynamic_cast<TTree*>(tmpList->FindObject(inTreeName));

    // Add tree to list if max sample threshold is not reached
    treeList->Add(inTree);
    delete inputFile;
  }

  // Write merged tree to output file
  TFile* outFile = new TFile(outputFile,"UPDATE");
  TTree* outTree = TTree::MergeTrees(treeList);
  outTree->SetName(outTreeName);
  outTree->Write(0, TObject::kOverwrite);
  delete outFile;
  delete treeList;

I thought opening the output file after loading the trees would prevent them from being written to the file.

I already tried adding inTree->SetDirectory(0) for each tree, but without success.

What am I missing?

Thanks & best
Rudiger

Hi,

I think I found a (very slow) work-around. I don’t know exactly why it does not work out-of-the-box.
I am still interested if somebody knows the ROOT-native solution.

  // Write merged tree to output file
  TFile* outFile = new TFile(outputFile,"UPDATE");
  gROOT->cd(); // new
  TTree* outTree = TTree::MergeTrees(treeList);
  outFile->cd(); // new
  outTree->SetName(outTreeName);
  outTree->Write(0, TObject::kOverwrite);
  delete outFile;
  delete treeList;

From my naive perspective, it looks like TTree::MergeTrees() somehow writes the tree it is reading from into the current directory.

Besides being slow, I guess the current solution has the drawback that the merging is done in memory and not in a file. Maybe I could load a cache file instead of using gROOT->cd().

Best
Rudiger

As an aside:

    treeList->Add(inTree);
    delete inputFile;

this results in the TTree object being deleted. (You can avoid this by calling SetDirectory(0), but then the TTree no longer has access to its data. You can work around that by calling LoadBaskets, but then you might run out of memory.)

The extra tree in the output file is the expected behavior of your code per se. However, note that what is ‘duplicated’ is not the data but the much smaller meta-data (the TTree object itself). This is due both to the ROOT file support for ‘backups’ (or cycles) of objects and to

  TTree* outTree = TTree::MergeTrees(treeList);
  outTree->SetName(outTreeName);

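The first statement periodically takes a snapshot of the TTree meta-data (to allow for file recovery if the process crashes), and it has no choice but to store it under the old name. That snapshot predates the end of the merge, which is why the old-named tree appears to be missing events. Then, once the second statement has renamed the tree, the final Write stores the meta-data one last time (with complete information) under the new name.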

What you can do instead is:

    // Add tree to list if max sample threshold is not reached
    inTree->SetName(outTreeName);
    treeList->Add(inTree);

Cheers,
Philippe.

Hi Philippe,

Many thanks for the explanation. Your suggestion worked: the backup copy is no longer visible.
Concerning the deletion of the input files, I don’t understand why the code still runs even though the TTree object is deleted.

Thanks again
Rudiger
