What is the Easiest Way to Merge a Large Number of Small ROOT files each containing multiple TTrees?



ROOT Version: 6.26/06
Platform: Scientific Linux
Compiler: None (macro)


Hi ROOT forum,

I’ve read several threads on this topic, but none of those I found quite fits my use case - I need to merge a large number of ROOT files each containing multiple TTrees. The TTree names are consistent between the files. The resultant file will be > 100 GB.

Currently, I do the following:

int merge_root(int nfiles, string tag, int datormc, int cor)
{
  TChain ch("ttree");
  string filename;
  string base = "output/evt/events_"+tag+(tag!=""?"_":"");
  string dmc = (!datormc?"data_":"mc_");
  string cuc = (cor?"cor_":"unc_");
  base += dmc;
  base += cuc;
  string ext = ".root";
  for(int i=0; i<nfiles; i++)
    {
      if(i%10==0) cout << i << endl;
      filename = base;
      filename += to_string(i);
      filename += ext;
      cout << filename << endl;
      const char *alsofile = filename.c_str();
      const char* othertree = (filename+"?#outt").c_str();
      try
        {
          ch.Add(alsofile);
          ch.Add(othertree);
        }
      catch(...)
        {
          continue;
        }
    }
  filename = "results/merged_" +tag+(tag!=""?"_":"") + dmc + cuc + to_string(nfiles) + ext;
  ch.Merge(filename.c_str());
  return 0;
}

Note that the first tree is named “ttree”, while the second is named “outt”. The result I get contains ONLY the merged “ttree”, with no “outt” in sight.

Curiously, the error messages I receive say that the second call to TChain::Add() does not find the file/tree it is looking for, and contain some “characters” which aren’t recognized by the system I am on - i.e., they show up as rhombi with question marks in them.

Any ideas?

Thanks!

Hi @jocl,
the problem is that the temporary string produced by filename+"?#outt" is destroyed at the end of the statement that declares othertree, so othertree is left pointing to memory that has already been freed. This is why you see those rhombi with question marks when you call the Merge method. To solve your issue, store filename+"?#outt" in a named string variable and take c_str() from that variable, as done below.

int merge_root(int nfiles, string tag, int datormc, int cor)
{
  TChain ch("ttree");
  string filename, filenameoutt;
  string base = "output/evt/events_"+tag+(tag!=""?"_":"");
  string dmc = (!datormc?"data_":"mc_");
  string cuc = (cor?"cor_":"unc_");
  base += dmc;
  base += cuc;
  string ext = ".root";
  for(int i=0; i<nfiles; i++)
    {
      if(i%10==0) cout << i << endl;
      filename = base;
      filename += to_string(i);
      filename += ext;
      cout << filename << endl;
      const char * alsofile = filename.c_str();
      filenameoutt = filename+"?#outt";
      cout << filenameoutt << endl;
      const char * othertree = filenameoutt.c_str();
      try
        {
          ch.Add(alsofile);
          ch.Add(othertree);
        }
      catch(...)
        {
          continue;
        }
    }
  filename = "results/merged_" +tag+(tag!=""?"_":"") + dmc + cuc + to_string(nfiles) + ext;
  cout << filename << endl;
  ch.Merge(filename.c_str());
  return 0;
}
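
The same lifetime rule can be seen in isolation, without ROOT. The block below is only a minimal sketch in standard C++: take_name, make_tree_url and demo are hypothetical names invented for this illustration, and take_name is just a stand-in for any function that accepts a const char*.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an API that accepts a const char*.
// It copies the characters immediately, so the pointer only has to stay
// valid for the duration of the call.
void take_name(std::vector<std::string>& store, const char* name)
{
  store.emplace_back(name);
}

// Hypothetical helper: builds "<file>?#<tree>" and returns it by value.
std::string make_tree_url(const std::string& file, const std::string& tree)
{
  return file + "?#" + tree;
}

std::vector<std::string> demo()
{
  std::vector<std::string> store;
  const std::string filename = "events_0.root";

  // BAD (left commented out on purpose): the temporary std::string dies at
  // the end of this statement, so the pointer would dangle afterwards.
  // const char* dangling = (filename + "?#outt").c_str();

  // OK: a named std::string owns the characters while it is in scope.
  const std::string url = make_tree_url(filename, "outt");
  take_name(store, url.c_str());

  // Also OK here: the temporary lives until the end of the full expression,
  // which includes the call to take_name.
  take_name(store, make_tree_url(filename, "ttree").c_str());

  assert(store.size() == 2);
  return store;
}
```

Calling demo() returns the two names that were stored. Whether the last pattern is safe with a real API depends on that API copying the string before it returns, so keeping a named string, as in the macro above, is the more defensive choice.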

Cheers,
Monica

This does in fact resolve the error messages (thank you), but there is a new issue: the output file contains only one tree, which is always the first one added to the chain.

My code now looks like:

int merge_root(int nfiles, string tag, int datormc, int cor)
{
  TChain ch("tchain");
  string filename, filenameoutt, filenamettree;
  string base = "output/evt/events_"+tag+(tag!=""?"_":"");
  string dmc = (!datormc?"data_":"mc_");
  string cuc = (cor?"cor_":"unc_");
  base += dmc;
  base += cuc;
  string ext = ".root";
  for(int i=0; i<nfiles; i++)
    {
      if(i%10==0) cout << i << endl;
      filename = base;
      filename += to_string(i);
      filename += ext;
      filenamettree = filename+"?#ttree";
      cout << filenamettree << endl;
      const char *alsofile = filenamettree.c_str();
      filenameoutt = filename+"?#outt";
      cout << filenameoutt << endl;
      const char* othertree = filenameoutt.c_str();
      try
        {
          ch.Add(alsofile);
          ch.Add(othertree);
        }
      catch(...)
        {
          continue;
        }
    }
  filename = "results/merged_dEdeta_" +tag+(tag!=""?"_":"") + dmc + cuc + to_string(nfiles) + ext;
  ch.Merge(filename.c_str());
  return 0;
}

If I run it in this form, I see an output file with one tree named “tchain” containing the values from what is called “ttree” in the un-merged files. The “outt” tree and its values are missing entirely.

On the other hand, if I comment out the line “ch.Add(alsofile)”, I still see a tree named “tchain”, but it contains the values from “outt”.

Actually, for my purposes it might be more useful to not merge the files at all and put them in a TChain in the file in which I do my processing, but I would still like to know if what I’m trying to do is possible…

Did you try the hadd command line with the work-around described in "Root 6.04.14 hadd 100Gb and rootlogon - #2 by pcanal"?