Slowdown when reading many TFiles

Dear ROOT experts,

I’m having some issues with my code slowing down when dealing with a large number of TFiles (~26,000). It starts off fast but slows down dramatically (by a factor of 5-6) after a few thousand files, and the point where the slowdown begins is not consistent from run to run.

The first thing I would like to do is step through all the ROOT files in a folder and add their filenames to a vector if they contain a TTree I’m looking for. I want to process each file individually to extract some quantities and then plot those quantities per file, which is why I am not using a TChain.

The function I am using to do this is shown below, where “dataPath” is the folder containing the ~26,000 files.


#include <dirent.h>

#include <cerrno>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

#include "TFile.h"
#include "TString.h"

using namespace std;

vector<TString> loadFilesInFolderWithTree(TString dataPath, TString treeName) {
  DIR *dp;
  struct dirent *dirp;

  //Store file names in here if they contain the tree
  vector<TString> rootFiles;

  //Make sure the path ends with a separator so dataPath+filename is a valid path
  if (!dataPath.EndsWith("/")) dataPath += "/";

  //Check that we can open the folder (the inner parentheses are required:
  //"==" binds tighter than "=", so without them dp would receive the
  //result of the comparison rather than the DIR pointer)
  if ((dp = opendir(dataPath.Data())) == NULL) {
    cout << "Error(" << errno << ") opening " << dataPath << endl;
    exit(-1);
  }

  //For demonstrating the slowdown
  time_t start, current;
  time(&start);

  //List all files in the folder and keep those ending in ".root"
  while ((dirp = readdir(dp)) != NULL) {
    TString filename = dirp->d_name;
    if (filename.EndsWith(".root")) {
      //Check whether this file has the proper TTree; guard against
      //TFile::Open returning a null or unusable ("zombie") file
      TFile *runFile = TFile::Open(dataPath + filename, "READ");
      if (runFile && !runFile->IsZombie() &&
          runFile->GetListOfKeys()->Contains(treeName)) {
        //Add to the rootFiles vector if it contains the tree name
        rootFiles.push_back(filename);

        //Print how long the last 200 accepted files took
        if (rootFiles.size() % 200 == 0) {
          time(&current);
          cout << "Found " << rootFiles.size() << " valid files; last 200 took "
               << double(current - start) << " seconds" << endl;
          time(&start);
        }
      }
      delete runFile;
    }
  }

  //Close the DIR handle
  closedir(dp);

  return rootFiles;
}
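
For completeness, a minimal RAII sketch of the per-file check (fileHasTree is a hypothetical helper name, not something from the original code): wrapping the TFile in std::unique_ptr guarantees it is closed on every path out of the function.

#include <memory>

//Sketch only: returns true if the file at 'path' opens cleanly and its
//top-level keys include the requested tree name.
bool fileHasTree(const TString &path, const TString &treeName) {
  std::unique_ptr<TFile> f{TFile::Open(path, "READ")};
  if (!f || f->IsZombie()) return false;
  return f->GetListOfKeys()->Contains(treeName);
}

With such a helper, the loop body above reduces to: if (fileHasTree(dataPath + filename, treeName)) rootFiles.push_back(filename);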

ROOT Version: 6.20/04
Platform: Ubuntu
Compiler: gcc 4.8.5

Hi @shedges,
thank you for the report. Could you please check where the time is spent by compiling your program with debug symbols (the -g compiler option) and running it with perf record --call-graph dwarf -F99 ./program && perf report? Another simple check is that nothing in your application is leaking objects into ROOT’s global lists – you can check that with valgrind --track-origins=yes --suppressions=$ROOTSYS/etc/valgrind-root.supp ./program.

This works best if ROOT has also been compiled with debug symbols (such builds are e.g. available from CVMFS and are easily accessible if you can reproduce the problem on lxplus).
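
As for checking ROOT’s global lists from inside the program itself, here is a hedged sketch (printOpenFileCount is an illustrative helper, not an existing ROOT function): ROOT registers every open TFile in gROOT->GetListOfFiles(), so printing its size every few hundred files shows whether file objects are accumulating.

#include <iostream>

#include "TROOT.h"
#include "TSeqCollection.h"

//Sketch: call this every few hundred files; a steadily growing count
//means TFile objects are being kept alive somewhere.
void printOpenFileCount() {
  std::cout << "TFiles currently tracked by ROOT: "
            << gROOT->GetListOfFiles()->GetEntries() << std::endl;
}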

@pcanal might be able to make an educated guess without seeing performance profiles – I cannot, because as far as I can tell you are cleaning up after yourself.

Cheers,
Enrico

Thanks for the help! Here’s the command I ran and the top lines of the perf report. I haven’t used perf before and am still working on understanding the output, but let me know if anything stands out.

The other thing worth noting is that the files in the folder I’m looking at take up ~100GB of space in total. I tried testing on 10,000 files taking up only ~100MB and had no issues there.

perf seems to suggest that 61% of your runtime is spent in (a vectorized implementation of) strlen. That does not make a lot of sense to me. You can try increasing the sampling frequency, e.g. perf record --call-graph dwarf -F999 ./program, to see whether the profile changes.

Does valgrind complain about anything?

Strange indeed … I will have to try to reproduce this problem.
