Very slow output writing multiple directories and histograms

I am working with a compiled ROOT executable which reads in one file, with several directories each containing nine TTrees (with 20-50 events each), and produces an output file containing a couple of layers of directories, each with several dozen histograms.

If we simply open the two files and start processing, the executable explodes in memory, crashing the host system within a few minutes. To work around this, I’ve tried to introduce flushing and deleting memory on the output and input files:

 void EcsModeAnalyzer::flushOutput() {
   //***  _outfile->Write();
   _outfile->Flush();
   _outfile->Delete("T*");
   _source->Delete("T*");
 }

This function is called for each sub-subdirectory (corresponding to one input tree). The two variables _outfile and _source are just TFile* pointers, opened appropriately in my main(). As you see, I’ve tried both TFile::Write() and TFile::Flush(), with no difference in behaviour.

The problem is speed. This function takes anywhere from two minutes (writing to a local disk) to ten or fifteen minutes to complete for each directory! It astounds me that the I/O can possibly be this slow, and I’m sure I am just doing something wrong. Do I need some “cd()” calls in there? Do I need to specify options for Write or Flush? Or is Delete() not appropriate for releasing memory?

I wonder if anyone could give me some guidance on how I should be opening and configuring the input and output files. I’ve got things set up so that the input directories and trees are iterated through in sequence, to try and minimize random file access as much as possible.

Regards,
Michael Kelsey

Hi,

This is indeed strange. Could you send an example so that we can reproduce this behavior (and/or better understand the issue)?

As a (semi-random) piece of advice, you may want to use the directory pointer rather than the file pointer in your flush function.
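
A minimal sketch of that suggestion, not a tested fix: the helper name flushDirectory and its argument are hypothetical, and Dir is meant to be ROOT's TDirectory (it is a template parameter only so the snippet does not depend on ROOT headers). It assumes the histograms are owned by the directory, and that Delete("T*") with no cycle releases only the in-memory objects, as in the original function.

```cpp
// Sketch only: Dir stands in for ROOT's TDirectory (or a class derived from
// it, such as TFile). The function name and parameter are hypothetical.
template <class Dir>
int flushDirectory(Dir* dir) {
  if (!dir) return 0;

  // Write the in-memory objects of this directory (and anything below it),
  // instead of asking the whole file to write everything it holds.
  const int nbytes = dir->Write();

  // Same namecycle as in the original flushOutput(): no cycle is given, so
  // this is assumed to release in-memory objects and subdirectories while
  // leaving the keys that were just written on file.
  dir->Delete("T*");

  return nbytes;  // whatever Write() reported
}
```

It would be called once per sub-subdirectory, e.g. flushDirectory(dir) with dir being the output directory that was just filled. Any histogram pointers cached elsewhere in the analysis code must not be used after this call, since the objects they point to are deleted.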

Cheers,
Philippe.

I’m working on putting together a simpler demonstration than our full analysis package :-/ I did discover (this morning) that calling TFile::Flush() instead of TFile::Write() made a noticeable difference in the time. I’m running a job now to ensure that the actual results are consistent with each other.

Thank you very much for this! I’ll make that change and see how the performance stacks up.

-- Michael Kelsey