Performance issue when merging root files

Dear all,

I have to do the following.
I have a set of ROOT files containing TProfiles. Each file contains the same set of objects, but with different contents. From those, I want to create a new file containing the same set of TProfiles again, with their contents summed.

I have code that does this nicely, taking into account the arbitrary directory structure I might have, but it is surprisingly slow. I am wondering whether I am hitting a limitation due to the data size (I have ~10 files with ~20000 profiles each, and 1000 bins per TProfile), or whether I can save time by being more clever.

I attach my code. Basically it does the following:

   // loop over files
   while (TKey *obj = (TKey *)next()) {
      TObject *o = obj->ReadObj();
      if (strcmp(o->IsA()->GetName(), "TProfile") == 0) {
         TProfile *profile = (TProfile *)o;
         output_->cd();
         TProfile *p = (TProfile *)gDirectory->Get(profile->GetName());
         p->Add(profile);
         input_->cd();
      }
   }
   output_->Write();

Many thanks,
mergeInfo.C (2.96 KB)

By timing the code with TStopwatch, I discovered that most of the time is spent reading the objects from the files, namely at

   TObject *o = obj->ReadObj();

where obj is a TKey obtained via a TIter. I tried adding an input_->ReadAll() call when opening the file, but it doesn't help.

Is there any way to speed up the readout? Maybe another way of accessing the objects?



Looking at your numbers, I conclude that each of your 10 files must be at least 400 MBytes, i.e. you have to read about 4 GBytes of data.
Could you specify how long it takes on your system to do this?


I am presently testing the code with a set of 10 smaller files containing 5000 TProfiles each (the files are ~15 MB each).
On my 1.7 GHz Centrino, it takes 11 minutes to merge the files.

I looked at the system activity while running the code, and it seems that most of the activity is pure CPU rather than disk access (there is almost no system CPU activity while merging, in contrast to what happens when the output is saved at the end).


Could you post a piece of code reproducing your problem, and links to your files?
I will look into it once I am back in Geneva tomorrow.


Many thanks!

I put the script and 4 input files in my afs public space:

If you want to test on more files, copying the same file multiple times should be equivalent.
My tests show that the time needed is the same for each successive file anyway. Memory is stable as well: there is no memory leak.

I remind you of the commands:

   .L mergeInfo.C+
   mergeInfo tool("SiStripCommissioningSource_0010348", 11)


I had a quick look through your files. I see that:
- you have 5000 TProfiles distributed in about 3000 directories, with a typical depth of 10!
- your TProfiles have a very high compression factor (> 8)

Before continuing this investigation, I strongly suggest that you:
- reduce considerably the number of directories and the maximum depth
- use a TProfile2D instead