I have to do the following.
I have a set of root files containing TProfiles. Each file contains the same objects, with different content. From those, I want to create a new file that contains again the same set of TProfiles, adding their content.
I have code that does that nicely, taking into account the arbitrary directory structure that I might have, but the code is surprisingly slow. I am wondering if I am hitting a limitation from the data size (I have ~10 files with ~20000 profiles each, and 1000 bins per TProfile), or if I can save time by being more clever.
I attach my code. Basically, it does the following:
// loop over files {
  while (TKey *obj = (TKey*)next()) {
    TObject *o = obj->ReadObj();
    if (strcmp(o->IsA()->GetName(), "TProfile") == 0) {
      TProfile *profile = (TProfile*)o;
      output_->cd();
      TProfile *p = (TProfile*)gDirectory->Get(profile->GetName());
      p->Add(profile);
      input_->cd();
    }
  }
// }
output_->Write();
Looking at your numbers, I conclude that each of your 10 files must be at least 400 MBytes, i.e. you have to read about 4 GBytes of data.
Could you specify how long it takes on your system to do this?
I am presently testing the code with a set of 10 smaller files containing 5000 TProfiles each (files are ~15 MB each).
On my 1.7 GHz Centrino, it takes 11 minutes to merge the files.
I looked at the system activity while running the code, and it seems that most of the activity is pure CPU, not disk access (there is almost no system CPU activity while merging, in contrast to what happens when the output is saved at the end).
I put the script and 4 input files in my afs public space:
/afs/cern.ch/user/d/delaer/public
If you want to test on more files, copying the same file multiple times should be equivalent.
My tests show that the time needed is the same for each successive file anyway. Memory use is also stable: no memory leak.
I had a quick look through your files. I see that:
- you have 5000 TProfiles distributed in about 3000 directories, with a typical depth of 10!
- your TProfiles have a very high compression factor (>
Before continuing this investigation, I strongly suggest that you:
- reduce considerably the number of directories and the maximum depth
- use a TProfile2D instead