Merging a large amount of data using hadd or TChain::Merge

Hi,

We often merge ntuples, which include some multidimensional arrays, using hadd. Recently we attempted to merge 104 ntuple files, 11 GB altogether, using hadd from ROOT v5.16. This failed with a segmentation fault. I tried some modifications, such as passing a compression level as an option to hadd, which forces the “fast” option to be turned off. I then wrote a stand-alone ROOT application that uses TChain::Merge (without “fast” in the option string) and sets TTree::SetMaxTreeSize to a large value so that all the data is saved to one ROOT file. That run failed as well, with the following backtrace from gdb:

#0 0xffffe410 in __kernel_vsyscall ()
#1 0x56fe37a5 in ldexpf () from /lib/tls/libc.so.6
#2 0x56fe5209 in sighold () from /lib/tls/libc.so.6
#3 0x56f661f7 in __cxa_vec_delete3 () from /usr/lib/libstdc++.so.5
#4 0x56f66244 in __cxa_vec_delete3 () from /usr/lib/libstdc++.so.5
#5 0x56f663b6 in __cxa_vec_delete3 () from /usr/lib/libstdc++.so.5
#6 0x56f66612 in __cxa_vec_delete3 () from /usr/lib/libstdc++.so.5
#7 0x56f666ff in __cxa_vec_delete3 () from /usr/lib/libstdc++.so.5
#8 0x556ad503 in TStorage::ReAllocChar ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#9 0x55677804 in TBuffer::Expand ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#10 0x560f9c33 in TBufferFile::WriteFastArray ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#11 0x56165b34 in TStreamerInfo::WriteBufferAux<char**> ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#12 0x560fc2a3 in TBufferFile::WriteClassBuffer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#13 0x563054c3 in TBranch::Streamer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libTree.so
#14 0x55706e41 in TClass::Streamer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#15 0x560fa54f in TBufferFile::WriteObject ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#16 0x560fa658 in TBufferFile::WriteObjectAny ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#17 0x556eb2fb in TObjArray::Streamer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#18 0x55706e41 in TClass::Streamer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#19 0x560f9ee5 in TBufferFile::WriteFastArray ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#20 0x561642fd in TStreamerInfo::WriteBufferAux<char**> ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#21 0x560fc2a3 in TBufferFile::WriteClassBuffer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#22 0x56348f93 in TTree::Streamer ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libTree.so
#23 0x561218ec in TKey::TKey ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#24 0x5610d365 in TFile::CreateKey ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#25 0x56104909 in TDirectoryFile::WriteTObject ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libRIO.so
#26 0x556909a9 in TObject::Write ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#27 0x55690a1b in TObject::Write ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libCore.so
#28 0x5633e0a7 in TTree::AutoSave ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libTree.so
#29 0x56342399 in TTree::Fill ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libTree.so
#30 0x5631fd2b in TChain::Merge ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libTree.so
#31 0x5631f787 in TChain::Merge ()
from /afs/slac/g/glast/ground/GLAST_EXT/rh9_gcc32/ROOT/v5.16.00-gl1/root/lib/libTree.so

So I see things are going awry in one of the AutoSaves, when trying to resize an array of chars. The memory and swap space usage are on the high side:
Max Memory : 2127 MB
Max Swap : 2712 MB

I’m wondering if there are any suggestions to work around this - perhaps using SetAutoSave to force writing more often? I’ll go back and try the TChain::Merge with the fast option back on. Any other ideas?

Thanks,
Heather

Heather,

It would save us time and energy :slight_smile: if you could try with version 5.17/06.

Rene

Hi Rene,

I downloaded the v5.17/06 binaries for rhel4-gcc3.4 and tried out hadd with the same set of files. In this case, I just used hadd with the default “fast” option and passed all 104 files to the program. This job also failed: it got through over 10 GB of the data, but terminated in a similar fashion to before, seemingly after hitting a limit on memory usage:
Exited with exit code 134.

Resource usage summary:

CPU time   :    321.59 sec.
Max Memory :      2247 MB
Max Swap   :      2949 MB

We have also tried merging the files in two sets and then merging the two resulting files into one; that fails as well. Any suggestions?

Thanks,
Heather

Heather,

Are you merging files containing memory-resident Trees? That is the only explanation that
I can find for your problem.
Could you send me the result of TTree::Print on one of your Trees (or on all the Trees inside one file, in case you have more than one)?

Rene

Hi Rene,

Please find attached the dump from TTree::Print from all 104 TTrees. I do not believe they are memory resident - looking back at the code, TFile was indeed called before the TTrees were created. Though, I’d be quite happy with an easy fix such as that :slight_smile: As you will see, all the trees contain 100000 events, except the last one.

Take care,
Heather
ttreePrint.txt (283 KB)

Heather,

Could you post one of your files somewhere in a public read area?

Rene

Hi Rene,

Thanks for taking a look. I posted a file here:

/afs/slac.stanford.edu/g/glast/ground/glastsoft/temp/r0216952972_e00000000000000055099_cal.root

Let me know if you have any trouble accessing it, I believe that area is readable from CERN via AFS.

Take care,
Heather

Heather,

Could you put this file in a publicly readable directory? I cannot access your file.

Rene

Hi Rene,

Apparently, I’m not that bright :slight_smile:
I put the file on the GLAST SLAC FTP space instead:

ftp://ftp-glast.slac.stanford.edu/glast.u33/heather

Hope that works this time!

Take care,
Heather

Hi Heather,

I have investigated your problem when merging your files.
Your file contains a Tree (110 Mbytes with 7 branches). Your branches have 100000 baskets of 32 Kbytes each of uncompressed data, but with an average compression factor of 119 !!

[code]
*Br    3 :CalXtalAdcPedAllRange[16][8][12][2][4] :                          *
*         | CalXtalAdcPedAllRange[16][8][12][2][4]/F                         *
*Entries :   100000 : Total  Size= 4928500635 bytes  File Size  =   37158789 *
*Baskets :   100000 : Basket Size=      32000 bytes  Compression= 132.57     *
[/code]
i.e. you are streaming arrays of 0s. The effective uncompressed data in your Tree is 13.5 Gbytes! If this tree is typical of your data (and I do not believe it is), you should increase your basket size from 32000 to, say, 1000000.
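
The numbers in the TTree::Print output above can be checked with a little arithmetic. The following is only a sketch: the constants are taken from the printout, while the function names are made up for illustration and are not ROOT API. It shows that one entry of this branch (49152 bytes of floats) is already larger than a 32000-byte basket, which is consistent with the 100000 baskets reported for 100000 entries, and that a 1000000-byte basket would hold about 20 entries.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Shape of CalXtalAdcPedAllRange[16][8][12][2][4]/F, from the TTree::Print output.
const std::int64_t kShape[5] = {16, 8, 12, 2, 4};
const std::int64_t kBytesPerFloat = 4;   // "/F" denotes a 32-bit float
const std::int64_t kEntries = 100000;    // entries in the tree

// Uncompressed payload of a single entry of this branch, in bytes.
std::int64_t bytesPerEntry() {
    std::int64_t bytes = kBytesPerFloat;
    for (std::int64_t dim : kShape) bytes *= dim;
    return bytes;
}

// Uncompressed payload of the whole branch (ignores per-basket overhead).
std::int64_t totalPayloadBytes() {
    return bytesPerEntry() * kEntries;
}

// Number of complete entries that fit in a basket of the given size.
std::int64_t entriesPerBasket(std::int64_t basketBytes) {
    assert(basketBytes > 0);
    return basketBytes / bytesPerEntry();
}

// Illustrative helper: prints the figures discussed in the post.
void printBranchArithmetic() {
    std::cout << "bytes per entry:           " << bytesPerEntry() << "\n"
              << "total payload bytes:       " << totalPayloadBytes() << "\n"
              << "entries per 32000 basket:  " << entriesPerBasket(32000) << "\n"
              << "entries per 1000000 basket: " << entriesPerBasket(1000000) << "\n";
}
```

The payload total comes out slightly below the "Total Size" in the printout, since this estimate counts only the array data itself.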
Just in case you are not aware, we recently introduced a new class, THnSparse, for efficient storage of sparse multi-dimensional data (both in memory and on file). For 5-dimensional data, the gain is typically a factor of 1000 compared to a plain 5-d array.
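
To illustrate the general idea behind sparse storage, here is a minimal sketch in plain C++. It is not THnSparse and does not reflect how THnSparse is implemented; the class `SparseArray5` and the helpers `flatten` and `denseCells` are hypothetical names chosen for this example. Only non-zero cells are kept, keyed by their flattened index, so a mostly empty [16][8][12][2][4] array costs memory in proportion to the filled cells rather than to all 12288 cells.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Shape of the 5-d array discussed in this thread.
const std::size_t kShape[5] = {16, 8, 12, 2, 4};

typedef std::array<std::size_t, 5> Index5;

// Map a 5-d index to a single row-major offset.
std::size_t flatten(const Index5& idx) {
    std::size_t flat = 0;
    for (std::size_t d = 0; d < 5; ++d) {
        assert(idx[d] < kShape[d]);
        flat = flat * kShape[d] + idx[d];
    }
    return flat;
}

// Number of cells a dense array of this shape always stores.
std::size_t denseCells() {
    std::size_t n = 1;
    for (std::size_t dim : kShape) n *= dim;
    return n;
}

// Hypothetical sparse container: stores only the non-zero cells.
class SparseArray5 {
public:
    void set(const Index5& idx, float value) {
        if (value == 0.0f) {
            cells_.erase(flatten(idx));   // zeros are implicit, never stored
        } else {
            cells_[flatten(idx)] = value;
        }
    }
    float get(const Index5& idx) const {
        std::unordered_map<std::size_t, float>::const_iterator it =
            cells_.find(flatten(idx));
        return it == cells_.end() ? 0.0f : it->second;
    }
    std::size_t filledCells() const { return cells_.size(); }

private:
    std::unordered_map<std::size_t, float> cells_;
};
```

With one cell filled, `filledCells()` is 1 while `denseCells()` is still 12288, which is the kind of gap that makes a sparse representation attractive for data like this.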

Rene

Hi Rene,

Yes, that is a particularly sparse set of data, though those multi-dimensional arrays are typically rather sparse. There is an ongoing discussion concerning the apparent desire of some of our users for fixed-length arrays versus variable-length arrays. We will definitely look at the new ROOT objects for sparse arrays.

One more question: even if not all of our data is quite so sparse, what is the drawback of just increasing our basket size to something like 1000000? We do invoke TTree::AutoSave more frequently than the default to limit data loss in the case of a job crash. The increased basket size reduces our ntuple file sizes and takes care of these intermittent hadd issues with some of our ntuples, so it seems that a larger basket size would make sense for us in this case. Surely there is a cost we are overlooking.

Take care,
Heather

-First point: multi-dim arrays. Execute $ROOTSYS/tutorials/hist/sparsehist.C and you will see that for dimensions >= 3, THnSparse beats fixed-size arrays by large factors (time and memory).

-Second point: buffer size. Increasing the branch buffer size has 3 drawbacks:
-it takes more space in memory
-if your job crashes you lose more data
-calling AutoSave too frequently will slow down your program because there is more data to stream with the Tree header at each call to AutoSave.
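
For a rough feel of the first two drawbacks, the estimate below assumes one in-memory buffer of the configured basket size per branch. That is a simplification made for this sketch, and `bufferMemoryBytes` is a hypothetical helper, not a ROOT function. With the 7 branches of the tree discussed earlier in this thread, going from 32000 to 1000000 bytes raises the buffer memory, and hence the amount of data that may still be sitting in memory when a job crashes, from about 224 kB to about 7 MB per tree.

```cpp
#include <cassert>
#include <cstdint>

// Rough upper estimate of the memory held in branch buffers for one tree,
// assuming (for illustration only) one buffer of basketBytes per branch.
std::int64_t bufferMemoryBytes(std::int64_t nBranches, std::int64_t basketBytes) {
    assert(nBranches >= 0 && basketBytes >= 0);
    return nBranches * basketBytes;
}
```

The same figure scales with the number of trees open at once, so the cost is more noticeable for jobs that write many trees or trees with many branches.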

Rene