TChain.merge

Hi,
I have 10K ROOT files, each about 5 MB in size and containing one tree. I’d like to merge them so that I end up with about 100 files of 500 MB each (I’m hoping this will speed up processing). I have written a small script which works for very small ROOT trees, but it crashes in TChain.Merge for the complex trees I’d like to merge.

Does anyone have a suggestion for a better way to merge trees?

[code]import ROOT

def combineFiles(fileList, outputListPath, outputFolder, treeName):
    """Merge the trees from fileList in groups of 10, one output file per group."""
    # strip newlines and skip blank lines so the last-file test below is reliable
    files = [name.strip() for name in fileList if name.strip()]
    previousBreak = 1

    chain = ROOT.TChain(treeName)

    # open text file to hold list of newly created root files
    newFileList = open(outputListPath, "w")
    for counter, name in enumerate(files, 1):
        chain.AddFile(name)

        if counter % 10 == 0 or counter == len(files):
            # create new root file for the merged trees
            outputFilePath = "%s/%s_files%s-%s.root" % (outputFolder, treeName, previousBreak, counter)
            fileObj = ROOT.TFile(outputFilePath, "recreate")
            # Merge closes and deletes the file itself, so Python must not own it
            ROOT.SetOwnership(fileObj, False)
            chain.Merge(fileObj, 0, "fast")
            # start a fresh chain; it needs the tree's real name to find it in the inputs
            chain = ROOT.TChain(treeName)

            newFileList.write("%s\n" % outputFilePath)
            previousBreak = counter + 1

    newFileList.close()

#---- main processing --------
fileList = open(SOURCE_LIST_FILE, "r").readlines()
combineFiles(fileList, COMBINED_FILE_LIST, COMBINED_FILE_FOLDER, TREE_NAME)[/code]

I strongly suggest using the standard merger, $ROOTSYS/bin/hadd, instead of your complex script. Simply do

hadd -f result.root f1.root f2.root fN.root

Rene
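
For 10K inputs, a single hadd call with every file on the command line may not be practical, so one option is to call hadd once per group of inputs. The following is only a minimal sketch of that idea, assuming hadd is on the PATH; the helper names, the `merged_NNN.root` naming scheme and the batch size of 100 (10K files of about 5 MB would give about 100 outputs of roughly 500 MB) are illustrative choices, not something from the posts above.

```python
import subprocess

def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def hadd_command(output_path, input_paths):
    """Build the argument list for: hadd -f output input1 input2 ..."""
    return ["hadd", "-f", output_path] + list(input_paths)

def merge_in_batches(input_paths, output_folder, batch_size=100):
    """Run hadd once per batch and return the output files written.

    The output naming scheme is hypothetical; adapt it as needed.
    """
    outputs = []
    for index, batch in enumerate(chunked(input_paths, batch_size), 1):
        output_path = "%s/merged_%03d.root" % (output_folder, index)
        # check_call raises CalledProcessError if hadd exits with an error
        subprocess.check_call(hadd_command(output_path, batch))
        outputs.append(output_path)
    return outputs
```

Calling `merge_in_batches(paths, "some_output_folder")` with the stripped list of input paths would then produce one merged file per 100 inputs.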

Hi Rene,

Thanks for the suggestion. I tried hadd and the terminal output looks quite similar to when I ran my script. Unfortunately, it also ends with a segmentation fault.

Does this indicate that something is wrong with the trees?

Thanks.

[quote]Does this indicate that something is wrong with the trees? [/quote]Probably, but not definitely. Which version of ROOT are you using? What is the stack trace? What does valgrind complain about when you run it on the example?

Cheers,
Philippe.

Today, I am seeing St9bad_alloc (std::bad_alloc) with ROOT versions v4_04_02b and v5_16_00. I’m not sure why hadd is using GBs of memory. The trees are complex, but it was only trying to combine two 5 MB files.

Valgrind doesn’t look too bad:

==5828== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 35 from 2)
==5828== malloc/free: in use at exit: 2,082,439,205 bytes in 247,924 blocks.
==5828== malloc/free: 743,243 allocs, 495,318 frees, 2,171,431,776 bytes allocated.
==5828== For counts of detected errors, rerun with: -v
==5828== searching for pointers to 247,924 not-freed blocks.
==5828== checked 1,967,383,876 bytes.
==5828==
==5828== LEAK SUMMARY:
==5828== definitely lost: 60,499 bytes in 1,375 blocks.
==5828== possibly lost: 2,806,651 bytes in 90,119 blocks.
==5828== still reachable: 2,079,572,055 bytes in 156,430 blocks.
==5828== suppressed: 0 bytes in 0 blocks.
==5828== Rerun with --leak-check=full to see details of leaked memory.
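
To put those numbers in proportion, here is a small sketch that parses the "N bytes in M blocks" lines from the summary above and reports how much of the memory in use at exit falls into each category. The regular expression and helper name are illustrative and only cover the line shapes quoted here. In this output, almost all of the roughly 2 GB is "still reachable", meaning memory that was still referenced at exit rather than memory whose pointers were lost.

```python
import re

# Summary lines copied from the valgrind output quoted above.
VALGRIND_SUMMARY = """\
==5828== malloc/free: in use at exit: 2,082,439,205 bytes in 247,924 blocks.
==5828== definitely lost: 60,499 bytes in 1,375 blocks.
==5828== possibly lost: 2,806,651 bytes in 90,119 blocks.
==5828== still reachable: 2,079,572,055 bytes in 156,430 blocks.
"""

PATTERN = re.compile(
    r"==\d+==\s+(?:malloc/free: )?([a-z ]+):\s+([\d,]+) bytes in ([\d,]+) blocks"
)

def parse_summary(text):
    """Return {category: (bytes, blocks)} for each matching summary line."""
    result = {}
    for match in PATTERN.finditer(text):
        label, nbytes, nblocks = match.groups()
        result[label.strip()] = (
            int(nbytes.replace(",", "")),
            int(nblocks.replace(",", "")),
        )
    return result

parsed = parse_summary(VALGRIND_SUMMARY)
in_use = parsed["in use at exit"][0]
for label in ("definitely lost", "possibly lost", "still reachable"):
    nbytes = parsed[label][0]
    # print each category's size in GiB and its share of the in-use total
    print("%-16s %8.3f GiB  %6.2f%%" % (label, nbytes / 2.0**30, 100.0 * nbytes / in_use))
```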

Hi,

A bit strange indeed. Can you make available the files (I have access to clued0 and some cdf nodes)?

Cheers,
Philippe.

Hi,

Your files seem to tickle a bug in TTree that has been fixed in v5.22.

You will need to either load the original shared library or use ROOT 5.22 or higher.

Cheers,
Philippe.
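
Before re-running the merge, it may help to confirm which ROOT version a PyROOT session actually picks up. This is a minimal sketch, assuming the fix is present from 5.22/00 onwards (the post says only "v5.22", so the exact patch level is an assumption) and relying on `gROOT.GetVersionInt()`, which encodes 5.22/00 as 52200. The helper names are hypothetical.

```python
# Assumed threshold: 5.22/00 in the integer form ROOT reports (major*10000 + minor*100 + patch).
FIXED_VERSION_INT = 52200

def has_ttree_fix(version_int):
    """Return True if the integer ROOT version is at or above the assumed fix."""
    return version_int >= FIXED_VERSION_INT

def check_installed_root():
    """Return (has_fix, version_string) for the ROOT seen by this Python session."""
    import ROOT  # imported lazily so has_ttree_fix() can be used without ROOT
    return has_ttree_fix(ROOT.gROOT.GetVersionInt()), ROOT.gROOT.GetVersion()
```

With the versions mentioned earlier in the thread, `has_ttree_fix(51600)` (v5_16_00) and `has_ttree_fix(40402)` (v4_04_02) are both false.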

Using hadd from v5.22 did fix the problem. Now, hopefully, processing will be faster. :)

Thank you Rene and Philippe.