Problems with hadd in lxplus

Hi there,

I am trying to merge files that contain multiple trees and histograms. Some of the branches contain long strings, and some of the trees are not always filled.

To avoid the strings being truncated, the file with the largest string must be listed first in the merge. However, there are still issues when running with no compression (i.e. hadd -f0 … ) on lxplus when some trees have no entries:

free(): invalid next size (fast)
Aborted (core dumped)

There are also failures when merging batches of files at a time and merging the final outputs:

“Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3001] Required argument not present”

Is there any advice on how best to deal with merging of such files?
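For reference, the batched merge mentioned above can be sketched roughly as below. This is only an illustration under assumptions, not the exact script in use: the function names merge_in_batches and hadd_first, the HADD override variable and the temporary batch_NNNN.root names are made up for the sketch, while the -f0 and -O options are the ones used in this thread. Batch outputs are merged in the same order as the inputs, so a file listed first stays first.

```sh
#!/bin/sh
# Sketch only: helper names and the HADD variable are hypothetical.
# HADD defaults to "hadd"; set HADD=echo to print the commands instead.

# hadd_first N OUT FILE... : merge only the first N of the given files into OUT.
hadd_first() {
    hf_take=$1
    hf_out=$2
    shift 2
    hf_i=0
    # Rotate through the arguments, keeping only the first N of them.
    for hf_f in "$@"; do
        if [ "$hf_i" -lt "$hf_take" ]; then
            set -- "$@" "$hf_f"
        fi
        shift
        hf_i=$((hf_i + 1))
    done
    ${HADD:-hadd} -f0 -O "$hf_out" "$@"
}

# merge_in_batches BATCH_SIZE OUTPUT FILE...
merge_in_batches() {
    mb_size=$1
    mb_output=$2
    shift 2
    [ "$mb_size" -ge 1 ] || return 2
    mb_tmpdir=$(mktemp -d) || return 1
    mb_remaining=$#
    mb_n=0
    while [ "$mb_remaining" -gt 0 ]; do
        mb_n=$((mb_n + 1))
        mb_take=$mb_size
        [ "$mb_remaining" -lt "$mb_take" ] && mb_take=$mb_remaining
        # Zero-padded names keep the batch outputs in input order.
        mb_batch="$mb_tmpdir/batch_$(printf '%04d' "$mb_n").root"
        hadd_first "$mb_take" "$mb_batch" "$@" || { rm -rf "$mb_tmpdir"; return 1; }
        # Drop the files just merged and queue the batch output at the end.
        shift "$mb_take"
        set -- "$@" "$mb_batch"
        mb_remaining=$((mb_remaining - mb_take))
    done
    # Only the batch outputs are left in "$@" at this point.
    ${HADD:-hadd} -f0 -O "$mb_output" "$@"
    mb_status=$?
    rm -rf "$mb_tmpdir"
    return "$mb_status"
}
```

For example, merge_in_batches 5 merged.root followed by the list of input files would call hadd once per group of five files and once more for the final merge.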

Dear @root_userj ,

Thanks for reaching out to the forum! Your use case seems quite specific. Could you provide a code reproducer and a couple of files showing the different features you mention (long strings, empty trees, etc.)?

Cheers,
Vincenzo

Hi @vpadulan,

Thank you for your response. Can I send these files privately?

Even hadd -f0 merged_file.root file1.root file2.root … seems to result in issues in some cases.

Dear @root_userj ,

Yes, sure. Feel free to share them privately in whatever way is convenient for you.

Cheers,
Vincenzo

Hi @vpadulan,

I have emailed a copy of the files.

Cheers

Hi @vpadulan,

I am just following up to see if there is a preferred method of sharing these files?

Thanks!

hadd has some known issues in the root-project/root GitHub repository, such as "[ROOT-4716] TTree merging problems when including empty trees" (Issue #14558), "Issue with `hadd` when first file has empty tree" (Issue #12510) and "`hadd` segfaults when the output file is too large" (Issue #10102).

You can also share your files in a public link wherever you like.

Hi @ferhue ,

Thanks for the info, I will look at where best to share these histograms.

Hi @ferhue ,

Some files that show the issues can be found in CERNBox.

Sorry, I don’t work at CERN so I can’t access those, but maybe vepadulano can check them. Please also post the exact command you are using with those files.

Hi @ferhue ,

If @vpadulan does not respond, could I email the files to you directly?

The command currently being used is just:

hadd -f0 -O merged.root files

Cheers

I can see your files on CERNBox. I am sure @vpadulan will access them and help you.

I tried on a Mac with this script:

File merge.sh:

set -x
hadd -f0 -O merged.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0001.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0002.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0003.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0004.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0005.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0006.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0007.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0008.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0009.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0010.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0011.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0012.3.root \
data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0013.3.root

I get this:

% sh merge.sh   
+ hadd -f0 -O merged.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0001.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0002.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0003.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0004.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0005.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0006.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0007.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0008.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0009.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0010.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0011.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0012.3.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0013.3.root
hadd Target file: merged.root
hadd compression setting for all output: 0
hadd Source file 1: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0001.3.root
hadd Source file 2: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0002.3.root
hadd Source file 3: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0003.3.root
hadd Source file 4: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0004.3.root
hadd Source file 5: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0005.3.root
hadd Source file 6: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0006.3.root
hadd Source file 7: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0007.3.root
hadd Source file 8: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0008.3.root
hadd Source file 9: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0009.3.root
hadd Source file 10: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0010.3.root
hadd Source file 11: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0011.3.root
hadd Source file 12: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0012.3.root
hadd Source file 13: data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0013.3.root
hadd Target path: merged.root:/

It seems OK.

Hi @couet ,

Thanks for looking. Yes, this seems to work OK on macOS, but inconsistencies are seen when running on lxplus, for example.

Cheers

Which ROOT version are you using on lxplus (on my Mac it is the latest master)? If you use exactly the same command I posted, on the same set of files you provided, do you also get the core dump?

Either root 6.30.02-x86_64-el9-gcc13-opt or 6.28/12

The errors seen are for example :

double free or corruption (out) or
free(): invalid size or
free(): invalid next size (fast)

Also try running with this ordering:

hadd -f0 -O mergedfile_tmp_1.root data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0008.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0004.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0002.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0011.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0010.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0006.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0013.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0012.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0009.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0007.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0005.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0003.3 data24_13p6TeV.00473317.debugrec_hlt.reproc.HIST_DEBUGSTREAMMON.g69._0001.3

I built ROOT on lxplus and ran the script I created on your files, and I also get the error. Maybe one of the issues mentioned by @ferhue explains the problem?

Hi @couet,

Thanks for having a look. To counteract any issues with empty trees, the no-compression option (-f0) and the basket-size re-optimisation (-O) were added. This works for the majority of files but not in this case, and I am not sure why.

The files are quite small (~20kB) so I would be surprised if there was an issue with file size.

The job does not fail consistently. Are there any flags that could be set to help with this?
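Since the failure is intermittent, one way to put a number on it is to repeat the same merge several times and count the non-zero exit statuses. The following is a minimal sketch under assumptions: the function name count_failures, the HADD override variable and the hadd_run_N.log file names are hypothetical, and only the -f0 and -O options come from the command above.

```sh
#!/bin/sh
# Sketch only: count_failures, HADD and the log file names are hypothetical.
# HADD defaults to "hadd".

# count_failures RUNS OUTPUT FILE...
count_failures() {
    cf_runs=$1
    cf_output=$2
    shift 2
    cf_failed=0
    cf_i=0
    while [ "$cf_i" -lt "$cf_runs" ]; do
        cf_i=$((cf_i + 1))
        # Same command every time; stdout and stderr go to a per-run log.
        ${HADD:-hadd} -f0 -O "$cf_output" "$@" > "hadd_run_$cf_i.log" 2>&1
        cf_status=$?
        if [ "$cf_status" -ne 0 ]; then
            cf_failed=$((cf_failed + 1))
            # On Linux an abort normally shows up as status 134 (128 + SIGABRT).
            echo "run $cf_i failed with exit status $cf_status (see hadd_run_$cf_i.log)"
        else
            # Keep logs only for the failing runs.
            rm -f "hadd_run_$cf_i.log"
        fi
    done
    echo "$cf_failed of $cf_runs runs failed"
}
```

Calling count_failures 20 merged.root followed by the input files, once per file ordering, would show whether a given ordering fails every time or only occasionally.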

Cheers