Hadd with Trees in Root 5.32 with a large number of files

tneddermann · March 21, 2012, 9:55pm

Hi,

I could not find a similar topic either on this board or on Savannah.

I work with
OS: Ubuntu 10.10 64bit
Root 5.32 and 5.32.01

I have more than 1000 input files, each containing a tree in a subfolder with 10k entries and only one cycle number (= 1). They are the output of Geant4 simulations on a cluster.

To merge these files I do in the Shell

so I specify all input files.

What happens: (with for example 1331 input files)
923 files are processed in one step and then the remaining 408 in a second step.
hadd automatically splits the job into these two steps by itself.

The result:
An output file with a tree with two cycle numbers is created:

cycle number 1 contains 9.23M entries (= 923 × 10,000)
so far so good
cycle number 2 contains only 4.08M entries (= 408 × 10,000),
not the full 13.31M entries.

So the second cycle does not contain the events from the first cycle.
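The entry counts above can be cross-checked with a few lines of Python. This is only an arithmetic sketch using the numbers quoted in this post; it does not read any ROOT file.

```python
# Numbers taken from the post: 1331 input files, 10k entries each,
# and hadd processing 923 files in its first step.
ENTRIES_PER_FILE = 10_000
TOTAL_FILES = 1331
FIRST_STEP_FILES = 923

second_step_files = TOTAL_FILES - FIRST_STEP_FILES

cycle1_entries = FIRST_STEP_FILES * ENTRIES_PER_FILE    # what cycle 1 holds
cycle2_entries = second_step_files * ENTRIES_PER_FILE   # what cycle 2 holds
expected_entries = TOTAL_FILES * ENTRIES_PER_FILE       # what a complete merge should hold

print("files in second step:", second_step_files)
print("cycle 1 entries:", cycle1_entries)
print("cycle 2 entries:", cycle2_entries)
print("expected in a full merge:", expected_entries)
```

The two cycles together add up to the expected total, which is consistent with each cycle holding only the files from its own step.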

With the ‘-n XY’ option this gets even worse: each cycle then contains only its own part of the data.

The only way I see to handle this at the moment is to keep the number of input files below the threshold at which hadd starts splitting automatically, and to do the recursive ‘merging’ by hand.

In 5.30.06 merging that many files in one go is not possible at all, so it has to be done recursively by hand (or by a script).
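A minimal sketch of such a script, in Python, could look like the following. It is not the command used in this thread: the batch size of 500, the `partial_L…` intermediate file names, and the helper names `plan_batches` and `merge_recursively` are all assumptions made for illustration. The only hadd features relied on are the usual `hadd target source1 source2 ...` form and the `-f` option to overwrite an existing target.

```python
import subprocess


def plan_batches(files, batch_size, prefix="partial"):
    """Split `files` into consecutive batches and name one output per batch."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    return [
        (f"{prefix}_{i // batch_size}.root", files[i:i + batch_size])
        for i in range(0, len(files), batch_size)
    ]


def merge_recursively(files, target, batch_size=500, run=subprocess.check_call):
    """Merge `files` into `target`, never passing more than `batch_size`
    inputs to a single hadd call. `run` executes one command (a list of
    strings); it can be replaced by a stub for a dry run."""
    current = list(files)
    level = 0
    # Keep merging into intermediate files until one hadd call is enough.
    while len(current) > batch_size:
        batches = plan_batches(current, batch_size, prefix=f"partial_L{level}")
        for output, inputs in batches:
            run(["hadd", "-f", output, *inputs])
        current = [output for output, _ in batches]
        level += 1
    run(["hadd", "-f", target, *current])


if __name__ == "__main__":
    # Dry run with hypothetical file names: print the commands instead of running them.
    inputs = [f"in_{i}.root" for i in range(1331)]
    merge_recursively(inputs, "merged.root",
                      run=lambda cmd: print(cmd[:3], "...", len(cmd) - 3, "inputs"))
```

The batch size should be chosen below the point where hadd starts splitting on its own (923 files in the case described above). The sketch leaves the intermediate files on disk, so they have to be removed afterwards.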

This is quite inconvenient if one wants to process the file automatically afterwards without having to take care of cycle numbers.

Can anyone comment on that? Does it also occur on other systems?

Thanks.

Till

pcanal · March 23, 2012, 12:15pm

Hi Till,

Can you provide a couple of the input files so we can try to reproduce the problem?

Thanks,
Philippe.

tneddermann · March 23, 2012, 4:22pm

Hi,

Please find the files at http://www.e4.physik.tu-dortmund.de/~tnedder/

hadd-input_400files.tar.gz - input files (subset)
hadd-input_1331files.tar.gz - input files (the 1331 mentioned above)
hadd-output_autosplit-bash.root - the output file produced by the command given above
hadd-output_n50-option.root - the output file produced with the “-n 50” option for hadd

So what I exactly did was

or

in a Gnome Terminal (Bash)

In addition I quickly did a check by generating 1201 files each with a tree with random data. Merging these files with

gives a file with a tree with only one cycle number and of course containing all entries.
The files for this test are
hadd-input_nosubdir_1201files.tar.gz - input files with a tree that is not in a subdirectory inside the file
hadd-output_nosubdir.root - the resulting output file

If the problem occurs on other systems, this is a bug, from my point of view.

Thanks for testing,
Till

pcanal · March 23, 2012, 5:14pm

Hi Till,

I can indeed reproduce the problem with the subdirectories. I am working on a fix.

Cheers,
Philippe.

pcanal · March 23, 2012, 11:26pm

Hi Till,

Thanks for reporting this issue. It is fixed in revision 43473 of the trunk and in the v5.32 patch branch.

Cheers,
Philippe.

tneddermann · March 26, 2012, 8:33am

Hi Philippe,

Thanks for the help and for fixing this issue so quickly.

Cheers,
Till