Hadd performance question


I am trying to merge 500 files, each ~3 MB in size and containing a few thousand histograms and one TTree. I have noticed the following:

– hadd completes the file directory scan within ~15 minutes;

– hadd writes a target file of ~150 MB;

– then nothing visible happens for more than half an hour: the output file is not modified and there are no messages. The target file appears to be complete when I browse its contents;

– for the past 30 minutes, hadd's memory footprint has stayed constant at 3.7 GB while it consumes 100 % CPU, with no messages and no updates to the target file.

I am wondering whether this last step of hadd is necessary, since it takes so much time. Is hadd just trying to clean up objects in memory?
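For reference, the merge described above corresponds to an invocation like the following (the file names are hypothetical; `-f` just overwrites an existing target):

```shell
# Merge ~500 small input files into one target file.
# -f overwrites target.root if it already exists.
hadd -f target.root input_*.root
```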

I am running ROOT 6.14/06 compiled with gcc 7.3.0 on Ubuntu 18.04. I can post the input files for testing if that helps (but this seems to be a general hadd question).

Thank you,

ROOT Version: 6.14/06
Platform: Ubuntu 18.04
Compiler: gcc 7.3.0

As it looks like an I/O question, I guess @pcanal can help you.

Yes, it is cleaning up memory (and that clean-up does not scale well with the number of histograms). We are planning to add an option to hadd to skip that work.


Hi Philippe,

Thank you for your reply.

I would vote in favour of adding the option to skip this step! In a typical use case this step is not needed, since the OS reclaims the process's memory anyway when hadd exits.


IIRC we were discussing an option to *not* skip this step - but to skip it by default, because it's such a common case :slight_smile:

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.


The change I finally managed to get through for hadd is to consistently delete the objects and directories ‘right’ after use (rather than accumulating them in memory) and to reduce the amount of cleanup needed. This should resolve all (known) performance issues with hadd. The fix is (will be) available in v6.20/00 and v6.18/06.