Hadd performance question

Hello,

I am trying to merge 500 files of ~3 MB each, every one containing a few thousand histograms and one TTree. I have noticed the following:

– hadd completes the file directory scan within ~15 minutes;

– hadd writes a target file of ~150 MB;

– Then nothing visible happens for more than half an hour: the output file is not modified and there are no messages. The target file seems to be complete when I try to browse its content.

– For the past 30 minutes, the hadd memory footprint has stayed constant at 3.7 GB while it consumes 100% CPU, without any messages or updates to the target file.

Is this last step of hadd necessary, given how much time it takes? Is hadd just trying to clean up objects in memory?

I am running ROOT version 6.14/06 compiled with gcc 7.3.0 on Ubuntu 18.04. I can post the input files for tests if this helps (but this seems to be a general hadd question).

Thank you,
Rustem




As it looks like an I/O question, I guess @pcanal can help you.

Yes, it is cleaning up memory (and that cleanup does not scale well with the number of histograms). We are planning to add an option to hadd to skip that work.

Cheers,
Philippe.
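
To illustrate how a teardown can scale poorly with the number of objects, here is a toy model in Python. It is not ROOT's actual code: the registry list, the deletion order and the name `cleanup_with_list` are assumptions made purely for illustration. The sketch assumes that every deleted object must first be found in one long list of live objects, so the total work grows roughly with the square of the object count.

```python
def cleanup_with_list(n):
    """Tear down n objects tracked in a plain list; return comparisons made.

    Toy model only: each deletion does a linear search of the registry.
    """
    registry = list(range(n))
    comparisons = 0
    for obj in reversed(range(n)):            # delete the newest object first
        for i, candidate in enumerate(registry):
            comparisons += 1
            if candidate == obj:
                del registry[i]               # found it, drop it from the registry
                break
    return comparisons


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, cleanup_with_list(n))
```

In this model, doubling the number of objects roughly quadruples the number of comparisons, which is the kind of behaviour that turns a cleanup pass into the dominant cost once there are very many histograms.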

Hi Philippe,

Thank you for your reply.

I would vote in favour of adding the option to skip this step! I think that in a typical use case this step is not needed, as the OS should reclaim the memory anyway when hadd exits.

Cheers,
Rustem
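
The point that the OS reclaims memory when a process exits can be sketched in Python. This is not how hadd works internally; the child script, the `run_child` helper and the stand-in "cleanup" handler are all hypothetical. The child process finishes its output and then leaves via `os._exit`, which skips registered exit handlers, so the expensive cleanup never runs and the memory is simply returned to the OS.

```python
import subprocess
import sys

# Hypothetical child program: does its work, then exits without cleanup.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("slow cleanup ran"))   # stand-in for the teardown
data = [bytearray(1024) for _ in range(10000)]        # stand-in for in-memory objects
print("output written")
sys.stdout.flush()    # os._exit does not flush stdio buffers
os._exit(0)           # exit immediately; exit handlers are skipped
"""


def run_child():
    """Run the child script in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_child())
```

The trade-off is that anything relying on orderly shutdown (flushing buffers, closing files) must be done explicitly before exiting this way.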

IIRC, we were discussing an option to not skip this step, but to skip it by default because it’s such a common case :)


Hi,

I finally managed to go through hadd. The fix is to consistently delete the objects and directories ‘right’ after use (rather than accumulating them in memory) and to reduce the amount of cleanup needed. This should resolve all (known) performance issues with hadd. The fix is (will be) available in v6.20/00 and v6.18/06.

Cheers,
Philippe.
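
The difference between accumulating objects and deleting them right after use can be sketched with a small Python model. It is only an illustration, not hadd's implementation: the `Hist` class, its explicit `release` method and the two merge functions are hypothetical stand-ins. Both strategies create the same number of objects, but the peak number alive at once differs greatly.

```python
class Hist:
    """Stand-in for a histogram; tracks how many instances are alive."""
    live = 0
    peak = 0

    def __init__(self):
        Hist.live += 1
        Hist.peak = max(Hist.peak, Hist.live)

    def release(self):
        """Explicitly mark this object as freed."""
        Hist.live -= 1


def merge_accumulating(n_files, n_hists):
    """Keep every object until the end, then clean up in one big pass."""
    Hist.live = Hist.peak = 0
    kept = []
    for _ in range(n_files):
        kept.extend(Hist() for _ in range(n_hists))
    for h in kept:
        h.release()
    return Hist.peak


def merge_streaming(n_files, n_hists):
    """Free each file's objects right after they have been used."""
    Hist.live = Hist.peak = 0
    for _ in range(n_files):
        batch = [Hist() for _ in range(n_hists)]
        for h in batch:
            h.release()
    return Hist.peak


if __name__ == "__main__":
    print(merge_accumulating(500, 10), merge_streaming(500, 10))
```

In this model the accumulating version peaks at every object from every file, while the streaming version never holds more than one file's worth, so there is far less left to clean up at the end.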