Hadd parallelization does not respect compression settings

When using the parallelization option (-j) in hadd, it appears that the subprocesses do not respect the compression flags.


I pass -ff so that the target file uses the same compression setting as the first source file. Without parallelization, this appears to work just fine:

$ hadd -ff test.root ...
hadd Target file: test.root
hadd compression setting for all output: 509
hadd Source file 1: ...
hadd Source file 2: ...
hadd Source file 3: ...
hadd Source file 4: ...
hadd Source file 5: ...
hadd Source file 6: ...
hadd Target path: test.root:/
...

With parallelization, I get warnings that the target and sources have different compression levels:

$ hadd -ff -j 2 test.root ...
Parallelizing  with 2 processes.
hadd Target file: test.root
hadd compression setting for all output: 509
hadd Source file 1: ...
hadd Source file 1: ...
hadd Source file 2: ...
hadd Source file 2: ...
hadd Source file 3: ...
hadd Source file 3: ...
hadd Sources and Target have different compression levels
hadd merging will be slower
hadd Target path: /tmp/partial1_f870e6d4-5b07-11ef-b8af-4f0b2c0abeef.root:/
hadd Sources and Target have different compression levels
hadd merging will be slower
hadd Target path: /tmp/partial0_f870e6d4-5b07-11ef-b8af-4f0b2c0abeef.root:/
...

From this, I infer that the compression settings are not being forwarded to the subprocesses used by hadd.
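
To make the comparison between the two runs less manual, here is a minimal sketch that scans a saved hadd log and lists the targets whose merge step printed the warning. The function name is my own, and the pairing assumes that each warning is printed before the "Target path" line of the merge it belongs to, as in the output above; interleaved output from parallel processes could break that assumption.

```python
WARNING = "hadd Sources and Target have different compression levels"
TARGET_PREFIX = "hadd Target path:"


def targets_with_compression_warning(log_text):
    """Return the target paths whose merge step printed the mismatch warning."""
    flagged = []
    warned = False  # True once the warning was seen since the last target line
    for line in log_text.splitlines():
        line = line.strip()
        if line == WARNING:
            warned = True
        elif line.startswith(TARGET_PREFIX):
            if warned:
                flagged.append(line[len(TARGET_PREFIX):].strip())
            warned = False
    return flagged
```

Calling it on the text of the non-parallel run should return an empty list, while the -j 2 run should list the partial files under /tmp.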

Hi,

Thanks for the interesting post.

What ROOT version are you using? Is it possible for you to share (a few of) your input files as well as the exact commands used for the merging?

Cheers,
Danilo

Sorry, here is a full reproducer:

demo.py:

import os

import ROOT as r

print("ROOT version:", r.gROOT.GetVersion())

print(
    "default compression level: ", r.RCompressionSetting.EDefaults.kUseCompiledDefault
)
# 509 = algorithm 5 (ZSTD) * 100 + level 9; chosen to differ from the default.
explicit_compression_level = 509
assert (
    r.RCompressionSetting.EDefaults.kUseCompiledDefault != explicit_compression_level
), (r.RCompressionSetting.EDefaults.kUseCompiledDefault, explicit_compression_level)

# Write six small histogram files, each with the explicit compression setting.
flist = []
for i in range(6):
    h = r.TH1F("h", "h", 100, -5, 5)
    h.FillRandom("gaus", 1000)
    fname = f"test{i}.root"
    with r.TFile.Open(fname, "recreate", f"file {i}", explicit_compression_level) as f:
        f.WriteObject(h, "h")
    flist.append(fname)
    h.Delete()

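# -ff: the target takes its compression setting from the first input file.
print("no parallelization:")
os.system("hadd -ff test.root " + " ".join(flist))

# -j 2: the same merge, split across two processes.
print("with parallelization:")
os.system("hadd -ff -j 2 test.root " + " ".join(flist))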

command:

python -u demo.py &> demo.log

demo.log:

ROOT version: 6.32.02
default compression level:  101
no parallelization:
hadd Target file: test.root
hadd compression setting for all output: 509
hadd Source file 1: test0.root
hadd Source file 2: test1.root
hadd Source file 3: test2.root
hadd Source file 4: test3.root
hadd Source file 5: test4.root
hadd Source file 6: test5.root
hadd Target path: test.root:/
with parallelization:
Parallelizing  with 2 processes.
hadd Target file: test.root
hadd compression setting for all output: 509
hadd Source file 1: test3.root
hadd Source file 2: test4.root
hadd Source file 3: test5.root
hadd Sources and Target have different compression levels
hadd merging will be slower
hadd Target path: /tmp/partial1_7bb96f02-5b35-11ef-b950-4f0b2c0abeef.root:/
hadd compression setting for all output: 509
hadd Source file 1: test0.root
hadd Source file 2: test1.root
hadd Source file 3: test2.root
hadd Sources and Target have different compression levels
hadd merging will be slower
hadd Target path: /tmp/partial0_7bb96f02-5b35-11ef-b950-4f0b2c0abeef.root:/
hadd compression setting for all output: 509
hadd Source file 1: /tmp/partial0_7bb96f02-5b35-11ef-b950-4f0b2c0abeef.root
hadd Source file 2: /tmp/partial1_7bb96f02-5b35-11ef-b950-4f0b2c0abeef.root
hadd Sources and Target have different compression levels
hadd merging will be slower
hadd Target path: test.root:/
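
For reading the numbers in the log: as far as I understand, a ROOT compression setting encodes algorithm * 100 + level, so 101 and 509 differ in both algorithm and level. The small helper below is only a sketch for decoding them; its name is my own and it is not part of ROOT, and the algorithm table is copied by hand from RCompressionSetting::EAlgorithm.

```python
# Algorithm numbers as listed in ROOT's RCompressionSetting::EAlgorithm.
ALGORITHMS = {
    0: "global default",
    1: "ZLIB",
    2: "LZMA",
    3: "old (legacy)",
    4: "LZ4",
    5: "ZSTD",
}


def decode_compression_setting(setting):
    """Split a ROOT compression setting into (algorithm name, level)."""
    algorithm, level = divmod(setting, 100)
    return ALGORITHMS.get(algorithm, f"unknown ({algorithm})"), level


for value in (101, 509):
    name, level = decode_compression_setting(value)
    print(f"{value}: {name}, level {level}")
```

This prints "101: ZLIB, level 1" and "509: ZSTD, level 9", which is why the inputs written by demo.py cannot be confused with files written using the compiled default.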
