I am writing some code, and I see that it takes more time to execute if I give it a bunch of .root files as input and TChain them. On the other hand, if I add the trees together and make a single input file, it runs faster.
Does this observation sound right? If so, could anyone explain why this is happening?
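Here is roughly what I'm doing in the two cases (a minimal sketch; the tree name `events` and the file names are placeholders for my actual data):

```cpp
#include "TChain.h"
#include "TFile.h"
#include "TTree.h"

void compareInputs() {
   // Slower case: chain the individual files
   TChain chain("events");
   chain.Add("data_1.root");
   chain.Add("data_2.root");
   const Long64_t nChain = chain.GetEntries();
   for (Long64_t i = 0; i < nChain; ++i)
      chain.GetEntry(i); // the chain may switch files under the hood here

   // Faster case: one merged file with a single big tree
   TFile f("merged.root");
   TTree *tree = nullptr;
   f.GetObject("events", tree);
   if (!tree) return;
   const Long64_t nTree = tree->GetEntries();
   for (Long64_t i = 0; i < nTree; ++i)
      tree->GetEntry(i);
}
```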
Does the file with the single tree use the same compression algorithm as the original files? (You can check with `file yourfile.root`; it should say something like “compression: 101”. If the values differ between the original dataset and the new dataset, that's bound to change the runtimes.)
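If your system's `file` command doesn't report it, you can also read the setting from within ROOT (a quick sketch; the file name is a placeholder):

```cpp
#include <iostream>
#include "TFile.h"

void checkCompression() {
   TFile f("yourfile.root"); // placeholder name
   // The setting is 100*algorithm + level, e.g. 101 = algorithm 1 (ZLIB), level 1
   std::cout << f.GetCompressionSettings() << std::endl;
}
```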
Thank you for the reply. I didn't add the whole 36 GB, but about half of it, and that output file has a compression of 101. The individual files, however, have a compression of 1. So that's different, I suppose.
Uhm, I'm not sure what 1 stands for, but this might be it. @pcanal, what does a compression setting of 1 correspond to?
You can also merge the files by passing the -ff option to hadd, which tells it to use the same compression settings as the first of the input files – that should guarantee that the aggregated TTree is compressed with the same settings as the individual trees.
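For example, something like `hadd -ff merged.root file1.root file2.root` (file names here are just placeholders) would produce a merged file that takes its compression settings from `file1.root`.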
Now, a little runtime difference is to be expected. The TChain has to do more work (at every entry it has to check whether it's time to switch to a new file, open the new file and close the old one, etc.) – but as Ivan mentioned, that difference should be small compared to the time spent in actual I/O and data processing. What runtimes are we talking about (after using hadd -ff)?
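If you want to put a number on it, you can wrap the event loop in a TStopwatch (a minimal sketch; tree name and file pattern are placeholders):

```cpp
#include <iostream>
#include "TChain.h"
#include "TStopwatch.h"

void timeLoop() {
   TChain chain("events");   // placeholder tree name
   chain.Add("data_*.root"); // placeholder file pattern

   TStopwatch sw;
   sw.Start();
   const Long64_t n = chain.GetEntries();
   for (Long64_t i = 0; i < n; ++i)
      chain.GetEntry(i);
   sw.Stop();

   std::cout << "Real time: " << sw.RealTime() << " s, "
             << "CPU time: " << sw.CpuTime() << " s" << std::endl;
}
```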
If you are using hadd and want to make sure there is no unexpected decompression/recompression, use the -fk option:
```
-fk  Sets the target file to contain the baskets with the same compression
     as the input files (unless -O is specified). Compresses the meta data
     using the compression level specified in the first input or the
     compression setting after fk (for example 206 when using -fk206)
```
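So for a pure fast merge with no recompression at all, something like `hadd -fk merged.root file1.root file2.root` (file names are placeholders) should keep the baskets exactly as they are in the inputs.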