Large changes in branch sizes in later builds of ROOT

Dear ROOT experts,

I’ve been doing some size comparisons between two sets of ROOT NTuples that are identical in content and number of events, one made with ROOT v6.36.02 and the other with v6.39.01 (compiled from master on 20 November 2025). Both are created from another set of NTuples, originally made with 6.36.02, using the RDataFrame::Snapshot function.

I see quite large changes in the sizes of most branches, in particular for empty branches (of type Array<float/int>). In v6.39.01 the empty vectors take up 2.9596 bytes/ev, while in v6.36 they take only 1.4461 bytes/ev. Moreover, for the other branches, which are not empty, there seems to be a size increase of between 15% and 50% in the NTuples made with v6.39.01. Overall, including all 129 branches of my NTuple, the v6.39 one is 23% larger than the one made with 6.36.

Are these changes expected in later ROOT versions, or is it because I’m mixing different ROOT versions in the NTuple workflow?

Each set of NTuples has a total of 1 143 602 162 events, and the total size is 336 GB (6.39) versus 273 GB (6.36).
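
For reference, the ratios quoted above can be recomputed with a few lines of Python. This is only a sketch using the numbers from this post; the helper name relative_increase is made up for illustration.

```python
def relative_increase(new, old):
    """Return the fractional increase of `new` over `old`."""
    return (new - old) / old

# Numbers quoted in this post.
total_639_gb, total_636_gb = 336.0, 273.0   # total size of each NTuple set
empty_639, empty_636 = 2.9596, 1.4461       # bytes/event for the empty branches

total_increase = relative_increase(total_639_gb, total_636_gb)
empty_ratio = empty_639 / empty_636

print(f"total size increase: {total_increase:.1%}")
print(f"empty-branch size ratio: {empty_ratio:.2f}x")
```

This gives an overall increase of about 23% and shows that the empty branches roughly double in size.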

best,

Eirik

Dear @Eirik_Gramstad ,

Thank you for reaching out to the forum!

No, such a large increase is not expected. Could you help me reproduce this behaviour, ideally with a script that creates only one file, so that I can compare the output across different versions of ROOT?

Cheers,
Vincenzo

Hi Vincenzo

Attached (or here) is a simplified version of the full script which reproduces the problem. You can test it on this file on CERNBox (user/e/egramsta/forROOT/user.egramsta.48785775._000523.output_ntup.root).

When I run this script in ROOT 6.36.04 the size/ev is 261.37 bytes (total size 10 MB) while with the ROOT built from master on Nov. 20th 2025 (6.39.01) it is 331.96 bytes/ev (13 MB).

LargeSizeDemoScript.py (1.5 KB)

best,

Eirik

Dear @Eirik_Gramstad ,

Thanks for the reproducer. As a first test, I compiled different versions of ROOT on my machine and ran your script:

ROOT master

Total                                                                332.47
Stored 125 branches in data15_63901.root with 38247 events
File Size : 12.72 MB

ROOT 6.36.10

Total                                                                342.41
Stored 125 branches in data15_63610.root with 38247 events
File Size : 13.10 MB

ROOT 6.36.04

Total                                                                342.41
Stored 125 branches in data15_63604.root with 38247 events
File Size : 13.10 MB

ROOT 6.36.02

Total                                                                342.41
Stored 125 branches in data15_63602.root with 38247 events
File Size : 13.10 MB

So on my machine, all 6.36.* versions have a slightly larger bytes/event metric than current master.

On the other hand, I ran your reproducer also on lxplus and I see very different results:

LCG 109 - ROOT 6.38.00

Total                                                                331.98
Stored 125 branches in data15_63800.root with 38247 events
File Size : 12.70 MB

LCG108a - ROOT 6.36.04

Total                                                                261.37
Stored 125 branches in data15_63604.root with 38247 events
File Size : 10.00 MB

LCG108 - ROOT 6.36.02

Total                                                                261.37
Stored 125 branches in data15_63602.root with 38247 events
File Size : 10.00 MB

So here I see the behaviour you describe in your earlier comment (Large changes in branch sizes in later builds of ROOT - #3 by Eirik_Gramstad).

I don’t know yet why ROOT 6.36 on my machine is behaving so differently from ROOT 6.36 on lxplus. This will need investigation.
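
To make the comparison between the two machines easier to read, here is a small sketch that cross-checks the numbers printed above. It assumes the "Total" column is bytes/event and that the reported file size uses 1 MB = 10**6 bytes; both are assumptions on my side, and the helper name expected_size_mb is made up for illustration.

```python
N_EVENTS = 38247

# (label, bytes/event, reported file size in MB), copied from the output above.
runs = [
    ("workstation master", 332.47, 12.72),
    ("workstation 6.36.x", 342.41, 13.10),
    ("lxplus 6.38.00", 331.98, 12.70),
    ("lxplus 6.36.x", 261.37, 10.00),
]

def expected_size_mb(bytes_per_event, n_events=N_EVENTS):
    """File size implied by bytes/event, assuming 1 MB = 10**6 bytes."""
    return bytes_per_event * n_events / 1e6

for label, bpe, reported_mb in runs:
    implied = expected_size_mb(bpe)
    print(f"{label:20s} implied {implied:6.2f} MB, reported {reported_mb:6.2f} MB")

# Relative change when moving from 6.36 to the newer build on each machine.
lxplus_change = (331.98 - 261.37) / 261.37
workstation_change = (332.47 - 342.41) / 342.41
print(f"lxplus: {lxplus_change:+.1%}, workstation: {workstation_change:+.1%}")
```

Under these assumptions the implied and reported sizes agree to within 0.01 MB, and the change from 6.36 to the newer build is about +27% on lxplus but about -3% on my workstation.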


Dear @Eirik_Gramstad ,

This forum post triggered an investigation into the effects of compression on TTree datasets like yours. One of the key characteristics of the input file you shared is that it has many branches of type RVec in which many, and often all, of the vectors are actually empty. It turns out that in this particular scenario the TTree dataset is compressed better by the ZLIB algorithm than by the ZSTD algorithm. This holds in particular for the vanilla ZLIB implementation, which is what is available on the lxplus node, as opposed to the other popular implementation, zlib-ng, which is available on many Linux systems such as my workstation. This went completely against our prior knowledge and understanding.

I’m mentioning this because the issue you see is due to the change in the compression algorithm used by Snapshot. In fact, it can be reproduced without changing ROOT version at all, by staying on 6.36 and only changing the compression settings. The default was changed in 6.38 after internal discussion, based on the knowledge available at the time. This was indicated in the ROOT Version 6.38 Release Notes, and it is also visible in your own script the first time you execute it with a ROOT version greater than 6.36, through the following message:

In ROOT 6.38, the default compression settings of Snapshot have been changed from 101 (ZLIB with compression level 1, the TTree default) to 505 (ZSTD with compression level 5). ...

So in practice, what you are seeing is the effect of the RDataFrame Snapshot compressing your data with ZSTD level 5 (ROOT compression setting 505, the default in 6.38) versus ZLIB level 1 (ROOT compression setting 101, the default before).
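
As a side note on how to read these numbers: ROOT packs the algorithm and the level into a single integer, setting = 100 * algorithm + level. A minimal sketch of the decoding, in plain Python without ROOT, follows. The function name decode_compression_setting is made up for illustration, and the algorithm codes are the ones I know from ROOT's RCompressionSetting::EAlgorithm enum.

```python
# Algorithm codes from ROOT's RCompressionSetting::EAlgorithm enum.
ALGORITHMS = {1: "ZLIB", 2: "LZMA", 4: "LZ4", 5: "ZSTD"}

def decode_compression_setting(setting):
    """Split a ROOT compression setting into (algorithm name, level)."""
    algorithm, level = divmod(setting, 100)
    return ALGORITHMS.get(algorithm, f"unknown({algorithm})"), level

for setting in (101, 505):
    name, level = decode_compression_setting(setting)
    print(f"{setting}: {name} level {level}")
```

So 101 decodes to ZLIB level 1 and 505 to ZSTD level 5, matching the message printed by Snapshot.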

The details of the full investigation are available in the GitHub repository vepadulano/ttree-lossless-compression-studies, a collection of programs to study the behaviour of different compression algorithms used by ROOT to compress datasets in the TTree format.

Following this, I have opened a PR to revert the Snapshot behaviour in light of the new knowledge: "Revert choice to change default Snapshot TTree compression settings" (Pull Request #21753 in root-project/root).

In the meantime, a quick workaround is to set the compression settings explicitly in your Snapshot calls, e.g.:

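import ROOT

# Request the pre-6.38 Snapshot default explicitly: ZLIB, compression level 1.
opts = ROOT.RDF.RSnapshotOptions()
opts.fCompressionAlgorithm = ROOT.RCompressionSetting.EAlgorithm.EValues.kZLIB
opts.fCompressionLevel = 1
# df, output_treename, output_filename and columns_list come from your script.
df.Snapshot(output_treename, output_filename, columns_list, opts)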

Cheers,
Vincenzo