I’ve been doing some size comparisons between two sets of ROOT NTuples that are identical in content and number of events, one made with ROOT v6.36.02 and the other with v6.39.01 (compiled from master on 20/11-2025). Both are created from another set of NTuples, originally made with 6.36.02, using the RDataFrame::Snapshot function.
I see some quite large changes in the sizes of most branches, in particular for empty branches (of type Array<float/int>). In v6.39.01 the empty vectors take up 2.9596 bytes/ev, while in v6.36 they take only 1.4461 bytes/ev. Moreover, the other branches, which are not empty, seem to be between 15% and 50% larger in the NTuples made with v6.39.01. Overall, including all 129 branches of my NTuple, the v6.39 one is 23% larger than the one made with 6.36.
Are these changes expected in later ROOT versions, or is it because I’m mixing different ROOT versions in the NTuple workflow?
Each set of NTuples has a total of 1 143 602 162 events, and the total size is 336 GB (6.39) and 273 GB (6.36).
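As a quick cross-check of the figures quoted above, here is a minimal Python sketch of the arithmetic. It assumes that "GB" means 10^9 bytes, which the post does not specify, and all variable names are only illustrative.

```python
# Cross-check of the sizes quoted in the post.
# Assumption: 1 GB = 1e9 bytes (the post does not say whether GB or GiB is meant).
N_EVENTS = 1_143_602_162
GB = 1e9


def bytes_per_event(total_bytes, n_events):
    """Average on-disk size per event."""
    return total_bytes / n_events


size_639 = 336 * GB  # NTuples written with ROOT 6.39.01
size_636 = 273 * GB  # NTuples written with ROOT 6.36.02

bpe_639 = bytes_per_event(size_639, N_EVENTS)
bpe_636 = bytes_per_event(size_636, N_EVENTS)

# Relative growth of the whole dataset, in percent.
increase_pct = (size_639 / size_636 - 1) * 100

# Growth factor of the empty branches, from the per-event numbers in the post.
empty_branch_ratio = 2.9596 / 1.4461

print(f"6.39: {bpe_639:.1f} bytes/ev, 6.36: {bpe_636:.1f} bytes/ev")
print(f"total increase: {increase_pct:.1f}%")
print(f"empty branches: x{empty_branch_ratio:.2f}")
```

The total increase comes out consistent with the 23% quoted, and the empty branches roughly double in size.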
No, such a large increase is not expected. Could you guide me towards reproducing this behaviour, maybe with a script that creates only one file, so that I can compare the output of the different versions of ROOT?
Attached (or here) is a simplified version of the full script which reproduces the problem. You can test it on this file on CERNBox (user/e/egramsta/forROOT/user.egramsta.48785775._000523.output_ntup.root).
When I run this script in ROOT 6.36.04 the size/ev is 261.37 bytes (total size 10 MB) while with the ROOT built from master on Nov. 20th 2025 (6.39.01) it is 331.96 bytes/ev (13 MB).
This forum post triggered an investigation into how compression affects TTree datasets like yours. A key characteristic of the input file you shared is that it has many branches of type RVec in which many, and often all, of the vectors are empty. It turns out that in this particular scenario the TTree dataset is compressed better by the ZLIB algorithm than by the ZSTD algorithm. This holds in particular for the vanilla ZLIB implementation, which is what is available on the lxplus node, as opposed to the other popular implementation, zlib-ng, which is available on many Linux systems such as my workstation. This went completely against our prior knowledge and understanding.
I’m mentioning this because the issue you see is due to the change in the compression algorithm used by Snapshot (and in fact can be seen even without changing ROOT version and using 6.36 but just changing the compression settings). The default was changed in 6.38 after internal discussion following the available knowledge. This was indicated in the release notes at ROOT Version 6.38 Release Notes and it is also visible in your own script the first time you execute it with a ROOT version greater than 6.36 with the following message:
In ROOT 6.38, the default compression settings of Snapshot have been changed from 101 (ZLIB with compression level 1, the TTree default) to 505 (ZSTD with compression level 5). ...
So in practice what you are seeing is the effect of RDataFrame Snapshot compressing your data with ZSTD level 5 (ROOT compression setting 505, the default since 6.38) versus ZLIB level 1 (ROOT compression setting 101, the default before).
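To make the numeric settings above easier to read, here is a small Python sketch that splits a ROOT compression setting into algorithm and level. It relies on the convention that the setting is 100 * algorithm + level, which matches the two values quoted in the warning message (101 and 505). The algorithm table is an assumption based on ROOT's RCompressionSetting enumeration, and the function name is only illustrative.

```python
# Assumed algorithm numbering, following ROOT's RCompressionSetting enumeration.
# Only ZLIB (1) and ZSTD (5) are confirmed by the settings quoted in this thread.
ALGORITHMS = {1: "ZLIB", 2: "LZMA", 4: "LZ4", 5: "ZSTD"}


def decode_compression_setting(setting):
    """Split a setting such as 505 into (algorithm name, compression level)."""
    algorithm, level = divmod(setting, 100)
    return ALGORITHMS.get(algorithm, f"unknown({algorithm})"), level


for setting in (101, 505):
    name, level = decode_compression_setting(setting)
    print(f"{setting}: {name} level {level}")
```

Snapshot takes its compression algorithm and level from the RSnapshotOptions object passed to it, so the pre-6.38 behaviour should be recoverable by requesting ZLIB level 1 there. It is worth checking the exact field names against the documentation of the ROOT version in use.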