Use TTree Fill to copy some events but the new tree uses more memory

Hello,

I want to select some events from the “Events” tree of a NanoAOD file and copy them to a “SelectedEvents” tree in a new ROOT file. I expect the “SelectedEvents” tree to have exactly the same structure and branches as the original. As a quick test, I looped over all the events without any selection and called newTree->Fill() for each one, and the new ROOT file uses 50% more memory than the original ROOT file. I think the problem is related to the new tree’s compression level and baskets, but I have no idea how to change it.

As an example, a branch in the old tree has:
Entries : 21000
Total Size = 21888 bytes
File Size = 740
Baskets : 4
Basket Size = 9728 bytes
Compression = 28.84

While the same branch in the new tree has:
Entries : 21000
Total Size = 21802 bytes
File Size = 444
Baskets : 4
Basket Size = 9728 bytes
Compression = 47.88

If I use TTree::CloneTree with the "fast" option, i.e. CloneTree(-1, "fast"), I can reproduce the “SelectedEvents” tree with exactly the same size as the original tree. However, that gives me no way to copy only the events that I want to select.

Please let me know if there is a way to select events and at the same time keep the file size small.

Thank you,
Yao



ROOT Version: 6.22/09
Platform: bash linux
Compiler: g++


I think @pcanal can help you with this question.

Hi @Yao_Yao ,

I assume you mean that reading the new tree after it was produced requires more RAM than reading the old tree. That’s probably because of the better compression ratio: by default, ROOT I/O reads roughly 30 MB of compressed data at a time, then has to decompress it and put the result in a staging area. With a larger compression factor, the decompressed data will occupy more memory.

You can fix this by tweaking the compression algorithm and settings for the file in which the TTree is stored. It’s the fourth argument of TFile::Open (see the ROOT TFile class reference).

Another thing you can do is change the TTree AutoFlush setting so that it writes smaller clusters (a “cluster” of events is a group of events that are compressed together), which will result in smaller decompressed size when reading.

I hope this helps!
Philippe might have further suggestions.

Cheers,
Enrico

Hi Enrico,

I mean the size of the file goes from 22 MB (old file) to 31 MB (new file) after I fill all the entries into a new tree in a new file. My understanding is that I am probably not compressing it as well as the old ROOT file did.

I tried what you suggested and changed the ROOT::RCompressionSetting::EDefaults::EValues setting to 4 other options, but they did not help.

I also tried AutoFlush by calling newEventTree->SetAutoFlush() with 10 (which leads to a 400 MB file), with -200000 (does not change the size of the new ROOT file), and with 0 (also does not change much).

Please let me know if you have more ideas. Also, I am not familiar with how ROOT manages the buffer of each branch, so I will try to understand more of that.

Thank you,
Yao

Try:
root -l -q -b some_file.root -e 'std::cout << gFile->GetCompressionSettings() << "\n";'

The compression settings come out as 209 for the input file (the NanoAOD file) and 101 for the output file.

Using the same compression settings for input and output should produce files with similar/identical size.
@pcanal can correct me if I’m wrong, but it should be enough to pass the desired compression settings when constructing the output file. It’s the fourth argument of TFile::Open, see the docs.

Cheers,
Enrico

It works. I now read the compression number from the input ROOT file and pass it as the fourth argument when opening the new ROOT file. Thank you for your help!

By the way, a quick question: what does the compression number represent? Can it be set to any number I want? I assume it affects the writing speed and the reading speed in the next step, but do you have a rough guess (or a chart) of how strongly they correlate with the compression number?

The hundreds digit is a code for the algorithm used, and the units digit is the compression level for that algorithm (the higher the level, the harder you are asking the algorithm to compress). The different algorithms are listed in the ROOT::RCompressionSetting::EAlgorithm struct reference.

P.S.
here is a short rundown of the different algorithms
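
To make the arithmetic concrete, here is a small sketch in plain C++ (no ROOT needed) that splits a setting such as 209 or 101 into its algorithm code and level. The names CompressionSetting, decodeCompressionSetting and algorithmName are made up for this illustration; the code-to-name mapping follows the values of the ROOT::RCompressionSetting::EAlgorithm enum.

```cpp
#include <string>

// Hypothetical helper type: the two parts packed into one compression setting.
struct CompressionSetting {
   int algorithm; // hundreds digit: which algorithm
   int level;     // remainder: how hard the algorithm should compress
};

// setting = 100 * algorithm + level
CompressionSetting decodeCompressionSetting(int setting)
{
   return {setting / 100, setting % 100};
}

// Algorithm codes as listed in ROOT::RCompressionSetting::EAlgorithm
std::string algorithmName(int algorithm)
{
   switch (algorithm) {
   case 1: return "ZLIB";
   case 2: return "LZMA";
   case 3: return "old ROOT algorithm";
   case 4: return "LZ4";
   case 5: return "ZSTD";
   default: return "unknown or inherited";
   }
}
```

With the numbers from this thread, 209 decodes to algorithm 2 (LZMA) at level 9 and 101 decodes to algorithm 1 (ZLIB) at level 1, which is consistent with the output file being larger until the input's setting was reused.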

Cool. Thanks. :smiley:
