I have a question concerning TTree streaming to disk and something we observed in some of our ntuples.
Our ntuples are processed in sequential steps, let’s say STEP1 and STEP2.
What STEP1 does is produce an ntuple (nSTEP1 branches) from compiled C++ code, producing file1.
In STEP2 we take the output of STEP1 and attach new branches on top (nSTEP2), producing file2, this time using pyROOT and a different source code.
Given this, one would expect the size of file2 to be larger than file1. However, what we see is that size(file2) is about half of size(file1).
Comparing the two files, what we observe on some common branches is that the Baskets value is very different, especially on branches with very repetitive values.
We checked that the number of branches increased, and that all branches are sane and correctly propagated.
The only differences we observed are the number of baskets and the compression level.
Hence the question: is it possible that compression alone reduces the size of an ntuple by a factor of 2? What does it depend on, and how can one explicitly check the reason for the file-size reduction?
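As a side note on plausibility: ROOT baskets are compressed with general-purpose algorithms (zlib/LZMA/LZ4 family), so the achievable ratio depends enormously on how repetitive the branch payload is. A quick stand-alone illustration with plain zlib (not ROOT itself) shows the two extremes:

```python
import os
import zlib

# A "branch" full of identical values compresses dramatically better
# than an incompressible payload of the same size.
repetitive = b"\x40\x09\x1e\xb8\x51\xeb\x85\x1f" * 100_000  # same 8 bytes repeated
random_ish = os.urandom(800_000)                            # essentially incompressible

ratio_rep = len(repetitive) / len(zlib.compress(repetitive))
ratio_rnd = len(random_ish) / len(zlib.compress(random_ish))

print(f"repetitive data ratio: {ratio_rep:.1f}x")
print(f"random data ratio:     {ratio_rnd:.2f}x")
```

For repetitive data the ratio easily exceeds 100x, while random data barely reaches 1x, so a factor 2 difference between two files holding similar branches is well within reach of the compressor alone.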
A factor of 2 from the compression level sounds like a lot, but it is not impossible. You can use TTree::Print() to get detailed information about the overall compression ratio as well as the compression ratio of individual branches.
Hi @jblomer,
This is what I did, in fact.
It’s just not clear to me why compiled C++ code where we run
auto newTree = (TTree*)oldTree.CopyTree( "CUT" );
// DO STUFF on newTree to attach new branches
newTree->Write( "", TObject::kOverwrite );
somehow behaves differently from when we do a similar thing using pyROOT.
I.e. how does ROOT optimize the compression? Is there a recommended way to strike a good balance and ensure the file size remains roughly constant, whatever “processing” one does?