ROOT Version: 6.30.08
Platform: linuxx8664gcc
Compiler: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Hi,
I am trying to optimize the compressed size of a .root file and I’m confused about how TTree compression and key sizes interact.
Sometimes I see a large “key” for the same kind of data, but it varies across files. Below is the macro I use to list file keys and report the compressed tree size:
#include <iomanip>
#include <iostream>
#include "TFile.h"
#include "TKey.h"
#include "TTree.h"

int size(const char* filename = "out.root") {
    const double MB = 1.0e6;
    TFile f(filename, "READ");
    if (f.IsZombie()) {
        std::cerr << "Cannot open " << filename << '\n';
        return 1;
    }
    auto* t = (TTree*)f.Get("cbmsim");
    if (!t) {
        std::cerr << "No tree named cbmsim in " << filename << '\n';
        return 1;
    }
    std::cout << std::fixed << std::setprecision(3);
    // List file keys and sum their sizes
    double sum = 0.;
    if (auto* keys = f.GetListOfKeys()) {
        TIter it(keys);
        while (auto* obj = it.Next()) {
            auto* k = (TKey*)obj;
            const double mb = k->GetNbytes() / MB;
            std::cout << k->GetName() << ' ' << k->GetClassName() << ' ' << mb << " MB\n";
            sum += k->GetNbytes();
        }
    }
    std::cout << " - - - - - - - - \n";
    std::cout << "Sum of key sizes: " << (sum / MB) << " MB\n";
    std::cout << "Tree size (compressed): " << static_cast<double>(t->GetZipBytes()) / MB << " MB\n";
    std::cout << "File size : " << (static_cast<double>(f.GetSize()) / MB) << " MB\n";
    return 0;
}
A typical output looks like this:
cbmout TFolder 0.000 MB
BranchList TList 0.001 MB
TimeBasedBranchList TList 0.000 MB
FileHeader FairFileHeader 0.000 MB
cbmsim TTree 5.522 MB
cbmsim TTree 0.033 MB
- - - - - - - - - -
Sum of key sizes: 5.557 MB
Tree size (compressed): 61.341 MB
File size : 66.916 MB
Here, one cbmsim key is ~5.5 MB. (In other files of the same type it can be < 1 MB or > 10 MB; I don’t see a clear pattern.)
First I tried to disable the auto save:
tree->SetAutoSave(0);
This removes the second (small) cbmsim key, as expected, but the large one remains.
Then I tried disabling auto flush, so that the tree is written as a single cluster:
tree->SetAutoFlush(0);
Output:
cbmout TFolder 0.000 MB
BranchList TList 0.001 MB
TimeBasedBranchList TList 0.000 MB
FileHeader FairFileHeader 0.000 MB
cbmsim TTree 0.768 MB
- - - - - - - -
Sum of key sizes: 0.769 MB
Tree size (compressed): 66.807 MB
File size : 67.585 MB
Now the key is small (good), but the compressed tree size increased from ~61 MB to ~67 MB. I expected that with no auto flush (baskets written only when full) compression would stay the same or slightly improve, not get worse.
I am going wrong somewhere but can’t figure out where. To summarize, my questions are:
- Why does the TTree key size vary so much between files of the same structure?
- Why can SetAutoFlush(0) increase the compressed size of the tree?
- Is there a recommended way/tooling to reduce key size without hurting the compressed tree?
Thanks,
Clement