beojan
Is there an easy way to reoptimize the cluster size of a TTree (say, with hadd) if the tree was written with a poorly chosen cluster size?
In my case, I have files where each 10-50 GB tree is a single cluster, but I think the question applies equally if you have one cluster per entry.
Try running: hadd --help
Hi,
I had some luck with hadd -O before.
Cheers,
Enrico
beojan
I did. All I could find was hadd -O to reoptimize the basket size, but as far as I could tell, baskets and clusters aren't the same thing.
What’s a “cluster”?
Have you used TTree::SetAutoFlush when creating the tree?
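To make the SetAutoFlush suggestion concrete, here is a minimal, self-contained C++ model (not ROOT code) of how the autoflush setting splits entries into clusters. It assumes the behavior of ROOT 6's TTree::Fill: a positive value n starts a new cluster every n entries, while a negative value -B ends the first cluster once the compressed bytes exceed B and then reuses that entry count as the cluster size. The function name `simulate_clusters` and the per-entry compressed sizes are hypothetical; real ROOT only learns the compressed size as baskets are written, so actual boundaries will differ somewhat.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of how a TTree autoflush setting partitions entries
// into clusters. Not ROOT code: compressed sizes are assumed known per entry.
//   autoflush > 0 : a cluster boundary every `autoflush` entries.
//   autoflush < 0 : the first boundary falls after the entry at which the
//                   accumulated compressed bytes exceed -autoflush; that
//                   entry count is then reused as a fixed cluster size.
//   autoflush == 0: no automatic flushing, so everything is one cluster.
std::vector<std::int64_t> simulate_clusters(
    const std::vector<std::int64_t>& entry_zip_bytes, std::int64_t autoflush) {
    std::vector<std::int64_t> clusters;
    std::int64_t in_cluster = 0;  // entries in the currently open cluster
    std::int64_t zip_bytes = 0;   // compressed bytes accumulated so far
    for (std::int64_t bytes : entry_zip_bytes) {
        ++in_cluster;
        zip_bytes += bytes;
        bool flush = false;
        if (autoflush > 0) {
            flush = (in_cluster == autoflush);
        } else if (autoflush < 0 && zip_bytes > -autoflush) {
            flush = true;
            autoflush = in_cluster;  // switch from a byte to an entry threshold
        }
        if (flush) {
            clusters.push_back(in_cluster);
            in_cluster = 0;
        }
    }
    if (in_cluster > 0) clusters.push_back(in_cluster);  // last, partial cluster
    return clusters;
}
```

For example, a setting of -25 with ten 10-byte entries gives clusters of 3, 3, 3 and 1 entries, since 30 bytes is the first total above 25.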
beojan
Just tried it, and the output still puts the entire tree in a single cluster.
OK, I think @pcanal is the one who can help.
pcanal
Are you saying that you called SetAutoFlush when creating the TTree, did a 'slow' cloning and/or filled it fresh, and there is still only a single cluster?
I have files where each 10 - 50 GB tree is a single cluster
This seems improbable (a cluster has to eventually fit in memory). How do you determine the number of clusters?
beojan
Apologies for the late reply.
I find the number of clusters by running tree->Print("clusters");.
Are you saying that you called SetAutoFlush when creating the TTree, did a 'slow' cloning and/or filled it fresh, and there is still only a single cluster?
No, I mean I ran hadd -O ..., and it looks like this doesn't reoptimize the cluster size.
pcanal
I find the number of clusters by running tree->Print("clusters");.
Note that an output like this:
root [2] CollectionTree->Print("clusters")
******************************************************************************
*Tree :CollectionTree: CollectionTree *
*Entries : 4000 : Total = 3934402084 bytes File Size = 1070681137 *
* : : Tree compression factor = 3.68 *
******************************************************************************
Cluster Range #  Entry Start      Last Entry           Size
0                0                3999                 100
means that the cluster size from entry 0 to entry 3999 is 100 entries, and hence there are 40 clusters (4000 / 100).
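The arithmetic behind this reading can be written as a small helper. This sketch assumes you copy the numbers from each row of the Print("clusters") table by hand; `ClusterRange` and `count_clusters` are hypothetical names, and it only handles rows whose size is positive (an entry count).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row of the "Cluster Range #" table printed by TTree::Print("clusters").
struct ClusterRange {
    std::int64_t entry_start;
    std::int64_t last_entry;
    std::int64_t size;  // entries per cluster; assumed positive here
};

// Total number of clusters across all ranges: each range spans
// (last_entry - entry_start + 1) entries, split into clusters of `size`
// entries, with a possibly shorter final cluster (hence rounding up).
std::int64_t count_clusters(const std::vector<ClusterRange>& ranges) {
    std::int64_t total = 0;
    for (const ClusterRange& r : ranges) {
        const std::int64_t entries = r.last_entry - r.entry_start + 1;
        total += (entries + r.size - 1) / r.size;
    }
    return total;
}
```

For the CollectionTree row above, count_clusters({{0, 3999, 100}}) gives 40.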
Another form:
root [1] ntuple->Print("clusters")
******************************************************************************
*Tree :ntuple : Demo ntuple *
*Entries : 25000 : Total = 504176 bytes File Size = 400840 *
* : : Tree compression factor = 1.25 *
******************************************************************************
Cluster Range #  Entry Start      Last Entry           Size
0                0                24999                -30000000
indicates that the cluster size (a negative value is a threshold in bytes, here 30,000,000) was never reached, so there is indeed only one cluster.
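Both readings can be combined in a row parser. This is a sketch, assuming each table row has the four whitespace-separated integer columns shown above; `clusters_in_row` is a hypothetical helper, and it follows the explanation above in treating a non-positive size as a threshold that was never reached (one cluster).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Number of clusters described by one row of TTree::Print("clusters"),
// given as "<range#> <entry start> <last entry> <size>".
// Returns std::nullopt if the row does not have that shape.
std::optional<std::int64_t> clusters_in_row(const std::string& row) {
    std::istringstream in(row);
    std::int64_t range = 0, start = 0, last = 0, size = 0;
    if (!(in >> range >> start >> last >> size)) return std::nullopt;
    if (size <= 0) {
        // Negative size: a byte threshold that was never reached, so the
        // whole range is a single cluster.
        return 1;
    }
    const std::int64_t entries = last - start + 1;
    return (entries + size - 1) / size;  // round up for a partial last cluster
}
```

With this, the two example rows give 40 and 1 clusters respectively, while the header line is rejected.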
beojan
Ah, I see. In that case there's no problem: I have multiple clusters.
pcanal
Since you are not the first one to misinterpret the output, I opened https://sft.its.cern.ch/jira/browse/ROOT-10594#