Reoptimize cluster size with hadd

Is there an easy way to reoptimize the cluster size for a TTree, say with hadd, if you have a TTree with a poorly chosen cluster size?

In my case, I have files where each 10 - 50 GB tree is a single cluster, but I think this would apply equally well if you have one cluster per entry.

Try running: hadd --help

Hi,
I had some luck with hadd -O before.

Cheers,
Enrico

I did. All I could find was hadd -O to reoptimize the basket size, but as far as I can tell, baskets and clusters aren't the same thing.

What’s a “cluster”?
Have you used TTree::SetAutoFlush when creating the tree?

Just tried it, and the output still puts the entire tree in a single cluster.

OK, I think @pcanal is the one who can help.

Are you saying that you called SetAutoFlush when creating the TTree, did a ‘slow’ cloning and/or filled a fresh tree, and there is still only a single cluster?

I have files where each 10 - 50 GB tree is a single cluster

This seems improbable (a cluster has to eventually fit in memory). How do you determine the number of clusters?

Apologies for the late reply.

I find the number of clusters by doing tree->Print("clusters");.

Are you saying that you called SetAutoFlush when creating the TTree, did a ‘slow’ cloning and/or filled a fresh tree, and there is still only a single cluster?

No, I mean I did hadd -O .... It looks like this doesn’t reoptimize the cluster size.

I find the number of clusters by doing tree->Print("clusters");.

Note that an output like this:

root [2] CollectionTree->Print("clusters")
******************************************************************************
*Tree    :CollectionTree: CollectionTree                                         *
*Entries :     4000 : Total =      3934402084 bytes  File  Size = 1070681137 *
*        :          : Tree compression factor =   3.68                       *
******************************************************************************
Cluster Range #  Entry Start      Last Entry        Size
0                0                3999               100

means that the cluster size from entry 0 to entry 3999 is 100 entries, and hence there are 40 clusters.

Another form:

root [1] ntuple->Print("clusters")
******************************************************************************
*Tree    :ntuple    : Demo ntuple                                            *
*Entries :    25000 : Total =          504176 bytes  File  Size =     400840 *
*        :          : Tree compression factor =   1.25                       *
******************************************************************************
Cluster Range #  Entry Start      Last Entry        Size
0                0                24999            -30000000

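indicates that the cluster size was never reached: a negative Size is a byte threshold rather than an entry count (here the default of 30,000,000 bytes used by TTree::SetAutoFlush), and this tree of roughly 500 KB never wrote that much, so there is indeed only one cluster.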

Ah, I see. In that case there's no problem; I have multiple clusters.

Since you are not the first one to misinterpret the output, I opened https://sft.its.cern.ch/jira/browse/ROOT-10594#
