Reoptimize cluster size with hadd

beojan · February 18, 2020, 5:42pm

Is there an easy way to reoptimize the cluster size for a TTree, say with hadd, if you have a TTree with a poorly chosen cluster size?

In my case, I have files where each 10 - 50 GB tree is a single cluster, but I think this would apply equally well if you have one cluster per entry.

Wile_E_Coyote · February 18, 2020, 5:47pm

Try to execute: hadd --help

eguiraud · February 18, 2020, 5:49pm

Hi,
I had some luck with hadd -O before.

Cheers,
Enrico

beojan · February 19, 2020, 7:59am

I did. All I could find was hadd -O to reoptimize basket size, but as far as I could tell, baskets and clusters aren’t the same thing.

Wile_E_Coyote · February 19, 2020, 8:38am

What’s a “cluster”?
Have you used TTree::SetAutoFlush when creating the tree?

beojan · February 19, 2020, 9:53am

Just tried it, and the output still puts the entire tree in a single cluster.

eguiraud · February 19, 2020, 10:23am

Ok I think @pcanal is the one that can help

pcanal · February 19, 2020, 8:09pm

Are you saying that you called SetAutoFlush when creating the TTree, did a ‘slow’ cloning and/or fill fresh and there is still only a single cluster?

I have files where each 10 - 50 GB tree is a single cluster

This seems improbable (a cluster has to eventually fit in memory). How do you assert the number of clusters?

beojan · March 2, 2020, 4:34pm

Apologies for the late reply.

I find the number of clusters by doing tree->Print("clusters");.

Are you saying that you called SetAutoFlush when creating the TTree, did a ‘slow’ cloning and/or fill fresh and there is still only a single cluster?

No, I mean I did hadd -O .... It looks like this doesn’t reoptimize the cluster size.

pcanal · March 2, 2020, 7:44pm

I find the number of clusters by doing tree->Print("clusters");.

Note that an output like this:

root [2] CollectionTree->Print("clusters")
******************************************************************************
*Tree    :CollectionTree: CollectionTree                                         *
*Entries :     4000 : Total =      3934402084 bytes  File  Size = 1070681137 *
*        :          : Tree compression factor =   3.68                       *
******************************************************************************
Cluster Range #  Entry Start      Last Entry        Size
0                0                3999               100

means that the cluster size from entry 0 to entry 3999 is 100 entries and hence there are 40 clusters.

Another form:

root [1] ntuple->Print("clusters")
******************************************************************************
*Tree    :ntuple    : Demo ntuple                                            *
*Entries :    25000 : Total =          504176 bytes  File  Size =     400840 *
*        :          : Tree compression factor =   1.25                       *
******************************************************************************
Cluster Range #  Entry Start      Last Entry        Size
0                0                24999            -30000000

indicates that the cluster size was not reached and there is indeed only one cluster.
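To make the arithmetic behind the two outputs explicit, here is a minimal sketch in plain C++ (no ROOT needed). The helper name `clustersInRange` is hypothetical and not part of ROOT; it only reproduces the reading of one row of the "Cluster Range" table described above. Treating a size of zero the same way as a negative size is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a ROOT function): number of clusters implied by
// one row of the "Cluster Range" table printed by tree->Print("clusters").
std::int64_t clustersInRange(std::int64_t firstEntry,
                             std::int64_t lastEntry,
                             std::int64_t size)
{
    assert(lastEntry >= firstEntry);

    // A negative size means the cluster size (in entries) was never
    // reached, so the whole range is one cluster.
    // Assumption: a size of 0 is handled the same way here.
    if (size <= 0) {
        return 1;
    }

    const std::int64_t entries = lastEntry - firstEntry + 1;
    // Ceiling division: a partially filled last cluster still counts.
    return (entries + size - 1) / size;
}
```

With the rows shown above, `clustersInRange(0, 3999, 100)` gives 40 and `clustersInRange(0, 24999, -30000000)` gives 1.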

beojan · March 3, 2020, 10:19am

Ah, I see. In that case, there’s no problem, I have multiple clusters.

pcanal · March 3, 2020, 12:55pm

Since you are not the first one to misinterpret the output, I opened https://sft.its.cern.ch/jira/browse/ROOT-10594#
