Question on TBasket implementation

kirillskovpen · June 26, 2010, 9:49am

Hello dear ROOTers,

I would like to ask a general question about how the creation of TBaskets is implemented in ROOT.
I have noticed the following explanation (at the end of the first chapter):

root.cern.ch/drupal/content/spin … -disk-spin

I just wanted to make sure that I understand the machinery correctly.

Here is what I am doing:

I have two input files (POOL) with different numbers of events (10^6 and 100), which I use to create my ROOT ntuples. After processing, both output ntuples have the same number of events (100) and should be identical, because the two inputs contain the same data: one of them is a “skimmed” version of the other. The output files are indeed identical, except for the number of baskets written. The ntuple produced from the larger input file has more baskets than the one produced from the 100-event input file. Everything else (values, number of events, etc.) is the same in the two output ntuples.

I think the explanation could be that with the larger input there is heavier memory usage, which results in more baskets being created for the branches (although the number of written events is very small in both cases).

Could someone please comment on this?

In any case, it is still unclear to me what exactly a TBasket represents. It is said that a TBasket is where TBranch information is stored, but is there any “real” difference between files that hold the same data in a different number of baskets?

Thank you very much in advance, and sorry for the awkwardly formulated questions,

Kirill

brun · June 28, 2010, 7:10am

As a minimum we need the shortest script reproducing your problem.

Rene

kirillskovpen · June 28, 2010, 7:29am

Thank you for responding Rene,

As a simple starting point, I can provide you with the resulting ROOT ntuples:

kskovpen.web.cern.ch/kskovpen/file_1.root
kskovpen.web.cern.ch/kskovpen/file_2.root

The first one corresponds to the processing of ~100 events,
the second one is for the case of much greater statistics.

Please do something like:

MakerD3PDemx->cd();
anaTree->Print();

to see how many baskets there are. Each file contains 4 events, but the first file has 2 baskets while the second has 4 (equal to the number of events). Everything else is identical.
As for the “script”, the ntuples are produced by a separate private package in Athena, so it would not be quick to reproduce for you … My actual question is: does the number of baskets depend on the number of input events processed? Comparing the two files, the only difference is the size; there is no difference when plotting histograms, reading branches, etc.

Thank you very much again,

Kirill

brun · June 28, 2010, 8:18am

It looks like you are flushing baskets at different intervals in the 2 cases. In principle, with only 4 entries you should have only one basket per branch. I need to see the program producing the output file.

Rene

kirillskovpen · June 28, 2010, 9:01am

Hello Rene,

I’d like to apologize for the noise; I’ve just realized what could be the reason for this behavior.

Please correct me if I am wrong. In the first case (2 baskets) I am merging two files to get file_1 and in the second case I am merging 4 files (which results in 4 baskets). I think this is the explanation and I am very sorry for not mentioning this from the start.

Thanks and sorry for disturbing you,

Kirill

brun · June 28, 2010, 9:15am

OK this explains everything.

Rene