Branch baskets and their size

I’m trying to understand how a branch of a ROOT tree is stored on disk, and I have got two questions.

  1. I have a branch of a tree with 100 events which is made of the following 7 baskets (in order of appearance on disk):

#0 - 16 events, 32000 B buffer size, 28938 B object length, 80 B key length - compression CX = 1.37 - size on disk 21200 B
#1 - 11 events, 32000 B buffer size, 29364 B object length, 80 B key length - compression CX = 1.37 - size on disk 21555 B
#2 - 15 events, 32646 B buffer size, 32634 B object length, 80 B key length - compression CX = 1.37 - size on disk 23958 B
#3 - 6 events, 32000 B buffer size, 26644 B object length, 80 B key length - compression CX = 1.36 - size on disk 19589 B
#4 - 12 events, 32000 B buffer size, 28362 B object length, 80 B key length - compression CX = 1.37 - size on disk 20799 B
#5 - 19 events, 39480 B buffer size, 39484 B object length, 80 B key length - compression CX = 1.37 - size on disk 28846 B
#6 - 21 events, 32000 B buffer size, 28670 B object length, 80 B key length - compression CX = 1.37 - size on disk 20922 B

I presume the initial basket size at branch creation was the default 32000 bytes. Why could it be altered for baskets #2 and #5?
Is there a general rule that states until what limit baskets are filled with events? (Can you point to the source code please?)

  2. Is it true that the basket size has to be large enough to hold at least one event of the branch? Is this obligatory, or just recommended?

[quote]Why could it be altered for baskets #2 and #5?
[/quote]In your case the content of the ‘event’ for that branch has a variable size. Given this variation, the tree cannot know before it starts streaming whether the event will fit in the basket or not. Upon reaching the limit (32K by default), instead of trying to copy the content (so far) to a newly allocated basket, we expand the basket so that this event can fit.

[quote]Is there a general rule that states until what limit baskets are filled with events? [/quote]They are filled until the system either knows that there is not enough space (in the case of fixed-size content), or the remaining space is smaller than the last event, or the current event has gone past the requested size. (See the end of TBranch::Fill for the implementation.)

As we have just seen, if the basket is not large enough to hold one event, the tree will expand the basket so that it can hold that one event. It is not obligatory to set the size to be large enough. However, not doing so is a waste of resources, since the tree will need to reallocate the basket almost all the time (incurring the cost of memory allocation and memory copy).
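To make the filling rule described above concrete, here is a minimal toy model of it. It is only a sketch: `ToyBasket` and `packEvents` are hypothetical names, the model ignores the key and any meta data, and it is not the code in TBranch::Fill. It closes a basket once the remaining space is smaller than the event just written or once the written bytes have reached the requested size, and it grows the buffer when an event overshoots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical names, not ROOT's classes.
struct ToyBasket {
    std::size_t entries = 0;     // events stored in this basket
    std::size_t used = 0;        // bytes actually written
    std::size_t bufferSize = 0;  // capacity, grown if an event overshoots
};

// Pack variable-size events into baskets with a requested size `target`.
std::vector<ToyBasket> packEvents(const std::vector<std::size_t>& eventSizes,
                                  std::size_t target) {
    assert(target > 0);
    std::vector<ToyBasket> baskets;
    ToyBasket current;
    current.bufferSize = target;
    for (std::size_t size : eventSizes) {
        current.used += size;
        current.entries += 1;
        if (current.used > current.bufferSize) {
            // Expand the buffer rather than moving the event to a new basket.
            current.bufferSize = current.used;
        }
        const bool pastTarget = current.used >= target;
        // Not enough room left for another event as large as the last one.
        const bool noRoomForSimilar =
            !pastTarget && (target - current.used) < size;
        if (pastTarget || noRoomForSimilar) {
            baskets.push_back(current);
            current = ToyBasket{};
            current.bufferSize = target;
        }
    }
    if (current.entries > 0) baskets.push_back(current);
    return baskets;
}
```

For example, with a requested size of 32000 and events of 10000, 10000 and 13000 bytes, the third event overshoots, so the model yields a single basket whose buffer has grown to 33000 bytes, similar in spirit to baskets #2 and #5 in the listing.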

Cheers,
Philippe.

Thank you very much for the explanation. I have looked into the source code. I have a further question then.
If I can get the basket buffer size with
branch->GetBasket(k)->GetBufferSize()

which of the following commands shows the size of the basket contents?

branch->GetBasket(k)->GetBufferRef()->BufferSize()
branch->GetBasket(k)->GetObjlen()
branch->GetBasket(k)->GetLast()

Hi,

Out of curiosity, what is your interest in understanding the basket size so carefully?

[quote]which of the following commands shows the size of basket contents [/quote]The size of the basket content is “GetObjlen()”. However, some ‘meta-data’ is stored alongside the content, and thus the total number of (uncompressed) bytes is GetObjlen()+GetKeylen() [The compressed size is GetNbytes()].
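As a quick check of this relation against the listing in the first post, the sketch below computes the ratio (GetObjlen()+GetKeylen())/GetNbytes(). It assumes that the "size on disk" column is GetNbytes() and that CX is this ratio rounded to two decimals; the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Compression factor of one basket: uncompressed bytes over bytes on disk.
// objlen, keylen and nbytes stand for the values returned by
// GetObjlen(), GetKeylen() and GetNbytes().
double compressionFactor(int objlen, int keylen, int nbytes) {
    assert(nbytes > 0);
    return static_cast<double>(objlen + keylen) / nbytes;
}

// Round to two decimals, as in the CX column of the listing.
double roundTo2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

With the numbers of basket #0 (28938, 80, 21200) this gives 1.37, and with those of basket #3 (26644, 80, 19589) it gives 1.36, in agreement with the CX values quoted above.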

Cheers,
Philippe.

I’m trying to understand it because I’m writing a paper on the I/O access patterns of analysis tasks, and on whether some storage system parameters (like stripe size, block size) can be adjusted to yield the best possible performance.
When I looked at the basket sizes of different branches, I spotted some cases where the output of GetObjlen() is greater than GetBufferSize(), as in the example above:
#5 - 19 events, 39480 B buffer size, 39484 B object length, 80 B key length - compression CX = 1.37 - size on disk 28846 B
(sometimes with more substantial difference)
Not that these details matter for my topic, but for the paper I’m trying to get the principle of filling baskets correct.
Thank you, Misha

[quote]#5 - 19 events, 39480 B buffer size, 39484 B object length, 80 B key length - compression CX = 1.37 - size on disk 28846 B [/quote]What do you print as the ‘buffer size’?

Philippe.

branch->GetBasket(k)->GetBufferSize()

Hi,

Could you send me the ROOT file you are looking at? I would like to better understand your case #5.

Thanks,
Philippe.

The File

Tree: esdTree
Basket #0 of branch Tracks.fP[5]
Basket #5 of branch Tracks.fIp
Basket #0 of branch Tracks.fTrackTime[5]

Hi,

There is also some ‘meta data’ at the end of the buffer (needed in particular for variable length records). Here is a summary of the meaning.

[code]branch->GetBasket(k)->GetBufferRef()->Length(): full uncompressed length of the data and meta data stored (key length + user data object length + trailing meta data length).

branch->GetBasket(k)->GetLast(): key length + user data object length

branch->GetBasket(k)->GetBufferSize(): greater of default buffer size and value of GetLast()

branch->GetBasket(k)->GetObjlen(): user data object + trailing meta data length

branch->GetBasket(k)->GetKeylen(): key length

branch->GetBasket(k)->GetBufferRef()->BufferSize(): amount of memory allocated for the buffer.
[/code]
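The definitions above can be applied to basket #5 as a worked example. The sketch below assumes that, because the reported buffer size (39480) exceeds the 32000 default, it equals GetLast(); `BasketSizes` and `decompose` are hypothetical helper names, not ROOT API.

```cpp
#include <cassert>

// Illustrative breakdown of one basket, following the definitions above.
struct BasketSizes {
    int keylen;    // GetKeylen()
    int userData;  // user data object length
    int trailing;  // trailing meta data length
    int length;    // GetBufferRef()->Length()
};

// Recover the pieces from GetLast(), GetObjlen() and GetKeylen(), using
//   last   = keylen + userData
//   objlen = userData + trailing
BasketSizes decompose(int last, int objlen, int keylen) {
    assert(last >= keylen);
    BasketSizes s;
    s.keylen = keylen;
    s.userData = last - keylen;
    s.trailing = objlen - s.userData;
    s.length = keylen + s.userData + s.trailing;
    return s;
}
```

For basket #5, `decompose(39480, 39484, 80)` gives 39400 bytes of user data, 84 bytes of trailing meta data and a full length of 39564 bytes. The object length exceeds the buffer size by 4 bytes simply because the trailing meta data (84 bytes) is 4 bytes larger than the key (80 bytes).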

Cheers,
Philippe.

PS. I would be interested in reading your paper once it is available.

[Corrected the last line of the code snippet]

Thank you, now all the numbers fall into place. (small remark - there is a misprint in the last line of the code above, it has to be branch->GetBasket(k)->GetBufferRef()->BufferSize() )
Just to summarize for the record:
the size of a compressed basket on disk:

branch->GetBasket(k)->GetNbytes() (as stated earlier in the thread)

the size of an uncompressed basket in memory:

branch->GetBasket(k)->GetBufferRef()->BufferSize() or alternatively branch->GetBasket(k)->GetBufferRef()->Length()