Test memory when filling a TTree

Adrian · April 20, 2016, 2:55pm

Hi all,

I’m working with a TTree which is memory resident. This TTree is filled until a Write() function is called to save the data in a file on disk. The TTree is reset and filled again. And so on…

I cannot know how many entries the TTree will have before the Write() function is called. Therefore the size of the tree can exceed the size of memory. How do you suggest I test this to prevent a crash?

Thank you.

pcanal · April 20, 2016, 3:35pm

Hi,

One obvious solution is to attach the TTree to a file, possibly increasing the basket size to hold more in memory at once.

As for your direct question, you could monitor the value of TTree::GetTotBytes.

Cheers,
Philippe.

Adrian · April 20, 2016, 4:00pm

Hi,

There are several reasons why I’m using a memory-resident tree:
1/ The name of the file will depend on what is in the TTree. Therefore I can only determine the name of the ROOT file when I write the tree to disk.
2/ Isn’t it faster to fill a TTree in memory rather than a tree attached to a file? In particular, the tree is re-indexed before being saved. Fast processing is crucial for me.
3/ The tree is created in a constructor of some class of mine. Many other ROOT objects are created in that class (TH1, TGraph…). I’m not sure, but isn’t it a problem when you close the file? The other objects are deleted.

Can you tell me more about TTree::GetTotBytes? This function is not documented.

Thank you

pcanal · April 20, 2016, 4:11pm

Hi Adrian,

[quote]2/ Isn’t it faster to fill a TTree in memory rather than a tree attached to a file? In particular, the tree is re-indexed before being saved. Fast processing is crucial for me.[/quote]
That depends. Under normal circumstances, the aggregate time for “memory-TTree + write-to-file” should be the same as for “attached-TTree-written-inline”. The main difference is when the time to write to disk is spent: in the attached case, some of the Fill calls will take longer because they need to zip the data (if requested) and write it to disk (fast). Especially if you do not compress the data, I would say that the performance difference (due in part to disk caching happening on the OS side) ‘should’ be minor (but see the next question).

[quote]in particular, the tree is re-indexed before being saved.[/quote]
What do you mean by re-indexed?

[quote]Can you tell me more about TTree::GetTotBytes? This function is not documented.[/quote]
It simply returns the value of the data member

Long64_t fTotBytes; // Total number of bytes in all branches before compression

which in your case should be a slight underestimate of the amount of memory used by the TTree. The actual number will be higher, since the last basket will have extra memory reserved for the next (few) entries.

Cheers,
Philippe.

Adrian · April 20, 2016, 4:30pm

I mean that TTree entries are sorted using the BuildIndex() function. Then a new TTree is filled with the sorted entries. All of this is done in RAM. For details, see an old post of mine:
Create a new index for a TTree - #8 by Adrian
I adopted the strategy given by Wile E. Coyote.

If you have a better approach, I’d be happy to hear it.

Cheers.

pcanal · April 20, 2016, 4:40pm

Hi,

Indeed, the need to ‘sort’ the entries means that you need either a later re-ordering stage (i.e., a post-processor that reads a file and writes it back in the right order) or to keep the data in memory until there is enough information to sort.

If your objects have only a small amount of transient data, the in-memory size of the objects and the in-memory size of their representation in a TTree would be similar. So another option would be to keep the objects themselves in memory in a (sorted) collection and then, when you are ready, write them into a TTree attached to a file. This would be slightly more efficient, since the scheme you have (“write into a tree, calculate sort, read back from tree, write into another tree”) ‘wastes’ run-time in the boxing/unboxing needed to read in and out of the TTree. On the ‘downside’, to control memory you would have to find a way to evaluate the in-memory size of your objects.

Cheers,
Philippe.
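
As a rough illustration of the "keep the objects in a collection" idea above, here is a minimal sketch in plain C++ (no ROOT calls). The `Event` struct, its `timestamp` sort key, and the `EventBuffer` class are hypothetical names introduced only for this example, and the size estimate assumes a flat struct with no heap-owning members such as strings or vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event record; a real one would mirror the tree's branches.
struct Event {
    std::int64_t timestamp;  // assumed sort key
    double energy;
};

// Hypothetical buffer that collects events and reports when an
// estimated memory budget has been exceeded.
class EventBuffer {
public:
    explicit EventBuffer(std::size_t budgetBytes) : budget_(budgetBytes) {}

    // Stores one event; returns true once the estimate exceeds the budget.
    bool Add(const Event& e) {
        events_.push_back(e);
        return EstimatedBytes() > budget_;
    }

    // Lower bound on the memory in use: the vector may have reserved more
    // capacity than size(), and this is only valid for flat structs.
    std::size_t EstimatedBytes() const { return events_.size() * sizeof(Event); }

    // Sorts the events by key and hands them over, leaving the buffer empty.
    std::vector<Event> TakeSorted() {
        std::stable_sort(events_.begin(), events_.end(),
                         [](const Event& a, const Event& b) {
                             return a.timestamp < b.timestamp;
                         });
        std::vector<Event> out;
        out.swap(events_);
        return out;
    }

private:
    std::size_t budget_;
    std::vector<Event> events_;
};
```

When `Add` returns true, the caller would take the sorted events, fill a file-attached tree from them once, and write the file. This avoids the second in-memory tree, at the cost of having to maintain the size estimate by hand.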

Adrian · April 20, 2016, 5:08pm

Thank you for your help on this.

BTW, I tried the GetTotBytes() function and it always returns 0, even though I filled my tree with 1 million entries. How is that possible?

pcanal · April 20, 2016, 7:10pm

Hi Adrian,

My apologies, I had forgotten that fTotBytes is counting only the data that has been written to disk.

The best option in your case is to accumulate the return values of your calls to TTree::Fill (each call returns the number of uncompressed bytes written into the baskets during that call).

Cheers,
Philippe.
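
A minimal sketch of accumulating the Fill return values might look like the following. The `FillMeter` class is a hypothetical helper, not part of ROOT. It is written as a template so that it compiles without ROOT headers; the only assumption about `Tree` is that it has a `Fill()` method returning the number of bytes stored by that call, with a negative value signalling an error, which is how `TTree::Fill` behaves.

```cpp
#include <cstdint>

// Hypothetical helper (not a ROOT class): keeps a running total of the
// bytes reported by Fill() and flags when a byte budget is exceeded.
template <typename Tree>
class FillMeter {
public:
    FillMeter(Tree& tree, std::int64_t budgetBytes)
        : tree_(tree), budget_(budgetBytes) {}

    // Calls Fill() once on the wrapped tree. Returns true when the
    // accumulated (uncompressed) size exceeds the budget.
    bool Fill() {
        const auto n = tree_.Fill();
        if (n > 0) total_ += n;  // ignore error codes and empty fills
        return total_ > budget_;
    }

    std::int64_t Total() const { return total_; }

    // To be called after the tree has been written and reset.
    void Reset() { total_ = 0; }

private:
    Tree& tree_;
    std::int64_t budget_;
    std::int64_t total_ = 0;
};
```

With ROOT this would be instantiated as `FillMeter<TTree>`, with a budget chosen for the machine at hand. When `Fill()` returns true, the tree would be sorted, written, and reset before calling `Reset()` on the meter. The total is still an underestimate of the real memory use, since each basket reserves some extra space for upcoming entries.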