TTree and default cache size

Hi.

I have a question about the cache size of TTree.
A new scheme was introduced in 5.26, and I am trying to understand how it works.

As far as I understand, TTree::fAutoFlush is -30000000 when a tree is created. While the tree is being written, TTree::fAutoFlush is replaced by the number of entries written so far, at the moment the amount of data reaches -TTree::fAutoFlush bytes. Then, when the tree is read back from the file, TTree::fCacheSize is set to TTree::fAutoFlush, which at that point is a number of entries rather than a memory size. To set TTree::fCacheSize to an actual memory size, one has to call TTree::SetCacheSize(). So, basically, I should call TTree::SetCacheSize() before starting an interactive analysis (e.g. Draw()) if I want a cache of reasonable size.
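In other words, something like the following (a minimal sketch; the file, tree, and branch names are made up):

  #include "TFile.h"
  #include "TTree.h"

  void readWithCache()
  {
     TFile *f = TFile::Open("myfile.root");   // hypothetical file
     TTree *t = (TTree*)f->Get("mytree");     // hypothetical tree name
     t->SetCacheSize(30000000);               // 30 MB read cache, in bytes
     t->Draw("x");                            // reads now go through the cache
  }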

Is that correct?

Ryu

Correct. This is described in the 5.26 release notes.

Rene

I was wondering why the cache size just after opening a file for reading is so small, but not zero.

Now I understand. Thank you.

By the way, I have another related topic on trees and memory.
It seems to me that the current TTree::OptimizeBaskets can allocate a lot of memory when the compression factor is large, because the maximum memory size it works with is the total (uncompressed) bytes at the moment the zipped bytes reach the limit (30 MB by default).
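To illustrate the scaling (a back-of-the-envelope sketch of my reading of the code, not the actual implementation):

  #include <cstdio>

  // The budget handed to OptimizeBaskets is the uncompressed size reached
  // when the compressed size hits the 30 MB trigger, so it grows with the
  // compression factor (the factors below are only illustrative):
  int main()
  {
     const double zipTriggerMB = 30.0;
     const double factors[] = { 1.2, 8.0, 60.0 };
     for (int i = 0; i < 3; ++i)
        printf("compression %5.1f -> maxMemory ~ %7.1f MB\n",
               factors[i], factors[i] * zipTriggerMB);
     return 0;
  }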

I attached a sample macro. You can run it like this:
(A) root 'tree3.C+(1,1)' <-- without filling values, so the compression factor is large
(B) root 'tree3.C+(1,0)' <-- with filled values, so the compression factor is normal
The macro has a huge array (2 MB) as a branch.
The optimized basket size of the huge branch is 37 MB for (B), but 233 MB for (A).
If you have five branches of this kind, it could become a problem.
During compression, almost the same amount of memory is actually used by the system.
When you increase the number of such branches, the total memory for all the baskets does not increase in case (B), because the number of entries at the moment OptimizeBaskets is called will be smaller. In case (A), however, the memory grows almost linearly as you add such branches.
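For reference, this is roughly the structure of the attached macro (a sketch only; the parameter names and sizes here are my own, the real code is in the attached tree3.C):

  #include "TFile.h"
  #include "TTree.h"
  #include "TRandom.h"
  #include "TString.h"

  void tree3(Int_t n = 1, Int_t empty = 0)
  {
     const Int_t N = 500000;                  // ~2 MB of Float_t per entry
     static Float_t big[N];                   // static -> zero-initialized
     TFile f("tree3.root", "recreate");
     TTree *t = new TTree("t", "huge-branch test");
     t->Branch("big", big, Form("big[%d]/F", N));
     // the real macro loops until the 30 MB flush trigger fires
     for (Int_t i = 0; i < 1000*n; ++i) {
        if (!empty)                           // case (B): normal compression
           for (Int_t j = 0; j < N; ++j) big[j] = gRandom->Rndm();
        // case (A): the array stays zeroed -> huge compression factor
        t->Fill();
     }
     t->Write();
  }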

I tried the SVN trunk, rev. 31948.

In the 5.26 release notes it is written that "When the amount of data written so far (fTotBytes) is greater than fAutoFlush (see SetAutoFlush) all the baskets are flushed to disk", but this is wrong:
the flush (and the call to OptimizeBaskets) is actually triggered when fZipBytes, not fTotBytes, exceeds the limit.
(By the way, there is another small mistake concerning the modified default fAutoSave size: it should read 300 MB.)

So it might be better to add some protection. For example, limiting the 'maxMemory' passed to OptimizeBaskets to be smaller than fMaxVirtualSize, and setting fMaxVirtualSize in the constructor, would work.
(I have not thought through whether this could create a mismatch between the optimum basket sizes for writing and for reading; maybe some adjustment of the positive fAutoFlush value would be needed.)
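The idea would be something like this (pseudocode for the guard, using the member names mentioned above; not a patch against the actual sources):

  // Inside the auto-flush logic, before the baskets are optimized:
  Long64_t maxMemory = fTotBytes;             // current behaviour
  if (fMaxVirtualSize > 0 && maxMemory > fMaxVirtualSize)
     maxMemory = fMaxVirtualSize;             // proposed cap
  OptimizeBaskets(maxMemory, 1.1, "");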

I do not intend to complain about the optimization.
I like the new optimization scheme, and the reuse of it when reading. It is nice.
And I know my example is an extreme and very rare case.
Making such a big empty branch and then relying on the optimization scheme might be considered the user's fault.
If a user knows the reason for the problem, there are many solutions (a sketch of two of them follows below), such as:

  • monitoring the total size and calling OptimizeBaskets in one's own event loop with a reasonable size
  • simply not creating a branch if one knows in advance that it will not be filled
  • disabling the optimization when filling
  • using a variable-size array

and so on. But many users may not find the reason for the huge memory allocation.
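For example, the first and third options could look like this (a sketch using the public TTree methods; the numbers are only examples):

  #include "TTree.h"

  void limitBaskets(TTree *tree)
  {
     tree->SetAutoFlush(0);                    // disable the automatic optimization
     // ... fill entries in one's own event loop, then, at a chosen point:
     tree->OptimizeBaskets(30000000, 1.1, ""); // cap the budget at ~30 MB
  }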

Ryu
tree3.C (3.44 KB)

Thanks for this remark and your simple example. Your analysis is correct.
I have protected TTree::OptimizeBaskets in the SVN trunk against cases like the one in your example.

Rene

Thank you

Ryu