Using TTree->SetCacheSize()

Dear All,

together with my colleagues I'm trying to use TTree->SetCacheSize(cachesize) to speed up the reading of a set of long chains of ROOT files, amounting to a few gigabytes each.
Running our original code, with no caching, on one of our test machines, we found that the CPU utilization of our jobs was around 50%-60%, and we suspected that the I/O was too slow.

Using increasing values of cachesize (10 MB, 100 MB, and 1 GB) we noticed an immediate speed improvement in our code (i.e. CPU utilization consistently around 100% on the same machine, running the same code with caching enabled on the same chain of files).
Unfortunately this effect seems to vanish as soon as the cache fills up: after a certain number of analyzed events, depending on the value of cachesize used, CPU utilization suddenly drops back to 50% and the code slows down again.

Is there any way to force the cache to refresh with the next not-yet-analyzed events?
Could anyone point us to an example of usage of this feature on large files/chains?

Thanks a lot,
Gabriele

Hi Gabriele,

In addition to enabling the cache, in v5.32 you can also enable a thread that will prefetch the next cache chunk while processing is going on. To enable this asynchronous prefetcher use:

gEnv->SetValue("TFile.AsyncPrefetching", 1);
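A minimal sketch of putting the two together, enabling the asynchronous prefetcher and a TTreeCache on a chain (the tree name "T", the file pattern, and the 100 MB cache size are just placeholders for illustration):

```cpp
// Enable the asynchronous prefetching thread (ROOT >= 5.32).
gEnv->SetValue("TFile.AsyncPrefetching", 1);

// Build the chain; "T" and the file pattern are placeholders.
TChain chain("T");
chain.Add("data/run_*.root");

// Give the chain a 100 MB TTreeCache.
chain.SetCacheSize(100 * 1024 * 1024);

// During the first entries (the "learning phase") the cache
// records which branches are read and then prefetches them
// cluster by cluster.
Long64_t nentries = chain.GetEntries();
for (Long64_t i = 0; i < nentries; ++i) {
   chain.GetEntry(i);
   // ... analysis ...
}
```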

Also note that for the TTreeCache to be most effective, the files have to be written with a recent version of ROOT (where entry clustering has been enhanced to match the TTreeCache).

In addition, you should make sure to read only the branches containing the data you use.
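For example, restricting both the reading and the cache to the branches you actually use could look like the following sketch (the file name "data.root", tree name "T", and branch names "px"/"py" are all illustrative):

```cpp
TFile *f = TFile::Open("data.root");  // placeholder file name
TTree *tree = 0;
f->GetObject("T", tree);              // "T" is a placeholder tree name

tree->SetCacheSize(100 * 1024 * 1024);

// Deactivate all branches, then re-enable only what the analysis needs.
tree->SetBranchStatus("*", 0);
tree->SetBranchStatus("px", 1);
tree->SetBranchStatus("py", 1);

// Make the cache aware of the branches you will read
// (kTRUE also registers any sub-branches).
tree->AddBranchToCache("px", kTRUE);
tree->AddBranchToCache("py", kTRUE);
```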

Cheers,
Philippe.

Dear Philippe,

thanks a lot for your reply.
Is there any other way to control the cache at runtime, for example by calling TTree->SetCacheSize(0) and then TTree->SetCacheSize(cachesize) again to force a refresh? Or is TTree->SetCacheEntryRange(…) useful for controlling the cache while the program runs?
Unfortunately the version of ROOT distributed with the experiment software we are using now is 5.28.

Thanks a lot,
Gabriele

Hi,

[quote]Is there any other way to control the cache at runtime[/quote]Well, that depends on the purpose! From your description it is not clear where the bottleneck really is.

[quote] for example using TTree->SetCacheSize(0) and then again TTree->SetCacheSize(cachesize) to force refresh?[/quote]How and/or why would it help in your case?

[quote] Or is TTree->SetCacheEntryRange(…) useful to control the cache while the program runs?[/quote]It tells the cache that you are not interested in any entry before or after that range, and the cache will then not load any of those entries.

Also, if you are reading only a subset of the entries, you ought to use a TEntryList to tell the TTree, and thus the TTreeCache, which entries you are planning to read, and thereby avoid loading (if possible) the data related to the entries not read.
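A sketch of using a TEntryList for this (file name, tree name, and the "every 10th entry" selection are just examples):

```cpp
TFile *f = TFile::Open("data.root");  // placeholder file name
TTree *tree = 0;
f->GetObject("T", tree);              // "T" is a placeholder tree name

// Build an entry list containing, say, every 10th entry.
TEntryList *elist = new TEntryList("elist", "every 10th entry");
for (Long64_t i = 0; i < tree->GetEntries(); i += 10)
   elist->Enter(i);

// The TTree (and therefore the TTreeCache) will now only
// consider the entries in the list.
tree->SetEntryList(elist);
for (Long64_t i = 0; i < elist->GetN(); ++i)
   tree->GetEntry(tree->GetEntryNumber(i));
```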

[quote]Is there any way to force refreshing of the cache with the next non-analyzed events?[/quote]The TTreeCache reads the next block of data as soon as one of the branches requests an entry outside of the currently cached range (so somebranch->GetEntry(start_of_next_cluster) would technically do what you request).
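A sketch of that idea using the cluster iterator (available in ROOT 5.30 and later, so not in your 5.28; the file, tree, and branch names are placeholders):

```cpp
TFile *f = TFile::Open("data.root");  // placeholder file name
TTree *tree = 0;
f->GetObject("T", tree);              // "T" is a placeholder tree name
tree->SetCacheSize(100 * 1024 * 1024);

TBranch *b = tree->GetBranch("px");   // any branch you read; "px" is illustrative
TTree::TClusterIterator it = tree->GetClusterIterator(0);
Long64_t clusterStart;
while ((clusterStart = it()) < tree->GetEntries()) {
   // Reading an entry outside the currently cached range forces
   // the TTreeCache to load the next block.
   b->GetEntry(clusterStart);
   // ... process the entries of this cluster ...
}
```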

Cheers,
Philippe.

Dear Philippe,

thank you very much for your reply.

I think there has been some misunderstanding on our side. We performed additional tests with and without TTree->SetCacheSize() in our event loop, accessing the same ROOT files on AFS.

We concluded that the speed improvements we had noticed were due to AFS caching the files analyzed in the previous run of our code, rather than to our usage of the TTree->SetCacheSize() method.
So the strange CPU consumption pattern of our code had nothing to do with ROOT caching; it was actually caused by the filesystem we are using.

I'm sorry for the confusion; thanks again for taking the time to look into this issue and provide hints.

Cheers,
Gabriele

Hi Gabriele,

Indeed, AFS is most often an unfortunate choice performance-wise for serving data files (especially large ones).

Cheers,
Philippe.