I haven’t found a consistent answer, so I will ask the question again. For our analysis we make intermediate ntuples (flat TTrees). Adding systematics can give ~2000 branches in a tree. We moved to this model from having one tree per systematic, as it significantly reduced the storage needed.
But now I observe quite slow looping over the trees. Note that we merge ntuples into relatively large files (up to 90 GB) so that the number of events per file is more consistent.
The questions are:
Does the number of events in a tree matter? We can keep the individual file size smaller.
How does the number of branches affect the performance? We do not load all branches at once when processing the file (only ~30 branches).
Should we use non-default settings when creating a tree with such a large number of branches?
Thanks in advance for all answers (and sorry if some have already been answered before).
Hi,
I’m not the expert but maybe I can provide some insight while we wait for the authoritative replies.
If you use TTree directly, you might want to SetBranchStatus("*", 0) and then SetBranchStatus("...", 1) for only the branches you need to read.
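A minimal sketch of this pattern (the file name, tree name, and branch names here are placeholders, not from the original post):

```cpp
#include "TFile.h"
#include "TTree.h"

void readFewBranches() {
   TFile f("mytree.root");                      // hypothetical input file
   TTree *tree = (TTree *)f.Get("mytree");      // hypothetical tree name

   tree->SetBranchStatus("*", 0);               // disable all ~2000 branches
   tree->SetBranchStatus("pt", 1);              // re-enable only the branches
   tree->SetBranchStatus("eta", 1);             // you actually read

   float pt, eta;
   tree->SetBranchAddress("pt", &pt);
   tree->SetBranchAddress("eta", &eta);

   const Long64_t n = tree->GetEntries();
   for (Long64_t i = 0; i < n; ++i)
      tree->GetEntry(i);                        // only enabled branches are read
}
```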
The number of events in a TTree only matters insofar as the time to process a TTree increases roughly linearly with the number of events, as one would expect.
The number of branches actually read affects performance (TTree might read branches that you don’t need if their branch status is 1); the total number of branches in a TTree should not matter much, afaik.
What I have tested now is making the files smaller, and I think they do run faster. So I guess that seeking to some event N takes longer if the file is larger.
seeking to some event N takes longer if the file is larger.
Yes, that’s the case: TTrees are not optimized for random access, but rather for sequential reading.
EDIT: in other words, if you go from event N to event N+1, that should be fast, no matter how large N is. if you skip a large amount of events, or perform random access, that will typically be slower.
If you use SetBranchStatus and TTree::GetEntry, the GetEntry function has to loop over all the branches to check which ones are enabled or not. With a larger number of branches, you are better off using LoadTree and TBranch::GetEntry. [Note that if you are indeed accessing the entries in random order, the effect I describe will be minor compared to the cost of reading and decompressing some/most of the data multiple times.]
No, if you only have one TTree you can skip the call to LoadTree. (On the other hand, adding it (even if you call it on a TTree) will make your code ‘TChain’ ready … albeit reading out-of-order in a TChain would be even worse )
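A sketch of the per-branch reading pattern described above; the branch names are placeholders, and `tree` is assumed to be an already-opened TTree:

```cpp
#include <vector>
#include "TBranch.h"
#include "TTree.h"

void loopWithBranchGetEntry(TTree *tree) {
   float pt, eta;
   tree->SetBranchAddress("pt", &pt);           // hypothetical branch names
   tree->SetBranchAddress("eta", &eta);

   std::vector<TBranch *> branches;             // only the branches you read
   branches.push_back(tree->GetBranch("pt"));
   branches.push_back(tree->GetBranch("eta"));

   const Long64_t n = tree->GetEntries();
   for (Long64_t i = 0; i < n; ++i) {
      const Long64_t local = tree->LoadTree(i); // a no-op for a plain TTree,
                                                // but makes the loop TChain-ready
      for (TBranch *b : branches)
         b->GetEntry(local);                    // no per-entry scan over all
   }                                            // ~2000 branches
}
```

Note that for a real TChain the TBranch pointers would need to be refreshed whenever LoadTree crosses a file boundary.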
Thank you all for your suggestions. I could improve the performance a lot.
But now the main problem is that I chose an unfortunate ordering of the branches.
The average read transaction is 26.590588 Kbytes. Is there a way to force larger transactions in TTreeCache without remaking the tree?
The cache size after the first GetEntry call is 6606867.
This is the final report after 100k events (with all the default settings):
Number of branches in the cache ...: 91
Cache Efficiency ..................: 0.963868
Cache Efficiency Rel...............: 1.000000
Secondary Efficiency ..............: 0.000000
Secondary Efficiency Rel ..........: 0.000000
Learn entries......................: 100
Cached Reading.....................: 96357395 bytes in 3393 transactions
Reading............................: 0 bytes in 0 uncached transactions
Readahead..........................: 256000 bytes with overhead = 472156423 bytes
Average transaction................: 28.398879 Kbytes
Number of blocks in current cache..: 3312, total size: 6577808
And these are the TFile statistics: Read 0.151589 GB in 3398 transactions.
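For reference, the TTreeCache can be tuned from user code without remaking the tree; a minimal sketch of the relevant knobs (the values here are illustrative, not recommendations, and `tree` is assumed to be an already-opened TTree):

```cpp
#include "TTree.h"

void tuneCache(TTree *tree) {
   tree->SetCacheSize(100 * 1024 * 1024);  // e.g. 100 MB instead of the default
   tree->SetCacheLearnEntries(100);        // entries used to learn which baskets
                                           // to prefetch (default shown above)
   tree->AddBranchToCache("*", kTRUE);     // or add only the branches you read
}
```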
Something weird happened now. I processed the same file via xrootd (from EOS) and the result looks much more reasonable:
Number of branches in the cache ...: 91
Cache Efficiency ..................: 0.924730
Cache Efficiency Rel...............: 1.000000
Secondary Efficiency ..............: 0.000000
Secondary Efficiency Rel ..........: 0.000000
Learn entries......................: 100
Cached Reading.....................: 100882823 bytes in 16 transactions
Reading............................: 0 bytes in 0 uncached transactions
Readahead..........................: 256000 bytes with overhead = 0 bytes
Average transaction................: 6305.176438 Kbytes
Number of blocks in current cache..: 3220, total size: 6424412
[00:56:55] Read 0.163272 GB in 21 transactions.
If I run it locally (ceph filesystem), I still get a lot of transactions. I think it might be a bit hard to reproduce then. But why should the underlying filesystem matter?
In any case the file is here and shared with you: /eos/user/t/tadej/shared/MultiLeptonAnalysis/test/v1/diboson_tree.root
The branches I read are here: branches.txt (2.3 KB) (not sure why less than 91)
I will try to prepare a minimal example to reproduce tomorrow (our framework is quite big).
Outline of the code:
Get the branches and put them in a vector (tree->GetBranch(name.c_str()))
The local file read (i.e. POSIX) does not have a “vector read” instruction (i.e. one system call to read multiple non-consecutive areas of the file), so sparse reads (likely the case if you read only a few branches) issue one actual read/transaction per basket. There is an effort made to coalesce reads that are “close by”, controlled by the setting called “Readahead” in the printout, and indeed it was activated quite a bit in your case.
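If the readahead buffer is the bottleneck, its size can be enlarged before opening the file; a sketch, assuming the static TFile::SetReadaheadSize setter (the 4 MB value is an arbitrary example):

```cpp
#include "TFile.h"

void enlargeReadahead() {
   // Default readahead buffer is 256000 bytes (as seen in the printout);
   // enlarging it may coalesce more nearby basket reads into one transaction.
   TFile::SetReadaheadSize(4 * 1024 * 1024);
   TFile f("diboson_tree.root");   // open the file after changing the setting
}
```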