Performance - understanding gVersionCheck

Hi,

I am currently rewriting some code in our analysis to make it safer and better tested.
An important requirement of this change is that the new code is at least as fast as the old one.
Therefore I am using the Google PerfTools to profile the code.

The basic mechanics of the analysis involve reading NTuples from ROOT files, creating objects and performing some operations on them.

One example of a call graph can be found here:
/afs/cern.ch/user/k/kreczko/public/profilingTest.pdf

Since gVersionCheck takes ~85% of the CPU time, I am a bit curious what it does.
I would expect inflate() (or _init()) to be the dominant function since it initialises the TTree.

Of course this graph corresponds to only a few samples, but I am too inexperienced with the tools for now
to make it better.

Hi,

There is a problem with the way your tools select symbols :) as this result is impossible.

gVersionCheck is actually an object name:

static TVersionCheck gVersionCheck(ROOT_VERSION_CODE);

The class TVersionCheck just does a simple comparison of the ROOT_VERSION_CODE of a library vs the one of libCore. This should result in one execution/comparison per library loaded. This is definitely not called by any of the routines mentioned in your call graph (well, except maybe the unnamed ones, which are likely to be library initialization routines). The routines pointing to ‘gVersionCheck’ in your graphs do not have much in common and thus I suspect that it actually corresponds to several actual routines.
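To illustrate why such an object cannot account for time spent in the event loop, here is a minimal sketch of the pattern in plain C++. It is not ROOT's actual code: the class VersionCheckSketch, the constants kCompiledVersion and kCoreVersion and the two counters are hypothetical names made up for this example.

```cpp
#include <cstdio>

// Hypothetical constants: the version this "library" was compiled against
// and the version of the core library found at run time.
constexpr int kCompiledVersion = 52800;
constexpr int kCoreVersion     = 52800;

// Counters that exist only to make the behaviour of the sketch observable.
static int gChecksRun  = 0;
static int gMismatches = 0;

// Hypothetical stand-in for a version-checking class: the constructor does
// one integer comparison and nothing else.
class VersionCheckSketch {
public:
   explicit VersionCheckSketch(int compiledVersion)
   {
      ++gChecksRun;
      if (compiledVersion != kCoreVersion) {
         ++gMismatches;
         std::fprintf(stderr, "version mismatch: compiled against %d, running with %d\n",
                      compiledVersion, kCoreVersion);
      }
   }
};

// One static object per library: its constructor runs once during static
// initialisation, before main() starts, and is never called again afterwards.
static VersionCheckSketch gVersionCheckSketch(kCompiledVersion);
```

By the time main() starts, gChecksRun is 1 and it stays at 1 however many events are processed afterwards, so a check of this kind should not show up as a hot spot in a profile.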

Cheers,
Philippe.

Ps. One simple optimization to be made in your case is to replace the use of TChain::GetEntries with TChain::GetEntriesFast (and add a check of the return value of LoadTree inside the routine) as GetEntries forces the opening of all the data files before even starting to process the data.

[quote=“pcanal”]Hi,

There is a problem with the way your tools select symbols :) as this result is impossible.
[/quote]

This is exactly what I was thinking.

[quote=“pcanal”]gVersionCheck is actually an object name:

static TVersionCheck gVersionCheck(ROOT_VERSION_CODE);

The class TVersionCheck just does a simple comparison of the ROOT_VERSION_CODE of a library vs the one of libCore. This should result in one execution/comparison per library loaded. This is definitely not called by any of the routines mentioned in your call graph (well, except maybe the unnamed ones, which are likely to be library initialization routines). The routines pointing to ‘gVersionCheck’ in your graphs do not have much in common and thus I suspect that it actually corresponds to several actual routines.

Cheers,
Philippe.
[/quote]

This is as far as I had understood it in the meantime. I guessed that the only way it could use so much CPU time is if every object read from the NTuple file were checked. But since these objects are either plain numbers (int or float) or std::vectors of int/float, I couldn’t see a reason for the check.

[quote=“pcanal”]
Ps. One simple optimization to be made in your case is to replace the use of TChain::GetEntries with TChain::GetEntriesFast (and add a check of the return value of LoadTree inside the routine) as GetEntries forces the opening of all the data files before even starting to process the data.[/quote]

This is an interesting tip. I stumbled across these two functions and found this in the code (TTree):

Long64_t GetEntries() const { return fEntries; }
Long64_t GetEntriesFast() const { return fEntries; }

So I didn’t expect to see a difference. However, as I saw today the TChain has a different implementation
of GetEntries.
I will try it and see if it makes a difference. I would expect it to be the same if I analyse the whole set of files and faster if I don’t.

But what does LoadTree do, and why do I need it?

I’ve looked it up and realised one thing:
In the old analysis code we do:

Long64_t flag = chain->LoadTree(entry);
if (flag < 0) break;
chain->GetEntry(entry);

So I guess I should change it to use GetEntry only and test its return value against 0, since GetEntry calls LoadTree.

Thanks for the reply.

Cheers,
Luke

OK, I replaced the call to TChain::GetEntries with TChain::GetEntriesFast, as suggested.

The code now runs 50% faster, but instead of 4 million events I now get 1234567890.
I guess this is the initial value for Long64_t, otherwise it makes no sense.

What do I have to do if I want the real number of events stored in a TChain?

EDIT:
found the place where the number comes from:

[quote]The code runs now 50% faster, but instead of 4mio events I get now 1234567890. [/quote]This is the expected behavior. It is so that you can use the code pattern:

for (Long64_t entry = 0; entry < chain->GetEntriesFast(); ++entry) {
   Long64_t localentry = chain->LoadTree(entry);
   if (localentry < 0) break;   // negative: no such entry, stop here
   // ... process the entry ...
}

[quote]What do I have to do if I want the real number of events stored in a TChain?[/quote]You need to call chain->GetEntries(), but it will cost a lot unless you call it after the first full loop through the chain (once the first loop through the chain is done, the total number of entries is stored in memory).

[quote]So I guess I should change it to use GetEntry only and test its return value against 0, since GetEntry calls LoadTree.[/quote]No, as you could/should replace the call to chain->GetEntry with calls to branch->GetEntry(localentry), so as to avoid reading and streaming data you do not need (this will result in yet another important performance gain).

Cheers,
Philippe.

[quote=“pcanal”][quote]The code runs now 50% faster, but instead of 4mio events I get now 1234567890. [/quote]This is the expected behavior. It is so that you can use the code pattern:

for (Long64_t entry = 0; entry < chain->GetEntriesFast(); ++entry) {
   Long64_t localentry = chain->LoadTree(entry);
   if (localentry < 0) break;   // negative: no such entry, stop here
   // ... process the entry ...
}
[/quote]
OK, I think I understand. This construct is basically a while loop, since GetEntriesFast() returns just a big number. The break condition occurs once it reaches an entry which exceeds the total number of entries.

Since we do both in the old code I didn’t understand the implications. Thanks!

I will then use an internal counter for this.
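A minimal sketch of that loop with an internal counter, written in plain C++ so it can be tried without ROOT. MockChain is a hypothetical stand-in that only imitates the behaviour described in this thread (GetEntriesFast returning a large placeholder, LoadTree returning a negative value past the last entry). It is not ROOT's implementation, and Entry, kBigNumber and CountEntries are names chosen for the example.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Entry = std::int64_t;   // stands in for ROOT's Long64_t

// Hypothetical mock of the two chain calls discussed above.
class MockChain {
public:
   static constexpr Entry kBigNumber = 1234567890;

   explicit MockChain(std::vector<Entry> entriesPerFile)
      : fEntriesPerFile(std::move(entriesPerFile)) {}

   // No file has been inspected yet, so only a large placeholder is returned.
   Entry GetEntriesFast() const { return kBigNumber; }

   // Returns the entry number local to the file that holds the global entry,
   // or a negative value if the chain has no such entry.
   Entry LoadTree(Entry entry) const
   {
      if (entry < 0) return -1;
      for (Entry n : fEntriesPerFile) {
         if (entry < n) return entry;
         entry -= n;   // skip this file and look in the next one
      }
      return -1;
   }

private:
   std::vector<Entry> fEntriesPerFile;   // number of entries in each file
};

// Loops over the chain and counts the entries that were actually processed.
inline Entry CountEntries(const MockChain &chain)
{
   Entry processed = 0;
   for (Entry entry = 0; entry < chain.GetEntriesFast(); ++entry) {
      if (chain.LoadTree(entry) < 0) break;   // ran past the last entry
      // ... read the needed branches and analyse the event here ...
      ++processed;
   }
   return processed;
}
```

For a mock chain built from three files holding 3, 0 and 4 entries, CountEntries returns 7 even though GetEntriesFast reports the placeholder value throughout.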

[quote=“pcanal”]

[quote]So I guess I should change it to use GetEntry only and test its return value against 0, since GetEntry calls LoadTree.[/quote]No, as you could/should replace the call to chain->GetEntry with calls to branch->GetEntry(localentry), so as to avoid reading and streaming data you do not need (this will result in yet another important performance gain).

Cheers,
Philippe.[/quote]

I don’t see this happen. Since I disable all branches but the ones I use, GetEntry, with getall = 0, should give me the same result:

TChain::GetEntry(Long64_t entry, Int_t getall)
// -- Get entry from the file to memory.
//
//    getall = 0 : get only active branches
//    getall = 1 : get all branches

[quote]I don’t see this happen. Since I disable all branches but the ones I use, GetEntry, with getall = 0, should give me the same result:[/quote]Yes, this also works (but I am not a fan of SetBranchStatus :) ).

Cheers,
Philippe.