TSelector, TTreePlayer and analysis progress

kind · February 25, 2009, 1:53pm

Hello ROOTers,

in my analysis, which is based on TSelector, I’d like to retrieve the number of events already processed and the total number of events to be processed, in order to compute the number of events left and a time estimate for the job. Unfortunately I cannot access these numbers, since they are declared as local variables inside the TTreePlayer::Process() member function (nentries, firstentry, entry). One possible solution would be to declare them as data members of the TTreePlayer class, which could then be read through getter functions. What do you think?

Best regards,
Oliver

brun · February 27, 2009, 10:00pm

In your TSelector you have the pointer fChain. You can use TChain::GetReadEntry to find out the current entry, along with the other service functions that return the number of entries, the number of processed events, etc.

Rene

kind · March 3, 2009, 2:55pm

Hello René,

thanks for your reply. TChain::GetReadEntry() does the trick for the current event. However, I do not see how I could access the start entry number and the maximum entry number, or the number of entries to be processed. These are only local variables inside TTreePlayer::Process(), and as far as I can see they are not copied to any data members of the corresponding chain either. Maybe I missed something?

Oliver

pcanal · March 4, 2009, 5:23pm

Hi,

[quote] the number of entries to be processed[/quote]is actually not even known per se in TTreePlayer::Process. By default the value of ‘nentries’ is a very large number (i.e. entry<firstentry+nentries is always true) and the loop is stopped by “if (localEntry < 0) break;”. This is necessary because this function is also used for TChains, where calculating the total number of entries is an expensive operation (it requires opening ALL the input files).
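
A minimal sketch of that loop shape, to make the point concrete. MockChain, LoadEntry and RunLoop are hypothetical stand-ins written for this illustration (they are not ROOT classes or functions), and the sentinel value is an assumption rather than the number ROOT actually uses. LoadEntry plays the role of the call that yields localEntry: it returns the entry number local to the current file, or -1 past the end of the chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a chain of input files; not a ROOT class.
struct MockChain {
    std::vector<std::int64_t> entriesPerFile;  // entries stored in each file
    std::size_t filesOpened = 0;               // how many files had to be touched so far

    // Returns the entry number local to the file holding globalEntry,
    // or -1 if globalEntry lies past the end of the chain.
    std::int64_t LoadEntry(std::int64_t globalEntry) {
        assert(globalEntry >= 0);
        std::int64_t offset = 0;
        for (std::size_t i = 0; i < entriesPerFile.size(); ++i) {
            filesOpened = std::max(filesOpened, i + 1);  // file i must be open to know its size
            if (globalEntry < offset + entriesPerFile[i]) return globalEntry - offset;
            offset += entriesPerFile[i];
        }
        return -1;
    }
};

// The upper bound is only a sentinel; the loop really ends when the
// local entry becomes negative. Returns the number of entries processed.
std::int64_t RunLoop(MockChain& chain, std::int64_t firstEntry) {
    const std::int64_t kVeryLarge = std::numeric_limits<std::int64_t>::max() / 2;  // assumed sentinel
    std::int64_t processed = 0;
    for (std::int64_t entry = firstEntry; entry < firstEntry + kVeryLarge; ++entry) {
        const std::int64_t localEntry = chain.LoadEntry(entry);
        if (localEntry < 0) break;  // only at this point is the end of the chain known
        ++processed;                // the selector's Process() would be called here
    }
    return processed;
}
```

In this model the total only becomes available once the last file has been reached, which is the reason given above for not computing it up front.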

You can ‘cache’ the value of the first entry in your own Process function (by capturing the value of the argument the first time it is called).
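
A minimal sketch of that caching idea. EntryProgress is a hypothetical helper (not part of ROOT) that a selector could own as a data member and update from its Process function. It assumes entry numbers are non-negative and uses -1 to mean "nothing seen yet". Since the argument may be local to the currently loaded tree when running over a chain, the sketch counts calls instead of subtracting entry numbers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a ROOT class.
class EntryProgress {
public:
    // Call once per event with the entry number handed to Process().
    void Update(std::int64_t entry) {
        assert(entry >= 0);
        if (fFirstEntry < 0) fFirstEntry = entry;  // captured on the first call only
        fLastEntry = entry;
        ++fProcessed;
    }

    std::int64_t FirstEntry() const { return fFirstEntry; }  // -1 until the first call
    std::int64_t LastEntry() const { return fLastEntry; }    // -1 until the first call
    std::int64_t Processed() const { return fProcessed; }    // number of calls so far

private:
    std::int64_t fFirstEntry = -1;
    std::int64_t fLastEntry = -1;
    std::int64_t fProcessed = 0;
};
```

In a selector one would call Update(entry) at the top of Process and read Processed() wherever the progress is reported.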

Also, the total number of entries that will be processed depends on whether or not there is a TEventList.

Cheers,
Philippe.

kind · March 5, 2009, 10:07am

Hi Philippe,

Right. I understand that determining the number of events to be processed is not a simple task and has several dependencies. This is why I suggest storing those numbers in data members as soon as they are calculated in TTreePlayer::Process(). This would let the user access them inside the event loop without a re-calculation or even a full scan of the chain. At the moment I don’t see how I could possibly obtain these numbers for all cases (tree, chain, event list, user-defined limit).

Cheers,
Oliver

pcanal · March 5, 2009, 5:11pm

Hi Oliver,

The real issue is that the number of entries to be processed is known to TTree::Process (in the general case) ONLY at the end! [quote]At the moment I don’t see how I could possibly obtain these numbers for all cases (tree, chain, event list, user-defined limit). [/quote]Neither does TTree::Process, it knows it only at the end … which does not really help you.

In addition, TSelectors are also used in the context of Proof, and we strongly encourage developing TSelector code that works the same with or without Proof. In the context of Proof the meaning of ‘first entry’ and ‘nentry’ is even murkier (would the code want to see the ‘global number’, the numbers just for this slave, or the numbers just for this ‘bunch’ of entries for this slave?).

Nonetheless, we agree that it would be good to have a way for your code to know/monitor the progress of the analysis. In the Proof context we already have some similar tools, and we are investigating a way to unify the Proof and non-Proof cases.

Cheers,
Philippe.

kind · March 6, 2009, 8:07pm

Hi Philippe,

This I do not understand.

The crucial line in TTreePlayer::Process() is

for (entry=firstentry;entry<firstentry+nentries;entry++) {

which starts the event loop. Here, nentries was computed already by

nentries = GetEntriesToProcess(firstentry, nentries);

which takes a possibly existing event list into account. So everything is perfectly known when entering the event loop. One could think about storing the numbers at this point.
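
To illustrate what such a computation amounts to for a single tree whose size is known up front, here is a rough model. EntriesToProcess below is a hypothetical free function written for this post, not the ROOT implementation. It assumes the requested range is clamped to the end of the tree and that an event list, if present, can only narrow it further; a list size of -1 stands for "no event list".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the clamping step; not ROOT's code.
// treeEntries : total number of entries in the tree (assumed known)
// listEntries : number of entries in the event list, or -1 if there is none
std::int64_t EntriesToProcess(std::int64_t firstEntry, std::int64_t requested,
                              std::int64_t treeEntries, std::int64_t listEntries) {
    assert(firstEntry >= 0 && requested >= 0 && treeEntries >= 0);
    // The loop cannot read past the last entry of the tree.
    const std::int64_t available = std::max<std::int64_t>(treeEntries - firstEntry, 0);
    std::int64_t n = std::min(requested, available);
    // A shorter event list narrows the range further.
    if (listEntries >= 0) n = std::min(n, listEntries);
    return n;
}
```

The result is only as good as treeEntries: the model needs the true size of the tree as an input.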

I see that the situation for PROOF is different and more complex. Of course, it would be nice to have a system that works for both PROOF and non-PROOF jobs. The possibility of using PROOF was one of the reasons why I chose to inherit my analysis classes from TSelector.

Many thanks for your help and understanding.

Cheers,
Oliver

pcanal · March 6, 2009, 8:17pm

[quote]which takes a possibly existing event list into account. So everything is perfectly known when entering the event loop. One could think about storing the numbers at this point[/quote]This is true only if the underlying object is not a TChain (or if the TChain has already been scanned once, or if the TChain itself has an entry list). In the case of a TChain, nentries will be equal to a very large number.
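
For a time estimate this means user code has to cope with a total that may be just a placeholder. A minimal sketch, assuming a hypothetical sentinel kUnknownTotal (the value ROOT actually uses may differ) and a hypothetical helper SecondsRemaining; neither is part of ROOT:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed placeholder for "total not known"; not ROOT's constant.
const std::int64_t kUnknownTotal = std::numeric_limits<std::int64_t>::max();

// Returns the estimated number of seconds left, or -1 when no estimate
// is possible (unknown total, or nothing processed yet).
double SecondsRemaining(std::int64_t processed, std::int64_t total, double elapsedSeconds) {
    assert(processed >= 0 && elapsedSeconds >= 0.0);
    if (total == kUnknownTotal || processed == 0) return -1.0;
    if (processed >= total) return 0.0;
    // Simple linear extrapolation from the average time per entry so far.
    const double secondsPerEntry = elapsedSeconds / static_cast<double>(processed);
    return secondsPerEntry * static_cast<double>(total - processed);
}
```

When the function returns -1, the job can still report the number of entries processed and the rate, just not the time left.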

Cheers,
Philippe.