Finding number of processed events

krasznaa · December 3, 2008, 5:43pm

Hi,

Okay, last question for today…

How do I find out how many events (TTree entries) my job processed? Usually I don't specify the maximum number of events to be processed; I just let PROOF process every event. I can see that PROOF even determines this number before starting the analysis itself, in the step where it checks the availability of the input files.

Searching the web for this, I found some old PPT presentations showing printouts like: "worker 0 processed X events; worker 1 processed Y events; …" I'd like to get this information in my program as well, but couldn't find such a feature in any of the current PROOF classes.

It would be even better if I could know the number of available events inside the TSelector code. If I want to run on only a subset of one of my datasets but still weight it correctly in the output histograms, I need to know what fraction of the whole sample is being processed.
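Once both counts are known, the weighting itself is simple arithmetic. A minimal sketch in plain C++ (no ROOT needed), assuming the number of available entries in the full dataset has been obtained somehow; how to get it inside the TSelector is exactly the open question here. The function names are hypothetical:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helpers for normalizing a partial run over a dataset.
//   nProcessed: entries actually processed in this run
//   nAvailable: entries in the complete dataset

// Fraction of the full sample covered by the run, in (0, 1].
double processedFraction(std::int64_t nProcessed, std::int64_t nAvailable) {
    if (nAvailable <= 0 || nProcessed <= 0 || nProcessed > nAvailable)
        throw std::invalid_argument("inconsistent event counts");
    return static_cast<double>(nProcessed) / static_cast<double>(nAvailable);
}

// Per-event weight that scales histograms filled from the subset up to
// what the full sample would give (assumes the subset is unbiased).
double subsetWeight(std::int64_t nProcessed, std::int64_t nAvailable) {
    return 1.0 / processedFraction(nProcessed, nAvailable);
}
```

For example, processing 2500 of 10000 entries gives a fraction of 0.25 and a weight of 4. Instead of passing the weight to every Fill() call, one could also scale the finished histograms once, e.g. with TH1::Scale(1.0 / fraction).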

Is any of this possible to do right now?

Cheers,
Attila

ganis · December 4, 2008, 9:40am

Hi Attila,

After processing, the total number of events processed is given by:

p->GetQueryResult()->GetEntries()

The entries per worker are part of the performance info, which is not saved by default. To enable it, set the gEnv variable Proof.StatsHist to 1; you will then get a histogram named "PROOF_EventsHist" with this info:

root [0] p = TProof::Open("")
 +++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)
(class TProof*)0xa69a80
root [1] gEnv->SetValue("Proof.StatsHist",1);
root [2] .x preph1.C   // create 'dh1'
root [3] p->Process(dh1, "h1analysis.C+")
Info in <TProofLite::SetQueryRunning>: starting query: 1
Info in <h1analysis::Begin>: Starting h1analysis with process option:
Looking up for exact location of files: OK (4 files)
Validating files: OK (4 files)
(Long64_t)0
root [4] hevts = (TH1D *) p->GetOutputList()->FindObject("PROOF_EventsHist")
(class TH1D*)0x10618f0
root [5] hevts->Draw()
<TCanvas::MakeDefCanvas>: created default TCanvas with name c1
root [6]
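As a rough cross-check, the per-worker counts in PROOF_EventsHist should add up to the query total. The sketch below is plain C++ so it runs without ROOT; it assumes each bin holds one worker's event count and that the contents have already been copied out of the histogram (e.g. with TH1::GetBinContent for bins 1..N, since bin 0 is the underflow) into a std::vector. The function name is hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: total events from per-worker counts.
// Assumes counts[i] is the number of events processed by worker i,
// as read from the bins of PROOF_EventsHist.
std::int64_t totalFromWorkerCounts(const std::vector<double>& counts) {
    double sum = 0.0;
    for (double c : counts) {
        assert(c >= 0.0 && "event counts cannot be negative");
        sum += c;
    }
    // Histogram bins store doubles; round to the nearest whole event.
    return static_cast<std::int64_t>(std::llround(sum));
}
```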

Alternatively, setting the parameter

p->SetParameter("PROOF_StatsHist", "")

(the value is ignored) also enables filling the detailed stats info.

Cheers, Gerri

krasznaa · December 4, 2008, 5:19pm

Hi Gerri,

The PROOF histograms only seem to work for me if I use your TEnv setting. Setting the parameter you suggested didn't make PROOF generate the histograms. But at least one of the two ways works, which is enough.

However, the event count from GetQueryResult() is somehow not behaving well for me. As I've learned, this could very well be a mistake on my side, but so far I couldn't spot a typo in my code.

What I observe is this: after running over about 15k events in two queries, adding the number of processed events returned by this function to a Long64_t variable after each query, my job reports that it processed 3 events. If I run only the 2nd query (about 12k events), the job reports 2 events processed.

The histograms seem to have the correct contents in both cases, although from just a few of them I can't really tell whether they are right. But the histogram of processed events per node certainly makes sense.
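One way to narrow such a problem down is to keep every per-query value and compare the running total against an independent count, such as the summed per-worker histogram. The sketch below is plain C++ with hypothetical names; in a real macro each value passed to add() would come from p->GetQueryResult()->GetEntries() right after the corresponding Process() call. It does not identify the cause of the discrepancy; it only makes a mismatch visible early:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for a job that runs several PROOF queries.
struct QueryTally {
    std::vector<std::int64_t> perQuery;  // entries reported by each query
    std::int64_t total = 0;              // running sum, 64-bit like Long64_t

    void add(std::int64_t entries) {
        perQuery.push_back(entries);
        total += entries;
    }

    // True if the running total agrees with an independent count
    // (e.g. the summed per-worker histogram) within `tolerance` events.
    bool consistentWith(std::int64_t independentCount,
                        std::int64_t tolerance = 0) const {
        std::int64_t diff = total - independentCount;
        if (diff < 0) diff = -diff;
        return diff <= tolerance;
    }
};
```

Printing perQuery after each step shows immediately whether a single query already returns an implausible value or whether the error creeps in during accumulation.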

Any idea what could be going wrong this time?

Cheers,
Attila