How to make an accumulable variable in PROOF?

luozf14 · February 24, 2023, 7:43pm

ROOT Version: 6.26.04
Platform: Ubuntu 20.04
Compiler: g++ 9.4.0

Hello, I wonder how to make variables accumulable when using PROOF in parallel mode?

Below is an example using a traditional event loop in serial mode:

TFile* file = new TFile("test.root","read");
TTree* tree = (TTree*)file->Get("TestTree");
Int_t x;
tree->SetBranchAddress("x",&x);

Long64_t nEntries = tree->GetEntries(); // GetEntries() returns Long64_t
Int_t counts = 0;
for (Long64_t entry = 0; entry < nEntries; entry++)
{
    tree->GetEntry(entry);
    if(x<2)
    {
        counts++;
    }
}
printf("counts of entries with x<2 = %d\n",counts);

When using TSelector with PROOF in parallel mode, I don’t know how to do the counting. For example, if PROOF is running in 4 threads, then there will be four sandboxes, four Selector::Process() calls, etc. running simultaneously.

Is there a way to define an “accumulable” variable named Int_t counts, let each thread count on its own, and finally merge the four counts in Selector::Terminate()? (The idea is like auto counts = accumulableManager->CreateAccumulable<G4int>("counts") in Geant4.)

Thanks!

Wile_E_Coyote · February 24, 2023, 8:20pm

Maybe @ganis can recommend something.

ganis · February 24, 2023, 9:01pm

Hi,
You can try using TParameter<Int_t>.
Define ncounts as a data member of your TSelector and initialize it to 0. Increment it as needed during processing.
In TSelector::SlaveTerminate:

    fOutput->Add(new TParameter<Int_t>("MyCounts", ncounts));

In TSelector::Terminate:

TParameter<Int_t> *ncp = 0;
if ((ncp = dynamic_cast<TParameter<Int_t> *>(fOutput->FindObject("MyCounts")))) {
    ncounts = ncp->GetVal();
}

(NB: code snippets not tried).
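
The flow of this pattern can be modelled without ROOT. The sketch below is plain C++, not PROOF code: `Worker`, `mergeOutputs`, and `countInParallelModel` are hypothetical names invented for illustration, and the merge step that sums values sharing a name is an assumption about how the per-worker "MyCounts" parameters end up combined before Terminate runs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one worker running its own copy of the selector.
struct Worker {
    int ncounts = 0;  // private counter, like a selector data member

    // Plays the role of Process(): called once per entry given to this worker.
    void process(int x) {
        if (x < 2) ++ncounts;
    }

    // Plays the role of SlaveTerminate(): publish the final value under a name.
    std::map<std::string, int> slaveTerminate() const {
        std::map<std::string, int> out;
        out["MyCounts"] = ncounts;
        return out;
    }
};

// Assumed merge step: values published under the same name are summed.
std::map<std::string, int> mergeOutputs(
    const std::vector<std::map<std::string, int>>& outputs) {
    std::map<std::string, int> merged;
    for (const auto& out : outputs)
        for (const auto& kv : out)
            merged[kv.first] += kv.second;
    return merged;
}

// Distribute entries round-robin over nWorkers, then merge what they publish.
int countInParallelModel(const std::vector<int>& entries, std::size_t nWorkers) {
    assert(nWorkers > 0);
    std::vector<Worker> workers(nWorkers);
    for (std::size_t i = 0; i < entries.size(); ++i)
        workers[i % nWorkers].process(entries[i]);

    std::vector<std::map<std::string, int>> outputs;
    for (const auto& w : workers)
        outputs.push_back(w.slaveTerminate());

    // Plays the role of Terminate(): look the merged value up by name.
    std::map<std::string, int> merged = mergeOutputs(outputs);
    return merged["MyCounts"];
}
```

Calling `countInParallelModel({0, 1, 2, 3, 1, 5, 0, 7}, 4)` returns 4, the same number the serial loop would give for those values, because each worker only ever touches its own counter and the totals are combined once at the end.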

G Ganis

PS: PROOF is in legacy mode. Have you considered trying RDataFrame?

luozf14 · February 24, 2023, 9:37pm

Hi Ganis,

Thank you so much!! This example does exactly what I need.

But there is one thing I don’t understand. Why do we add the ncounts into fOutput in TSelector::SlaveTerminate instead of doing this in TSelector::SlaveBegin(), just like what we do for defining histograms?

I have heard of RDataFrame many times, but I did not find a beginner-oriented “QuickStart” for it, and since there are a bunch of tutorials for the legacy stuff, I chose the easier way.

ganis · February 25, 2023, 2:45pm

Because if you do it in SlaveBegin, you add an object created from an Int_t at its initial value, i.e. 0, and that object won’t be updated afterwards. Doing it in SlaveTerminate ensures that you do not get 0 in Terminate. The difference with a histogram is that the histogram is itself the object that gets updated, while here the TParameter is only a snapshot of the variable that gets updated, so it is better to take the last one.
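
The snapshot-versus-live-object distinction can be seen in a few lines of plain C++. This is only an illustrative model, not ROOT code: `Param`, `Hist`, `Result`, and `runWorker` are hypothetical names, with `Param` standing in for a TParameter built from an integer and `Hist` standing in for a histogram registered in SlaveBegin.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a parameter object: it copies the value it is built from.
struct Param {
    int val;
    explicit Param(int v) : val(v) {}
};

// Hypothetical stand-in for a histogram: the registered object itself is filled.
struct Hist {
    int entries = 0;
    void fill() { ++entries; }
};

// What each approach reports once the worker is done.
struct Result {
    int earlyParam;  // parameter created before processing
    int lateParam;   // parameter created after processing
    int hist;        // histogram created before processing
};

Result runWorker(const std::vector<int>& xs) {
    int ncounts = 0;

    // "SlaveBegin": objects created before any entry is processed.
    Param early(ncounts);  // copies 0 and never sees later increments
    Hist hist;             // live object, updated in place below

    // "Process": update the counter and fill the histogram.
    for (int x : xs) {
        if (x < 2) {
            ++ncounts;
            hist.fill();
        }
    }

    // "SlaveTerminate": snapshot taken after the last entry.
    Param late(ncounts);

    return {early.val, late.val, hist.entries};
}
```

For the input `{0, 1, 2, 3, 1}`, `runWorker` yields `earlyParam == 0`, `lateParam == 3`, and `hist == 3`: the early snapshot is stuck at the initial value, while the late snapshot and the in-place histogram both reflect the three matching entries.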

Up to you, but PROOF is no longer maintained, so you will be forced to move at some point. Apart from automatic and more efficient parallelisation, RDataFrame has plenty of other features that PROOF does not have, and the entry-point documentation seems OK to me as a start.

G Ganis
