Hello, how can I make a variable accumulable when using PROOF in parallel mode?
Below is an example using a traditional event loop in serial mode:
TFile* file = new TFile("test.root","read");
TTree* tree = (TTree*)file->Get("TestTree");
Int_t x;
tree->SetBranchAddress("x",&x);
Int_t nEntries = tree->GetEntries();
Int_t counts = 0;
for (int entry = 0; entry < nEntries; entry++)
{
    tree->GetEntry(entry);
    if (x < 2)
    {
        counts++;
    }
}
printf("counts of entries with x<2 = %d\n", counts);
When using TSelector with PROOF in parallel mode, I don’t know how to do the counting. For example, if PROOF is running in 4 threads, then there will be four sandboxes, four Selector::Process() calls, etc. running simultaneously.
Is there a way to define an “accumulable” variable named Int_t counts, let each thread count, and finally merge the four counts in Selector::Terminate()? (The idea is like auto counts = accumulableManager->CreateAccumulable<G4int>("counts") in Geant4.)
Hi,
You can try using TParameter<Int_t>.
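Define ncounts as a member of your TSelector and initialize it to 0. Increment it as needed during processing.
In TSelector::SlaveTerminate, wrap the final value of ncounts in a TParameter<Int_t> and add it to fOutput, so that the merged result is available in TSelector::Terminate.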
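The flow described above can be modelled in plain C++ as a rough sketch. ROOT is deliberately not used here so that the block compiles on its own: IntParam, OutputList, CountingSelector and RunWorkers are hypothetical stand-ins (for TParameter<Int_t>, the merged fOutput list, the selector class and the set of PROOF workers), and the workers are run one after another instead of in parallel. It also assumes that same-named parameters are merged by summing.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TParameter<Int_t>: a named value copied when created.
struct IntParam {
    std::string name;
    int value;
};

// Hypothetical stand-in for the merged output list.
// Assumption: parameters with the same name are combined by summing.
using OutputList = std::map<std::string, int>;

// Hypothetical stand-in for the selector instance living on one worker.
class CountingSelector {
public:
    // Reset the per-worker counter before any entry is processed.
    void SlaveBegin() { ncounts_ = 0; }

    // Same cut as in the serial loop: count entries with x < 2.
    void Process(int x) {
        if (x < 2) ++ncounts_;
    }

    // Take the snapshot only after the last entry has been processed.
    IntParam SlaveTerminate() const { return IntParam{"ncounts", ncounts_}; }

private:
    int ncounts_ = 0;
};

// Runs one selector per chunk of entries (sequentially, for simplicity)
// and merges the per-worker snapshots by name.
OutputList RunWorkers(const std::vector<std::vector<int>>& chunks) {
    OutputList merged;
    for (const auto& chunk : chunks) {
        CountingSelector sel;
        sel.SlaveBegin();
        for (int x : chunk) sel.Process(x);
        const IntParam p = sel.SlaveTerminate();
        merged[p.name] += p.value;
    }
    return merged;
}
```

Calling RunWorkers with four chunks plays the role of four workers; the value stored under "ncounts" in the returned map corresponds to what Terminate would look up in fOutput.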
Thank you so much!! This example does exactly what I need.
But there is one thing I don’t understand. Why do we add ncounts to fOutput in TSelector::SlaveTerminate instead of doing this in TSelector::SlaveBegin(), as we do when defining histograms?
I have heard of RDataFrame many times, but I did not find a beginner-oriented “QuickStart” for it, and since there are a bunch of tutorials for the legacy tools, I chose the easier way.
Because if you do it in SlaveBegin, you add an object created from the Int_t at its initial value, i.e. 0, and that object is never updated afterwards. Doing it in SlaveTerminate ensures that you do not get 0 in Terminate. The difference with a histogram is that the histogram is itself the object being updated, whereas here the TParameter is only a snapshot of the value being updated, so it is better to take the last one.
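To make the snapshot-versus-live-object distinction concrete, here is a small ROOT-free sketch. Snapshot, LiveCounter, Result and Compare are hypothetical names introduced only for illustration: Snapshot mimics an object that copies a value at construction (the role TParameter plays here), and LiveCounter mimics an object that is itself filled during processing (the role of a histogram).

```cpp
// Hypothetical stand-in for TParameter<Int_t>: keeps a copy of the value
// it was constructed from and never sees later changes to the original.
struct Snapshot {
    explicit Snapshot(int v) : value(v) {}
    int value;
};

// Hypothetical stand-in for a histogram: the registered object itself is filled.
struct LiveCounter {
    void Fill() { ++entries; }
    int entries = 0;
};

struct Result {
    int early_snapshot;  // snapshot created before processing
    int late_snapshot;   // snapshot created after processing
    int live_object;     // object registered before processing, filled during it
};

// Processes n_selected passing entries and reports what each approach holds.
Result Compare(int n_selected) {
    int ncounts = 0;
    Snapshot early(ncounts);  // "SlaveBegin": copies the initial 0
    LiveCounter hist;         // registered up front, like a histogram

    for (int i = 0; i < n_selected; ++i) {  // "Process"
        ++ncounts;
        hist.Fill();
    }

    Snapshot late(ncounts);   // "SlaveTerminate": copies the final count
    return Result{early.value, late.value, hist.entries};
}
```

With three passing entries, the early snapshot still holds 0, while the late snapshot and the live object both hold 3, which is why the parameter is created at the end and the histogram can be registered at the start.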
Up to you, but PROOF is no longer maintained, so you will be forced to move at some point. Apart from automatic and more efficient parallelisation, RDataFrame has plenty of other features that PROOF does not have, and its entry-point documentation seems fine to me as a start.