Multithreading for reading and analysing a ROOT file

Siddharth_Parashari · April 8, 2021, 12:38pm

Hi,
I am reading a branch from a ROOT file and using it to fill a histogram inside a loop, followed by some calculations. For every histogram, an outer loop runs over a variable (the channel ID) and an inner loop runs over all entries to fill it. This takes too long on my system and uses only one CPU. My PC has 24 CPUs and I want to use all of them. I tried implicit multithreading with ROOT::EnableImplicitMT(nThreads); but it gave no benefit; the job actually takes longer. Is there a way to distribute my loop over multiple threads? The loop looks like this:

TFile *input = new TFile(Form("%s",fname), "READ");
TTree *data = (TTree*) input->Get("data");
data->SetBranchAddress("q", &q);
int entries = data->GetEntries();

for(CHID=1;CHID<384;CHID++)
{
    if(CHID > 64 && CHID < 320){continue;} // skip channels 65-319 before allocating a histogram
    TH1F* h1 = new TH1F("h1", "Spectrum of Gamma Rays", nbins, 0, 100);
    cout<<"Performing fit for Channel ID ="<<CHID<<"\n";
    for(j=0;j<entries;j++) // TTree entries are numbered from 0 to entries-1
    {
        data->GetEntry(j);
        if(q[CHID]>0)
        {
            h1->Fill(q[CHID]);
        }
        else{continue;}
    }
    h1->Draw("L");
}

ROOT Version: 6.22/08

eguiraud · April 13, 2021, 8:29am

Hi @Siddharth_Parashari ,
and welcome to the ROOT forum.

ROOT has several facilities for multi-thread analysis/histogram filling, see e.g. TTreeProcessorMT.

For your case, however, RDataFrame is probably the right and easiest interface. To fill one histogram per channel in the same multi-thread event loop:

ROOT::EnableImplicitMT();
ROOT::RDataFrame df("data", fname);
std::vector<ROOT::RDF::RResultPtr<TH1D>> histos;
for(auto CHID=1;CHID<384;CHID++) {
  histos.emplace_back(df.Define("qchid", "q[CHID]").Filter("qchid > 0").Histo1D("qchid"));
}

You could also think of filling a single TH2D instead of N TH1Ds for better performance.
Cheers,
Enrico

Siddharth_Parashari · April 13, 2021, 12:28pm

Hi, thank you for your suggestion. I tried the changes, but I get the error
“use of undeclared identifier CHID”.

I have already declared CHID, so I don’t know why this error appears.

eguiraud · April 13, 2021, 12:46pm

Yes, there is a typo in my/your code, it should be for(int CHID = 1; CHID < 384; CHID++). But make sure to read RDataFrame’s docs that I linked, check out the tutorials, and understand what’s going on.

Siddharth_Parashari · April 13, 2021, 12:48pm

Hi,
Sure, I’ll look into the docs, but could you please give me a quick guide on how to draw these histograms for analysis?

eguiraud · April 13, 2021, 12:50pm

What’s missing from the example code above (now that the typo is fixed)?

Siddharth_Parashari · April 13, 2021, 12:54pm

In the line
histos.emplace_back(df.Define("qchid", "q[CHID]").Filter("qchid > 0").Histo1D("qchid"));
I have changed "q[CHID]" to just "q", so now it does not show any error message, but I don’t know if it is running correctly. I am attaching my old code below.

calibrate1.C (6.0 KB)

eguiraud · April 13, 2021, 1:04pm

Ah yes, df.Define("qchid", "q[CHID]") should be df.Define("qchid", [CHID] (const ROOT::RVec<float> &q) { return q[CHID]; }, {"q"}) (we need to capture CHID from the outer scope).
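
A minimal sketch of the capture mechanism, in plain C++ with no ROOT dependency. A string expression handed to Define is compiled separately and cannot see local variables such as CHID, whereas a lambda can copy the loop variable into itself. The names Charges and makeChannelSelectors are hypothetical stand-ins chosen for this illustration, not part of ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one event's "q" branch: one charge per channel.
using Charges = std::vector<float>;

// Build one selector per channel. Each lambda captures its own copy of
// `chid` by value, so it still refers to the right channel when it is
// called later, after the loop that created it has finished.
std::vector<std::function<float(const Charges &)>>
makeChannelSelectors(std::size_t firstChannel, std::size_t lastChannel)
{
    assert(firstChannel <= lastChannel);
    std::vector<std::function<float(const Charges &)>> selectors;
    for (std::size_t chid = firstChannel; chid <= lastChannel; ++chid) {
        selectors.emplace_back([chid](const Charges &q) { return q[chid]; });
    }
    return selectors;
}
```

Calling makeChannelSelectors(1, 3) gives three callables; applying the first one to an event returns that event's value for channel 1. This mirrors what the lambda passed to Define does for each row.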

eguiraud · April 13, 2021, 1:15pm

As an aside, note that a way to speed up your original code by a lot, even if it still runs on a single thread, is to move the loop over entries outside of all others, so you loop over data once and fill all histograms in that single loop.

Your current code loops over the dataset 128 times or so.
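
The reordering can be sketched in plain C++ as below. This is an illustration under assumptions, not the original script: the dataset is a std::vector held in memory, a "histogram" is reduced to a per-channel fill count, and the names Event, Counts, fillChannelOuter and fillEntryOuter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Event = std::vector<float>;  // hypothetical: q[channel] for one entry
using Counts = std::vector<long>;  // hypothetical: number of fills per channel

// Original layout: for each channel, scan the whole dataset again.
Counts fillChannelOuter(const std::vector<Event> &data, std::size_t nChannels, long &passes)
{
    Counts filled(nChannels, 0);
    passes = 0;
    for (std::size_t ch = 0; ch < nChannels; ++ch) {
        ++passes;  // one full pass over the dataset per channel
        for (const Event &q : data) {
            if (q[ch] > 0) ++filled[ch];
        }
    }
    return filled;
}

// Suggested layout: visit each entry once and update every channel from it.
Counts fillEntryOuter(const std::vector<Event> &data, std::size_t nChannels, long &passes)
{
    Counts filled(nChannels, 0);
    passes = 1;  // the dataset is traversed exactly once
    for (const Event &q : data) {
        assert(q.size() >= nChannels);
        for (std::size_t ch = 0; ch < nChannels; ++ch) {
            if (q[ch] > 0) ++filled[ch];
        }
    }
    return filled;
}
```

Both functions produce the same counts; only the number of passes over the data differs. With a real TTree each pass means reading and decompressing the file again, which is where the time goes.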

Siddharth_Parashari · April 13, 2021, 2:57pm

Yes, I tried that as well, but my input ROOT file is around 1 GB, so execution still takes a long time. I have also tried to write a separate script for multithreading, which I am attaching. If you have time, please give me your suggestions.

Thanks a lot for the help.

calibrate_channels.C (9.7 KB)

eguiraud · April 13, 2021, 3:31pm

In

    for(j=1;j<=entries;j++)
    {
        for(auto CHID=N[1]; CHID<=N[2]; CHID++)
        {   data->GetEntry(j);
            if(q[CHID]>0)
            {
                h[CHID]->Fill(q[CHID]);
            }
            else{continue;}
        }
    }

you are still calling GetEntry a huge number of times: once per channel for every entry. You need to take it out of the inner loop.
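
A sketch of the hoisted read, using a mock in place of the TTree so that the number of reads can be counted. MockTree, getEntry and fillAllChannels are hypothetical names for this illustration; the real script would call data->GetEntry(j) at the marked line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a TTree: getEntry() copies one entry into a
// buffer and counts how often it was called.
struct MockTree {
    std::vector<std::vector<float>> entries;
    std::vector<float> q;  // buffer, playing the role of the bound branch address
    long reads = 0;

    void getEntry(std::size_t j)
    {
        assert(j < entries.size());
        q = entries[j];
        ++reads;
    }
};

// Read each entry once, then loop over channels using the loaded buffer.
std::vector<long> fillAllChannels(MockTree &tree, std::size_t first, std::size_t last)
{
    std::vector<long> filled(last + 1, 0);
    for (std::size_t j = 0; j < tree.entries.size(); ++j) {
        tree.getEntry(j);  // once per entry, not once per channel
        for (std::size_t ch = first; ch <= last; ++ch) {
            if (tree.q[ch] > 0) ++filled[ch];
        }
    }
    return filled;
}
```

After the call, tree.reads equals the number of entries. With the read inside the channel loop it would be the number of entries multiplied by the number of channels.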

Siddharth_Parashari · April 13, 2021, 6:01pm

Yeah, now it’s fine, I guess. I will make the necessary changes.
Thank you for your time.

Siddharth_Parashari · April 13, 2021, 6:45pm

After calling GetEntry() only once per entry in the event loop, as suggested, my old script is considerably faster, and I think I don’t need multithreading for this task.

Thank you so much for pointing out this simple yet very important mistake.
