Multithreading for reading and analysing a ROOT file

Hi,
I am reading a branch from a ROOT file, then using it to fill a histogram in a loop and do some calculations. For every histogram there is an outer loop over a variable (the channel ID) and an inner loop over the entries that fills the histogram. This takes too much time on my system and uses only one CPU. My PC has 24 CPUs and I want to use all of them. I tried implicit MT with ROOT::EnableImplicitMT(nThreads), but it brought no benefit: the job then takes even longer. Is there a way to distribute my loop over multiple threads? The loop looks like this:

TFile *input = new TFile(Form("%s",fname), "READ");
TTree *data = (TTree*) input->Get("data");
data->SetBranchAddress("q", &q);
int entries = data->GetEntries();

for(CHID=1;CHID<384;CHID++)
{
    // skip channels 65..319 before allocating a histogram for them
    if(CHID > 64 && CHID < 320){continue;}
    TH1F* h1 = new TH1F("h1", "Spectrum of Gamma Rays", nbins, 0, 100);
    cout<<"Performing fit for Channel ID ="<<CHID<<"\n";
    // tree entries are numbered 0 .. entries-1
    for(j=0;j<entries;j++)
    {
        data->GetEntry(j);
        if(q[CHID]>0)
        {
            h1->Fill(q[CHID]);
        }
    }
    h1->Draw("L");
}

ROOT Version: 6.22/08

Hi @Siddharth_Parashari,
and welcome to the ROOT forum.

ROOT has several facilities for multi-thread analysis/histogram filling, see e.g. TTreeProcessorMT.

For your case, however, RDataFrame is probably the right/easiest interface. To fill one histogram per channel in the same multi-thread event loop:

ROOT::EnableImplicitMT();
ROOT::RDataFrame df("data", fname);
std::vector<ROOT::RDF::RResultPtr<TH1D>> histos;
for(auto CHID=1;CHID<384;CHID++) {
  histos.emplace_back(df.Define("qchid", "q[CHID]").Filter("qchid > 0").Histo1D("qchid"));
}

You could also think of filling a single TH2D instead of N TH1Ds for better performance.
Cheers,
Enrico

Hi, thank you for your suggestion. I tried the changes, but I get an error like
“use of undeclared identifier CHID”.

I have already declared CHID, so I don’t know why this error appears.

Yes, there is a typo in my/your code: it should be for(int CHID = 1; CHID < 384; CHID++). But make sure to read RDataFrame’s docs that I linked, check out the tutorials, and understand what’s going on.

Hi,
Sure, I’ll look into the docs, but could you please give me a quick guide on how to draw these histos for analysis?

What’s missing from the example code above (now that the typo is fixed)?

In the line
histos.emplace_back(df.Define("qchid", "q[CHID]").Filter("qchid > 0").Histo1D("qchid"));
I have changed "q[CHID]" to just "q", so now it does not show any error message, but I don’t know if it is running correctly. I am attaching my old code below.

calibrate1.C (6.0 KB)

Ah yes, df.Define("qchid", "q[CHID]") should be df.Define("qchid", [CHID] (const ROOT::RVec<float> &q) { return q[CHID]; }, {"q"}) (we need to capture CHID from the outer scope).
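
To make the capture pattern concrete, here is a minimal ROOT-free sketch. As far as I understand it, string expressions passed to Define are compiled separately at run time and cannot see local variables such as CHID, while a lambda can copy them in. In the sketch, std::vector<float> stands in for ROOT::RVec<float>, and makeChannelGetters is a hypothetical helper name, not part of ROOT:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for ROOT::RVec<float>; an assumption so the sketch compiles without ROOT.
using Charges = std::vector<float>;

// Hypothetical helper: builds one getter per channel in [first, last).
// Each lambda stores its own copy of chid, as [CHID] does in the Define call.
std::vector<std::function<float(const Charges &)>>
makeChannelGetters(std::size_t first, std::size_t last)
{
    std::vector<std::function<float(const Charges &)>> getters;
    for (std::size_t chid = first; chid < last; ++chid) {
        getters.emplace_back([chid](const Charges &q) {
            assert(chid < q.size()); // guard against out-of-range channels
            return q[chid];
        });
    }
    return getters;
}
```

Each getter keeps returning the value of its own channel even after the loop variable has gone out of scope, which is what the Define call relies on.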

As an aside, note that a way to speed up your original code by a lot, even if it still runs on a single thread, is to move the loop over entries outside of all others, so you loop over data once and fill all histograms in that single loop.

Your current code loops over the dataset 128 times or so.
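
A rough sketch of that single-pass structure, without ROOT so it stays self-contained: plain vectors stand in for the tree and for the histograms, and fillAllChannels with its uniform binning on [lo, hi) is a hypothetical illustration rather than your actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one tree entry: the q[] array of that entry.
using Entry = std::vector<float>;

// Returns counts[channel][bin], filled in a single pass over the entries.
// Values <= 0 or outside [lo, hi) are skipped.
std::vector<std::vector<int>> fillAllChannels(const std::vector<Entry> &entries,
                                              std::size_t nChannels, int nbins,
                                              float lo, float hi)
{
    assert(nbins > 0 && hi > lo);
    std::vector<std::vector<int>> counts(nChannels, std::vector<int>(nbins, 0));
    for (const Entry &q : entries) { // outer loop: every entry is visited once
        for (std::size_t ch = 0; ch < nChannels && ch < q.size(); ++ch) {
            const float v = q[ch];
            if (v <= 0.f || v < lo || v >= hi) continue;
            int bin = static_cast<int>((v - lo) / (hi - lo) * nbins);
            bin = std::min(bin, nbins - 1); // protect against rounding at the upper edge
            ++counts[ch][bin];
        }
    }
    return counts;
}
```

With real ROOT objects the inner statement would be a Fill call on the histogram of that channel. The channel-by-bin table is also essentially what a single TH2D would hold.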

Yes, I tried that as well, but my input ROOT file is around 1 GB, so it still takes a long time to run. I have also tried to write a separate script for multithreading, which I am attaching. If you have time, please give me your suggestions.

Thanks a lot for the help.

calibrate_channels.C (9.7 KB)

In

    for(j=1;j<=entries;j++)
    {
        for(auto CHID=N[1]; CHID<=N[2]; CHID++)
        {   data->GetEntry(j);
            if(q[CHID]>0)
            {
                h[CHID]->Fill(q[CHID]);
            }
            else{continue;}
        }
    }

you are still calling GetEntry a humongous number of times: once per channel for every entry. You need to take it out of the inner loop so that each entry is read only once.
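
To illustrate the difference in read counts, here is a small ROOT-free sketch. MockTree is a hypothetical stand-in that only counts calls (a real GetEntry also does I/O and decompression), and both functions just sum the positive values instead of filling histograms:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mock of a tree, used only to count reads; not a ROOT class.
struct MockTree {
    std::vector<std::vector<float>> rows; // rows[j] holds q[] for entry j
    std::vector<float> q;                 // buffer filled by getEntry
    long reads = 0;                       // number of getEntry calls so far

    void getEntry(std::size_t j)
    {
        assert(j < rows.size());
        q = rows[j];
        ++reads;
    }
};

// Pattern from the quoted snippet: the read sits inside the channel loop.
double sumReadInInnerLoop(MockTree &t, std::size_t firstCh, std::size_t lastCh)
{
    double sum = 0.0;
    for (std::size_t j = 0; j < t.rows.size(); ++j) {
        for (std::size_t ch = firstCh; ch <= lastCh; ++ch) {
            t.getEntry(j); // repeated for every channel
            if (t.q[ch] > 0) sum += t.q[ch];
        }
    }
    return sum;
}

// Suggested pattern: read each entry once, then loop over the channels.
double sumReadOncePerEntry(MockTree &t, std::size_t firstCh, std::size_t lastCh)
{
    double sum = 0.0;
    for (std::size_t j = 0; j < t.rows.size(); ++j) {
        t.getEntry(j); // one read per entry
        for (std::size_t ch = firstCh; ch <= lastCh; ++ch) {
            if (t.q[ch] > 0) sum += t.q[ch];
        }
    }
    return sum;
}
```

Both functions compute the same sum, but the first performs (number of entries) x (number of channels) reads and the second only one read per entry.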

Yeah, now it’s fine, I guess. I will make the necessary changes.
Thank you for your time.

After calling GetEntry() only once per entry in the event loop, as suggested, my old script is considerably faster, and I think I don’t need multithreading for this task.

Thank you so much for pointing out this simple yet very important mistake.
