Question about parallelization using ROOT on WSL2

Hello,

I was running a macro that reads several large ROOT files, and I was trying to speed up the process using ROOT::EnableImplicitMT and ROOT::TSeqUL. I am not an expert in parallelization, but I expected that after calling ROOT::EnableImplicitMT(n), n threads would start to load the CPUs on my machine. However, only 1 of my 20 threads showed high usage, independent of the n I passed; the others were idling. Did I miss something needed to parallelize the work in my macro? Is it possible to run a macro over all 20 threads in parallel?

Thanks.

ROOT Version: 6.22/00
Platform: WSL2

Welcome to the ROOT forum.

I am not sure if there is such a limitation on the maximum number of threads launched when you use EnableImplicitMT. Maybe @eguiraud will know.

Hi,

see the docs for what EnableImplicitMT does. Which of the listed features are you leveraging exactly?

Cheers,
Enrico

Hi,

The one I am using is TTree::GetEntry; below is a piece of the code I am running.

void vars()
{
	gErrorIgnoreLevel = kWarning;
	ROOT::EnableThreadSafety();
	ROOT::EnableImplicitMT(4);   ///////////////////////////////////////////////// EnableImplicitMT

	TMVA::Tools::Instance();
	TMVA::Reader *dataloader = new TMVA::Reader("!Color:Silent");

	/*code*/

    // Files to read
	map<string, string> ss = {{"cv_single", "/mnt/c/Jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_cv.root"},
							  {"cv_asso", "/mnt/c/Jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_cv.root"},

							  {"lyatte_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_lyattenuation.root"},
							  {"lyatte_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_lyattenuation.root"},

							  {"lydown_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_lydown.root"},
							  {"lydown_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_lydown.root"},

							  {"lyrayleigh_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_lyrayleigh.root"},
							  {"lyrayleigh_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_lyrayleigh.root"},

							  {"wiremodthetaxz_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_wiremodthetaxz.root"},
							  {"wiremodthetaxz_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_wiremodthetaxz.root"},

							  {"wiremodthetayz_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_wiremodthetayz.root"},
							  {"wiremodthetayz_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_wiremodthetayz.root"},

							  {"wiremoddedx_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_wiremoddedx.root"},
							  {"wiremoddedx_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_wiremoddedx.root"},

							  {"wiremodx_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_wiremodx.root"},
							  {"wiremodx_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_wiremodx.root"},

							  {"wiremodyz_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_wiremodyz.root"},
							  {"wiremodyz_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_wiremodyz.root"},

							  {"recomb2_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_recomb2.root"},
							  {"recomb2_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_recomb2.root"},

							  {"sce_single", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_sce.root"},
							  {"sce_asso", "/mnt/c/jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_asso_sce.root"}//,

	};


	/*code*/

    // def histos
    map<string, map<string, TH1F *>> bdth;
	map<string, map<string, TH1F *>> bdth2;
	map<string, map<string, TH1F *>> varsh;

    //create histos
    for (auto f : ss)
	{
		for (int i = 0; i < sz; i++)
		{
			bdth[f.first][hname[i]] = new TH1F(Form("%s_%s", f.first.c_str(), hname[i].c_str()), Form("%s", hname[i].c_str()), nbins, bins[i][0], bins[i][1]);
			bdth2[f.first][hname[i]] = new TH1F(Form("%s_%s_2", f.first.c_str(), hname[i].c_str()), Form("%s_2", hname[i].c_str()), nbins, 0, 0);
		}

		for (int i = 0; i < szh; i++)
		{
			varsh[f.first][hnamev[i]] = new TH1F(Form("%s_%s", f.first.c_str(), hnamev[i].c_str()), Form("%s", titlev[hnamev[i]].c_str()), nbins, binsv[i][0], binsv[i][1]);

		}
	}


	for (auto f : ss)  ///////////////////// Reading over all files
	{

			float totpot1 = 0;
			float pot1 = -9999;

			TChain *t[] = {NULL, NULL};

			t[0] = new TChain("CCKaonAnalyzer/subruns");   // branch 1
			t[0]->Add(Form("%s", f.second.c_str()));
			Long64_t nentries1 = t[0]->GetEntries();

			t[0]->SetBranchAddress("pot", &pot1);

			for (auto m = 0; m < nentries1; m++)
			{
				t[0]->GetEntry(m);
				totpot1 += pot1;
			}
			cout << "POT " << f.first << ":" << totpot1 << endl;

			t[1] = new TChain("CCKaonAnalyzer/Event");   // branch 2
			t[1]->Add(Form("%s", f.second.c_str()));
			t[1]->SetImplicitMT(true);
			Long64_t nentries = t[1]->GetEntries();
			weightedplotsa event(t[1]);

			for (auto ievt : ROOT::TSeqUL(nentries))
			{

				event.GetEntry(ievt);  /////////////////////////////////////////////// GetEntry

				if (f.first == "bnb" && event.true_nkaons == 1)
					continue;

				int ct = 0;

				for (int trk = 0; trk < event.reco_ntracks; trk++)
				{

					/*Reading BDT vars*/

					BDT = dataloader->EvaluateMVA("BDT");

					// Vars to plot #####################################################################################################################

					vars[0] = event.reco_track_distance[trk];
					vars[1] = event.reco_track_nhits0[trk];
					vars[2] = event.reco_track_nhits1[trk];
					vars[3] = event.reco_track_nhits2[trk];
					vars[4] = event.reco_track_kin0[trk];
					vars[5] = event.reco_track_kin1[trk];
					vars[6] = event.reco_track_kin2[trk];
					vars[7] = event.reco_track_length[trk];
					vars[8] = event.reco_track_theta[trk];
					vars[9] = event.reco_track_phi[trk];
					vars[10] = KEcalculator(event.reco_track_length[trk]);

					// ################################################################################################################################

					if (BDT > 0.41)
					{
                        // Filling histos

						for (int i = 0; i < sz; i++)
						{
							bdth[f.first][hname[i]]->Fill(bdtvars[i], scales[f.first]);
							bdth2[f.first][hname[i]]->Fill(bdtvars[i], scales[f.first]);
						}

						for (int i = 0; i < szh; i++)
						{
							varsh[f.first][hnamev[i]]->Fill(vars[i], scales[f.first]);
						}
					}

				} // Tracks

			} // Events

			delete t[0];   // free both chains, not just the first one
			delete t[1];

	} // files

	varsys.clear();
	varsh.clear();
	bdth.clear();

}

Thanks.

I see, could it be that most of the time is spent outside of GetEntry, e.g. in the BDT inference step? You could measure that directly. If most of the time is spent in GetEntry but GetEntry does not use more than one thread, then we can take a look at what’s going on: in principle each branch of the Event tree should be read and deserialized in parallel.
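One way to measure that directly is TStopwatch. A minimal sketch against the event loop from the macro above (the loop body is abbreviated; `event` and `dataloader` are the objects defined there):

```cpp
// Sketch: time GetEntry and the BDT evaluation separately with
// TStopwatch, accumulating across the whole loop, to see where
// the event loop actually spends its time.
#include "TStopwatch.h"

TStopwatch swRead, swBDT;

for (auto ievt : ROOT::TSeqUL(nentries)) {
   swRead.Start(kFALSE);     // kFALSE: accumulate, don't reset
   event.GetEntry(ievt);
   swRead.Stop();

   swBDT.Start(kFALSE);
   const double BDT = dataloader->EvaluateMVA("BDT");
   swBDT.Stop();
   /* ... rest of the loop body ... */
}
swRead.Print();  // cumulative real/CPU time spent reading
swBDT.Print();   // cumulative time spent in MVA inference
```

If the reading time dominates and still uses only one core, that points at the implicit-MT branch reading; if the BDT time dominates, no amount of parallel reading will help.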

However, the best thing you can do for parallelization is to do the whole processing, from reading up to filling the histograms, in parallel for different chunks of the dataset. RDataFrame can help with that if you are willing to learn a new interface. Otherwise TTreeProcessorMT can help as well (with TTreeProcessorMT you would need to fill different copies of the histograms from different threads and merge them at the end; RDataFrame is more high-level and takes care of that for you).
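As a rough illustration of the RDataFrame approach, here is a minimal sketch for one of the files above. The cut on `reco_ntracks` and the histogram binning are placeholders, and the per-track BDT evaluation is left out (TMVA::Reader would need per-thread handling); only the parallel read-and-fill structure is shown:

```cpp
// Sketch: read one tree and fill a histogram in parallel with
// RDataFrame. The event loop runs once, split over tree clusters
// across all enabled threads, and the partial histograms are
// merged automatically.
#include "ROOT/RDataFrame.hxx"

void vars_rdf()
{
   ROOT::EnableImplicitMT(20); // use all 20 threads

   ROOT::RDataFrame df("CCKaonAnalyzer/Event",
      "/mnt/c/Jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_single_cv.root");

   // Keep events with at least one reconstructed track, then fill
   // a histogram from the per-track array column (placeholder binning).
   auto h = df.Filter([](int n) { return n > 0; }, {"reco_ntracks"})
              .Histo1D({"len", "reco_track_length", 100, 0., 500.},
                       "reco_track_length");

   h->DrawClone(); // triggers the parallel event loop
}
```

Filling several histograms per file works the same way: book all of them before triggering the loop, and RDataFrame fills them in a single pass over the data.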

Cheers,
Enrico

Thanks for your answer. I am going to check which part of the code is the most time-consuming. I am also going to give RDataFrame a try; it looks more suitable for what I am doing.

Regards.

Hi,

Is there a reason you don’t just run 20 processes in parallel and then merge the histograms with hadd? There might be, e.g. memory requirements, but I find that often the simple solution is to just execute several programs in parallel.
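For example, something like the following (assuming the macro is adapted to take a single file path as an argument and to write its histograms to a per-file output such as `hists_<name>.root`; those names are hypothetical):

```shell
# Sketch: one ROOT process per input file, running in the background,
# then merge the per-file histogram outputs with hadd.
for f in /mnt/c/Jairo/kaon/v2/BDT_kaonPID/files_check/orig_files/60/detsys_*.root; do
    root -l -b -q "vars.C(\"$f\")" &
done
wait                                 # let all background jobs finish
hadd -f all_hists.root hists_*.root  # merge the per-file histograms
```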

Cheers

Joa

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.