TF1::GetRandom() slows down a multi-threaded (OpenMP) code

yu.karpenko · July 2, 2020, 2:14pm

ROOT Version: 6.18/04
Platform: Ubuntu linux
Compiler: GCC 9.2.1

Hi everyone,

I was updating my code to run it multi-threaded way using OpenMP, and noticed that in a scenario with more than 1 thread, instead of running faster the code slows down a lot.
I ended up producing a close-to-minimal working example, which is attached. Note that it is a standalone C++ program and not a ROOT macro.
The code essentially samples random numbers in a loop (from 0 to Ncasc-1), via TF1::GetRandom(), with the TF1 object (TF1* flt [nthreads] ) created based on ff_lt function.
To make the code parallel, I create an array of TF1 objects, one object per thread.

double ff_lt(double* x, double* par)
{
double lt2 = x[0]x[0]; // l_t^2
double& mg = par[0]; // m_g,min
double& mu = par[1]; // \mu
return x[0] * log(1. + lt2/(exp(1.0) * mgmg)) / pow(lt2 + mu*mu, 2);
}

int main(int argc, char **argv)
{
int nthreads = atoi(argv[1]);
std::cout << “running with " << nthreads << " threads\n”;
int rseed = 438468301;
double mg = 0.3, mu = 0.4;
const int Ncasc = 10000;
omp_set_num_threads(nthreads);
ROOT::EnableThreadSafety();
// initializing the TF1 objects for all threads
TF1 * flt [nthreads];
for(int i=0; i<nthreads; i++) {
char flt_ROOT_name [20];
sprintf(flt_ROOT_name, “dsigmadlt_%i”, i);
flt[i] = new TF1(flt_ROOT_name, ff_lt, 0., 10.0, 2);
}
std::cout << “init done\n”;
// parallel loop
#pragma omp parallel for schedule(static, 1000)
for(int icasc=0; icasc<Ncasc; icasc++) {
int thread_id = omp_get_thread_num();
flt[thread_id]->SetParameters(mg, mu);
double lt = flt[thread_id]->GetRandom();
} // end parallel loop
return 0;
}

on Core i5-8xxx desktop/laptop, such code with nthreads==1 runs for 6.7 sec, whereas with nthreads==4 (4 threads) the execution time is 22.8 sec.
So, parallelizing the main loop into 4 threads makes the code almost 4 times slower, instead of 4 times faster!

To my understanding supported by a quick debugging with gdb, what happens when the loop is parallelized using OpenMP is the following:
all the created TF1 objects flt[thread_id] call the same shared random number generator (presumably gRandom ?). Presumably, gRandom can’t be used in parallel, which is ensured after the call to ROOT::EnableThreadSafety().
So in the main loop, each thread spends most of its time waiting when the gRandom is unlocked (unused by other threads) to call gRandom->Rndm()

Perhaps it works like that by design. The question is: can one make each of the TF1 objects to use its own instance of TRandom3 class to do the random sampling completely separately?
I couldn’t find anything like SetRandom() in TF1 class.

There seems to be a newer class ROOT::Math::DistSampler which allows to set an individual TRandom instance for each instance of the sampler. However, I wasn’t able to find or construct a minimal working example, completely equivalent to TF1::GetRandom() but using the DistSampler class, not in a standalone C++ program (not in a ROOT macros).

jblomer · July 3, 2020, 8:15am

@moneta Can you help?

moneta · July 3, 2020, 12:58pm

Hi,

You are correct .TF1::GetRandom is not to be used in multiple threads. We should extend the interface to pass also a random number generator instance.
Using the DistSampler is also a good alternative, you can find an example in the tutorial
tutorials/math/multidimSampling.C
https://root.cern.ch/doc/master/multidimSampling_8C.html

Best regards

Lorenzo

yu.karpenko · July 3, 2020, 1:21pm

hi Lorenzo,

first of all, many thanks for the quick reply!
Indeed, I was using the tutorials/math/multidimSampling.C as a guideline to make the DistSampler class work in a standalone C++ program (not in a ROOT macro).
I attach the corresponding minimal example of the standalone C++ which uses DistSampler. So far when I compile and run it, I get the following run-time error:

Error in ROOT::Math::Factory::CreateDistSampler: Error loading DistSampler plug-in

Internet doesn’t seem to know anything about it. Maybe the ROOT interpreter loads the plug-in automatically, and in the case of standalone C++ it doesn’t happen, and I have no idea how to make it work. Maybe you could point which fn call am I missing ?

–best, Iurii
main-DistSampler.cpp (1.8 KB)

moneta · July 3, 2020, 2:16pm

Hi,

can you try adding unuran to ROOT

cmake -Dunuran=On

yu.karpenko · July 7, 2020, 8:10am

Hi again,

yes, I’ve recompiled ROOT with -Dunuran=On and now both my standalone code with DistSampler and the tutorial ROOT macros tutorials/math/multidimSampling.C work!
Now I should understand the use of DistSampler class …
Thank you for the support!

–best, Iurii

system · July 21, 2020, 8:10am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.