Hi all,
I am trying to teach my RDataFrame scripts to create RooDataSets, use TMVA::Reader, and potentially other things.
In a first prototype, in which I admittedly did not think about thread safety or other things that could break, the following function
auto save_in_ds = [&arglist_vars, &ds](const std::vector<double>& rdf_vars) -> void {
  auto ctr = 0u;
  auto vit = arglist_vars.createIterator();
  while (auto rrv = static_cast<RooRealVar*>(vit->Next()))
    rrv->setVal(rdf_vars[ctr++]);
  ds.add(arglist_vars);
};
is used in a Foreach call to my RDataFrame object:
df.Foreach(save_in_ds,{"rdf_vars_for_ws"});
Here, arglist_vars is a RooArgList, and the vector rdf_vars_for_ws is created earlier by df.Define together with this RooArgList.
I tested this function with three different IMT settings: i) off, ii) on with 1 thread, iii) on with 4 threads, each with a different outcome. Setting i) worked, setting ii) broke with "double free or corruption" at a relatively late stage (30-60 % into the loop), and setting iii) broke almost immediately. From all I can judge, the backtrace points towards a multi-threading issue. As I said, I haven't thought about thread safety at all so far, but I believe that's the problem here. If so, I would like to ask whether RooDataSets and other objects can easily be made thread-safe?
If you want more info like reproducers or debugging output, please let me know.
Hi Marian,
your save_in_ds lambda is not thread-safe: multiple threads will be writing to the ds variable at the same time. You can either use a separate RooDataSet per thread or put a mutex around the unsafe calls.
I’m not sure whether createIterator() is thread-safe either, but the stack trace of the crash will tell you.
Hi Enrico,
Using a mutex did the trick, thanks a lot! I didn’t expect it to be that easy.
The performance is currently similar to the single-threaded case, so I will have a look at whether ForeachSlot can give a boost.
Thanks,
Marian
If somebody else runs into a similar problem, the solution I found looks like this:
const auto dsname = configtree.get("dsname", "ds");
RooDataSet ds(dsname.data(), dsname.data(), arglist_vars);
std::mutex dsMutex;

auto save_in_ds = [&arglist_vars, &ds, &dsMutex](const std::vector<double>& rdf_vars) -> void {
  std::lock_guard<std::mutex> l(dsMutex);
  auto ctr = 0u;
  auto vit = arglist_vars.createIterator();
  while (auto rrv = static_cast<RooRealVar*>(vit->Next()))
    rrv->setVal(rdf_vars[ctr++]);
  ds.add(arglist_vars);
};
df.Foreach(save_in_ds,{"rdf_vars_for_ws"});
Hi Marian,
with that mutex there, calls to save_in_ds will run sequentially. If your program spends most of its time in there, you will get no speed-up from multi-threading (in fact, you could run a bit slower than with 1 thread).
If save_in_ds is only a fraction of your runtime and the rest can run in parallel, Amdahl’s law applies.
Hi Enrico,
writing the code with ForeachSlot turned out to be harder than I thought. I now have the following code:
std::mutex dsMutex;
auto save_in_ds = [&alists, &dsets, &dsMutex](unsigned int thread, const std::vector<double>& rdf_vars) -> void {
  std::lock_guard<std::mutex> l(dsMutex);
  auto ctr = 0u;
  auto vit = alists[thread].createIterator();
  while (auto rrv = static_cast<RooRealVar*>(vit->Next()))
    rrv->setVal(rdf_vars[ctr++]);
  dsets[thread].add(alists[thread]);
};
df.ForeachSlot(save_in_ds,{"rdf_vars_for_ws"});
where alists is an nthreads-sized vector of RooArgLists holding unique_ptrs to RooRealVars (I tested that each of them points to a different address). When I set up the RooArgLists, I also fill a vector of RooDataSets (dsets).
In principle such a setup wouldn’t need a mutex, correct? With or without it, the code segfaults, and the error points to the last line of the lambda.
Is there something obvious that I’m doing wrong here?
Cheers,
Marian
Hi,
I guess a debugger will point you right at the problem. If it’s the last line of the lambda, my best guess would be that thread is larger than the size of dsets (an out-of-bounds access).
Hi Enrico,
You were right again!
I did not realize that thread in my function goes up to ROOT::GetImplicitMTPoolSize(), which is 10 in my case. I thought that it would only be 4 if I set