RDataFrame multithreading loses events

avilla · March 24, 2022, 9:08am

Hi,
I am losing events when filtering large ROOT files with RDataFrame.
When EnableImplicitMT is used, the resulting Snapshot and histograms contain fewer entries than those obtained by running on a single core with the same selection.
As I read here, this bug should be solved, but I still observe it.
I am running ROOT from lb-conda with the latest version:

$ lb-conda default root --version
ROOT Version: 6.26/00
Built for linuxx8664gcc on Mar 05 2022, 12:03:00
From @

couet · March 24, 2022, 9:41am

I guess @eguiraud can help.

eguiraud · March 24, 2022, 10:20am

Hi @avilla ,
thank you for your report. This is not a known issue: as far as I know, it was fixed 4 years ago, as per the issue you linked to, and there have been no similar reports since, so this is concerning. Can you please provide a self-contained reproducer that we can run to debug what’s going on?
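
A self-contained reproducer for this kind of report could be as small as the sketch below, which runs the same selection with and without implicit multithreading and compares the entry counts. The file name, tree name and selection are placeholders (assumptions, not taken from this thread), and `count_selected` and `compare_counts` are hypothetical helper names.

```python
def count_selected(file_name, tree_name, selection, n_threads=0):
    """Count entries passing `selection`; n_threads=0 runs single-threaded."""
    import ROOT  # imported lazily so compare_counts works without ROOT

    if n_threads > 0:
        ROOT.EnableImplicitMT(n_threads)
    else:
        ROOT.DisableImplicitMT()
    # Build the RDataFrame only after the MT setting has been chosen.
    df = ROOT.RDataFrame(tree_name, file_name)
    return df.Filter(selection).Count().GetValue()


def compare_counts(n_single, n_multi):
    """Summarise how far the multi-threaded count is from the reference."""
    difference = n_single - n_multi
    relative = difference / n_single if n_single else 0.0
    return {"difference": difference, "relative": relative}


# Example usage (placeholder inputs, requires a ROOT installation):
# n_st = count_selected("input.root", "tree", "x > 0")
# n_mt = count_selected("input.root", "tree", "x > 0", n_threads=4)
# print(compare_counts(n_st, n_mt))
```

A non-zero difference on a selection that uses only plain branch values would point at RDataFrame itself, while a difference that appears only once extra code is called inside the event loop would point at that code.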

Cheers,
Enrico

avilla · March 24, 2022, 10:30am

Hi @eguiraud,
sure, I will try to prepare some reproducer for you to run. I will have to provide some input files, since my script is a bit complex. I need to compute the score of three external BDTs from the variables in the tuple, so I’ll add their weight files.
I should also mention that I am running everything on HTCondor. I can try to run it interactively first, to narrow down the search for the bug.
I will get back to you as soon as possible.
Cheers,
Andrea

avilla · March 24, 2022, 11:28am

After a quick investigation, I found out that the problem arises from the computation of the BDT scores.
Since TMVA is not thread-safe (see here), using it inside an RDataFrame with EnableImplicitMT activated results in unexpected behaviour.
Therefore, it should be avoided until TMVA gains support for parallel execution.
I am leaving this for future reference in case someone else experiences the same problem. This topic can be closed.
Cheers,
Andrea

eguiraud · March 24, 2022, 11:42am

I think TMVA offers some experimental BDT evaluation interfaces that are thread-safe. @moneta can you confirm/point to some resources?

Cheers,
Enrico

moneta · March 24, 2022, 12:26pm

Hi,
Yes, this depends on what you are using for the inference. The TMVA::Reader class is not thread safe. Which particular method are you using?
We have thread-safe interfaces for BDT and deep learning models.

Lorenzo

avilla · March 24, 2022, 1:11pm

I am following the instructions from this past topic. I paste the relevant part of my code below:

ROOT.gInterpreter.ProcessLine(f'''
TMVA::Experimental::RReader BDT1("{training_dir}/BDT1/weights/BDT1_{cuts}_BDT1.weights.xml");
TMVA::Experimental::RReader BDT2("{training_dir}/BDT2/weights/BDT2_{cuts}_BDT2.weights.xml");
TMVA::Experimental::RReader BDT3("{training_dir}/BDT3/weights/BDT3_{cuts}_BDT3.weights.xml");
computeBDT1 = TMVA::Experimental::Compute<13, float>(BDT1);
computeBDT2 = TMVA::Experimental::Compute<13, float>(BDT2);
computeBDT3 = TMVA::Experimental::Compute<13, float>(BDT3);
''')
df = df.Define('BDT1v', ROOT.computeBDT1, ROOT.BDT1.GetVariableNames())
df = df.Define('BDT2v', ROOT.computeBDT2, ROOT.BDT2.GetVariableNames())
df = df.Define('BDT3v', ROOT.computeBDT3, ROOT.BDT3.GetVariableNames())
df = df.Define('BDT1', 'BDT1v[0]')
df = df.Define('BDT2', 'BDT2v[0]')
df = df.Define('BDT3', 'BDT3v[0]')
df = df.Define('id1', 'idBDT==1')
df = df.Define('id2', 'idBDT==2')
df = df.Define('id3', 'idBDT==3')
df = df.Define('BDT', 'id1*BDT3 + id2*BDT1 + id3*BDT2')
df = df.Filter('BDT > -0.2', 'BDT')
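
To make the selection logic above easier to check by hand, here is a plain-Python sketch of the same arithmetic, without ROOT. The function names are hypothetical. Note the mapping written in the expression: idBDT==1 picks BDT3, idBDT==2 picks BDT1, and idBDT==3 picks BDT2.

```python
def select_score(id_bdt, bdt1, bdt2, bdt3):
    """Mirror 'id1*BDT3 + id2*BDT1 + id3*BDT2' for a single event."""
    id1, id2, id3 = (id_bdt == 1), (id_bdt == 2), (id_bdt == 3)
    # Booleans act as 0/1 in the product, so exactly one term survives
    # when id_bdt is 1, 2 or 3, and none survives otherwise.
    return id1 * bdt3 + id2 * bdt1 + id3 * bdt2


def passes_cut(id_bdt, bdt1, bdt2, bdt3, threshold=-0.2):
    """Mirror the Filter 'BDT > -0.2'."""
    return select_score(id_bdt, bdt1, bdt2, bdt3) > threshold


print(select_score(1, 0.1, 0.2, 0.3))   # score taken from BDT3
print(passes_cut(1, 0.9, 0.9, -0.5))    # BDT3 is below the threshold
print(passes_cut(0, -0.9, -0.9, -0.9))  # no term selected, score is 0
```

One consequence of writing the selection this way is that an event whose idBDT is not 1, 2 or 3 gets a score of 0 and therefore passes the cut, regardless of the three BDT outputs.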

What I see is that when running with multithreading the efficiency of the BDT cut increases, meaning that the script processes some events more than once.
Can you point me to the available thread-safe variants of the BDT classifier?
Is it possible to use it to read already existing weight files or should I rerun the training with this new model?
Thanks

Andrea

moneta · March 24, 2022, 1:42pm

Hi,

The new interface is the RBDT class; see the example tutorial tutorials/tmva/tmva103_Application.C.

It takes a .root file as input, and that ROOT file can be generated from XGBoost; see the tutorial
https://root.cern.ch/doc/master/tmva101__Training_8py.html

I don’t think we support converting the TMVA XML file to a ROOT file that can be used for RBDT yet, but there are converters from TMVA XML to XGBoost, see for example:

If you find the RBDT class useful, we can develop the capability of reading XML files directly.

Cheers

Lorenzo

eguiraud · March 24, 2022, 2:00pm

avilla:

TMVA::Experimental::RReader BDT3("{training_dir}/BDT3/weights/BDT3_{cuts}_BDT3.weights.xml");
computeBDT1 = TMVA::Experimental::Compute<13, float>(BDT1);

@moneta Shouldn’t computeBDT1 here be thread-safe?

moneta · March 24, 2022, 2:45pm

Hi,
Actually, looking at the code, I see that there should be a lock guard in RReader::Compute to protect concurrent model evaluations. I will look into why it is not thread safe.
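
As an illustration of what a lock guard buys here, the following plain-Python sketch wraps a toy evaluator that reuses one internal buffer, so concurrent calls could otherwise overwrite each other's inputs. It is an analogy only: `SharedBufferModel`, `LockedModel` and `evaluate_all` are hypothetical names and do not reflect the actual RReader implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedBufferModel:
    """Toy evaluator that stages its inputs in one shared buffer."""

    def __init__(self, weights):
        self.weights = list(weights)
        self._buffer = [0.0] * len(self.weights)

    def compute(self, inputs):
        # Unsafe on its own: another thread may overwrite the buffer
        # between the write below and the read that follows.
        for i, x in enumerate(inputs):
            self._buffer[i] = x
        return sum(w * x for w, x in zip(self.weights, self._buffer))


class LockedModel:
    """Serialise calls to a non-thread-safe model with a lock."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def compute(self, inputs):
        with self._lock:  # only one thread evaluates at a time
            return self._model.compute(inputs)


def evaluate_all(model, events, n_threads=4):
    """Evaluate all events on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(model.compute, events))
```

The lock makes the results correct but also serialises the evaluation, so the model call itself gains nothing from the extra threads.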
Otherwise, I think we should be able to do as we are doing with SOFIE and use DefineSlot.
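
The idea behind the DefineSlot approach is that the callable receives a slot index as its first argument, and each slot is used by only one thread at a time, so every slot can own a private copy of the model and no lock is needed. The plain-Python sketch below imitates that pattern without ROOT; `ToyModel`, `make_slot_models` and `evaluate_by_slot` are hypothetical names, and the round-robin split of events is an assumption made for the illustration, not how RDataFrame schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyModel:
    """Toy evaluator with internal state that is not safe to share."""

    def __init__(self, weights):
        self.weights = list(weights)
        self._buffer = [0.0] * len(self.weights)

    def compute(self, inputs):
        for i, x in enumerate(inputs):
            self._buffer[i] = x
        return sum(w * x for w, x in zip(self.weights, self._buffer))


def make_slot_models(weights, n_slots):
    """Create one independent model instance per processing slot."""
    return [ToyModel(weights) for _ in range(n_slots)]


def evaluate_by_slot(models, events):
    """Each worker touches only the model that belongs to its slot."""
    n_slots = len(models)

    def run(slot):
        # Slot index first, as in a DefineSlot-style callable.
        return [models[slot].compute(ev) for ev in events[slot::n_slots]]

    with ThreadPoolExecutor(max_workers=n_slots) as pool:
        per_slot = list(pool.map(run, range(n_slots)))

    # Put the per-slot results back in the original event order.
    results = [None] * len(events)
    for slot, chunk in enumerate(per_slot):
        results[slot::n_slots] = chunk
    return results
```

Compared with a lock, this costs one model copy per slot in memory but lets all threads evaluate at the same time.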

Lorenzo
