TMVA in RDataFrame (Experimental)

Hello,

I am trying to build a TMVA application in RDataFrame. I have a model trained with a mix of vector and scalar variables. I am using the experimental features shown in https://root.cern/doc/master/tmva003__RReader_8C.html for my use case:

RReader model("data/xml/TMVAClassification_BDTG.weights.xml");
auto computeModel = Compute< 11 , float >(model);
auto variables = model.GetVariableNames();
auto df3 = df2.Define("mvaBDTG", computeModel , variables )

I get errors such as:

Error in <TTreeReaderValueBase::GetBranchDataType()>: Must use TTreeReaderArray to read branch Electron_miniPFRelIso_chg: it contains an array or a collection.
Error in <TTreeReaderValueBase::CreateProxy()>: The branch Electron_miniPFRelIso_chg contains data of type {UNDETERMINED TYPE}, which does not have a dictionary.

I figured that the experimental feature can only handle scalar input variables, so I made all input variables scalar (for vector variables, I take the first element of the array). I then got another error:

RDataFrame::Run: event loop was interrupted
terminate called after throwing an instance of 'std::runtime_error'
what():  Size of input vector is not equal to number of variables.
Aborted (core dumped)

At this stage I am not sure how to debug this. Could you confirm whether the experimental TMVA application can only read scalar variables? Is it possible to use vectors as input to a TMVA application in RDataFrame? If not, what is the right way to do it?
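
On the step of taking the first element of each vector variable: indexing an empty collection reads out of bounds, so events without any electron may need a guard. Below is a minimal sketch of such a guard, assuming some events have empty collections. The helper name firstOr and the fallback value are hypothetical, and std::vector stands in for ROOT::RVec so that the snippet compiles without ROOT.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: return the leading element of a per-event collection,
// or `fallback` when the collection is empty (e.g. an event with no electrons).
// std::vector is used here as a stand-in for ROOT::RVec.
inline float firstOr(const std::vector<float> &values, float fallback)
{
    return values.empty() ? fallback : values.front();
}

// Usage example: one event with two electrons, one event with none.
inline void firstOrExample()
{
    assert(firstOr({0.12f, 0.40f}, -1.f) == 0.12f);
    assert(firstOr({}, -1.f) == -1.f);
}
```

The fallback should be a value that the training also saw for such events, or the events should be filtered out before the evaluation.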

Thanks, and looking forward to hearing from you.
Cheers,
Siewyan

ROOT Version: 6.22/00
Platform: Ubuntu 18.04
Compiler: gcc 7.5.0


Hi there!

Let’s see why the model has a different number of variables than your expectation. You can look at the variables with

auto variables = model.GetVariableNames();

Is the output what you expect? It could well be that the TMVA XML is not parsed correctly if there are vectors/objects involved (it’s experimental ;) ). Next, you could have a look at the XML itself; it’s not too hard to figure out which fields hold the expected variables. Another way to debug would be to use the old TMVA::Reader and see whether that works.

Best
Stefan

Hi,

Thanks for the answer. I have checked the XML content, which gives me an idea of what input the RReader is expecting. It consists of scalar and vector variables. If I pass them with

auto variables = model.GetVariableNames();
auto df3 = df2.Define("mvaBDTG", Compute< 11 , float >(model) , variables )

I get an error about an unknown column:

terminate called after throwing an instance of 'std::runtime_error'
what():  Unknown column: Electron_miniPFRelIso_neu
Aborted (core dumped)

The branch Electron_miniPFRelIso_neu is present in the XML and is derived from the expression

Electron_miniPFRelIso_neu = Electron_miniPFRelIso_All - Electron_miniPFRelIso_chg

To have the equivalent in the Reader application, I have defined the column accordingly. Somehow the Reader is not able to pick it up. Do you know what causes this? Thanks!

Cheers,
Siewyan

Hi!

Alright, I see. Yes, the TMVA::Experimental::RReader doesn’t have expressions implemented. Indeed, in an RDataFrame workflow I would keep this logic out of the reader and put it into RDataFrame for the sake of simplicity. But I also see that this clashes with the existing TMVA interfaces. To be discussed in the future :)

Best
Stefan
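
To illustrate keeping the expression on the RDataFrame side, here is a minimal sketch of the element-wise computation behind Electron_miniPFRelIso_neu. The function name neutralIso is hypothetical, and std::vector stands in for ROOT::RVec so that the snippet compiles without ROOT. In an actual job, a callable like this (or the equivalent string expression) would be passed to Define under the column name the model expects, before that column is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise neutral isolation: neu[i] = all[i] - chg[i].
// std::vector is used here as a stand-in for ROOT::RVec.
inline std::vector<float> neutralIso(const std::vector<float> &all,
                                     const std::vector<float> &chg)
{
    // Both inputs describe the same electrons, so the sizes must agree.
    assert(all.size() == chg.size());

    std::vector<float> neu(all.size());
    for (std::size_t i = 0; i < all.size(); ++i)
        neu[i] = all[i] - chg[i];
    return neu;
}
```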

Hi,

Ah, I see, so this is expected behaviour of the Experimental::RReader. Is there any workaround (with expressions implemented on the variables) to evaluate the BDT score in RDataFrame in a thread-safe way? Could you provide an example of how to do it? Thanks!

Cheers,
Siewyan

Hi,

Sorry, I missed your question at the end of your previous post.

If you want to stick to the TMVA::Experimental::RReader interface, you should define your expression as its own column in the training, which removes the expression from the model. However, since the BDT implementation in TMVA is not thread safe, the reader uses a global lock to make it thread safe. That is probably not what you want.

Alternatively, you can instantiate a classic TMVA::Reader once per thread and assign one to each thread, which makes the evaluation naturally thread safe. In RDataFrame we have the DefineSlot interface (see the RDataFrame documentation) to make this possible.

Best
Stefan

Hi,

Sorry for the delay. I was assessing my options for how to approach this. I am eager to try out the experimental feature, as it offers an elegant way to evaluate the BDT score. If I understand correctly, you are suggesting that I define those features as columns during training to avoid expressions as inputs, while on the application side I should use DefineSlot to preserve thread safety?

If this is correct, is there an example of how to perform the training with RDataFrame?

Thanks!
Cheers,
Siewyan

Hi!

You cannot directly perform training with RDataFrame. You still have to do the training with TMVA and define the desired quantities as branches of the TTree. But you can use DefineSlot to use TMVA::Reader in RDataFrame and run on multiple threads.

Best
Stefan

Hi,
Thanks again for the clarification. I am moving on to using DefineSlot in my application. However, the DefineSlot documentation is not very enlightening to me.
I have the setup below, with the lambda function defined as:

using namespace ROOT::VecOps;
auto predict = [](
                    unsigned int nthread,
                    const RVec<float> &electron_miniPFRelIso_chg,
                    const RVec<float> &electron_miniPFRelIso_neu
){
    TMVA::Reader* reader = new TMVA::Reader();
    float electron_miniPFRelIso_chg_0 = electron_miniPFRelIso_chg[0];
    float electron_miniPFRelIso_neu_0 = electron_miniPFRelIso_neu[0];

    reader->AddVariable( "Electron_miniPFRelIso_chg" , &electron_miniPFRelIso_chg_0 );
    reader->AddVariable( "Electron_miniPFRelIso_neu" , &electron_miniPFRelIso_neu_0 );

    reader->BookMVA( "BDT::BDTG" , "data/xml/test_BDTG.weights.xml" );

    return reader->EvaluateMVA("BDTG");
};

while in the workflow:

df.DefineSlot( name , predict , { nthread , "Electron_miniPFRelIso_chg" , "Electron_miniPFRelIso_neu"})

I am not sure how to use nthread (the slot parameter) in this case, or how to use DefineSlot correctly.

Could you give me more pointers on how to use DefineSlot?
Thanks!
Siewyan

Hi there!

Here is a small example of how you can integrate the TMVA::Reader (mocked by the Reader class there) in a multi-threaded RDataFrame workflow:

#include <ROOT/RDataFrame.hxx>
#include <TROOT.h>

#include <iostream>
#include <vector>

// Mock of TMVA::Reader: the "score" is just the product of the inputs
struct Reader {
    float GetMvaValue(float x, float y) { return x * y; }
};

void test() {
    // Enable MT and get the pool size
    ROOT::EnableImplicitMT();
    const auto poolSize = ROOT::GetThreadPoolSize();

    // Create the TMVA::Readers, one per slot
    std::vector<Reader> readers(poolSize);

    // Create a callable evaluating the readers per slot
    auto eval = [&readers](unsigned int slot, float x, float y) { return readers[slot].GetMvaValue(x, y); };

    // Create a RDF with 10 rows and two columns
    ROOT::RDataFrame df(10);
    auto df2 = df.Define("x", "(float)rdfentry_").Define("y", "(float)rdfentry_");

    // Make the evaluation; the slot number is passed automatically as the first argument
    auto df3 = df2.DefineSlot("mva", eval, {"x", "y"});

    // Print the result (this triggers the event loop)
    auto mva = df3.Take<float>("mva");
    for (auto &x : mva) std::cout << x << std::endl;
}

Note that you want to instantiate the readers in a vector beforehand and not in the lambda itself. Otherwise the performance will be horrible. You can create them upfront and put the objects (or pointers to them) in a vector, which you then capture in the lambda (see the [&readers] capture).
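
With the real TMVA::Reader, the input variables are registered by address, so each slot also needs its own input buffers that stay alive and in place for the whole event loop. Below is a minimal sketch of that pattern under stated assumptions: MockReader is a hypothetical stand-in for TMVA::Reader (its score is just the sum of the inputs), and SlotState, makeSlots and evaluate are illustrative names, not ROOT API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for TMVA::Reader: it stores pointers to the caller's
// input floats and reads through them when evaluating.
class MockReader {
public:
    void AddVariable(const std::string & /*name*/, float *address) { inputs_.push_back(address); }
    float EvaluateMVA() const
    {
        float sum = 0.f;
        for (const float *p : inputs_) sum += *p;
        return sum;
    }

private:
    std::vector<float *> inputs_;
};

// One reader plus its own input buffers per slot. The reader points into
// `buffer`, so the object must never be copied or moved.
struct SlotState {
    std::array<float, 2> buffer{};
    MockReader reader;

    SlotState()
    {
        // Register the buffers once, upfront (booking would also happen here).
        reader.AddVariable("x", &buffer[0]);
        reader.AddVariable("y", &buffer[1]);
    }
    SlotState(const SlotState &) = delete;
    SlotState &operator=(const SlotState &) = delete;
};

// Create all slots before the event loop; unique_ptr keeps addresses stable.
inline std::vector<std::unique_ptr<SlotState>> makeSlots(std::size_t nSlots)
{
    std::vector<std::unique_ptr<SlotState>> slots;
    slots.reserve(nSlots);
    for (std::size_t i = 0; i < nSlots; ++i)
        slots.push_back(std::make_unique<SlotState>());
    return slots;
}

// Per-event work: copy the values into this slot's buffers, then evaluate.
inline float evaluate(std::vector<std::unique_ptr<SlotState>> &slots,
                      unsigned int slot, float x, float y)
{
    assert(slot < slots.size());
    slots[slot]->buffer = {x, y};
    return slots[slot]->reader.EvaluateMVA();
}
```

In an RDataFrame job, the number of slots would come from the thread pool size, and a lambda wrapping evaluate would be the callable passed to DefineSlot.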

I guess you have already found the main RDataFrame docs here: https://root.cern/doc/master/classROOT_1_1RDataFrame.html

Probably @eguiraud knows whether we have additional docs or tutorials for DefineSlot!

Best
Stefan

Hi,
we don’t have a tutorial, but the DefineSlot docs have a small example usage and some explanation.

Cheers,
Enrico

Hi all,

Thanks for providing a more concrete example. This is what I have come up with:

template<typename T>
auto BDT_reader( T &df , const std::string &name ) {
  using namespace ROOT::VecOps;

  // Create the TMVA::Reader
  const auto poolSize = ROOT::GetThreadPoolSize();
  std::vector<TMVA::Reader*> readers(poolSize);

  auto predict = [&readers](
                    unsigned int nslot,
                    const RVec<float> &electron_miniPFRelIso_chg,
                    const RVec<float> &electron_miniPFRelIso_neu,
                    const RVec<float> &electron_dxy,
                    const RVec<float> &jet_btagDeepFlavB,
                    const RVec<float> &electron_jetPtRelv2,
                    const RVec<float> &electron_jetPtRatio
                    ){
    for (size_t i=0 ; 0<readers.size() ; i++ ){

      float electron_miniPFRelIso_chg_0 = electron_miniPFRelIso_chg[0];
      float electron_miniPFRelIso_neu_0 = electron_miniPFRelIso_neu[0];
      float electron_dxy_0 = electron_dxy[0];
      float jet_btagDeepFlavB_0 = jet_btagDeepFlavB[0];
      float electron_jetPtRelv2_0 = electron_jetPtRelv2[0];
      float electron_jetPtRatio_0 = electron_jetPtRatio[0];

      readers[i]->AddVariable( "Electron_miniPFRelIso_chg" , &electron_miniPFRelIso_chg_0 );
      readers[i]->AddVariable( "Electron_miniPFRelIso_neu" , &electron_miniPFRelIso_neu_0 );
      readers[i]->AddVariable( "Electron_dxy" , &electron_dxy_0 );
      readers[i]->AddVariable( "Jet_btagDeepFlavB" , &jet_btagDeepFlavB_0 );
      readers[i]->AddVariable( "Electron_jetPtRelv2" , &electron_jetPtRelv2_0 );
      readers[i]->AddVariable( "Electron_jetPtRatio" , &electron_jetPtRatio_0 );

      readers[i]->BookMVA( "BDT::BDTG" , "data/xml/test_BDTG.weights.xml" );
    }

    return readers[nslot]->EvaluateMVA("BDTG");
  };

  return df.DefineSlot( name , predict , { "Electron_miniPFRelIso_chg" , "Electron_miniPFRelIso_neu" , "Electron_dxy" , "Jet_btagDeepFlavB" , "Electron_jetPtRelv2" , "Electron_jetPtRatio" } );
}

The code compiles, but when I run it I get a segmentation fault. Sorry, I am still flying blind here. Could you help me diagnose the code?

Thanks again!!
Cheers,
Siewyan
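
A possible reading of the snippet above, not confirmed in the thread: the vector holds TMVA::Reader pointers that are never allocated, the loop condition 0<readers.size() never becomes false, the readers are set up inside the per-event lambda, and readers is a local variable captured by reference although RDataFrame only runs the event loop later, after BDT_reader has returned. The sketch below addresses only the lifetime part, under stated assumptions: MockReader is a hypothetical stand-in for a booked TMVA::Reader, and makePredictor is an illustrative name. The per-slot readers are created once and owned by a shared_ptr that the callable captures by value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a booked TMVA::Reader (the score is just x + y).
struct MockReader {
    float Evaluate(float x, float y) const { return x + y; }
};

// Build the per-slot readers once and hand back a callable that owns them.
// The shared_ptr is captured by value, so the readers stay alive for as long
// as the callable does, even after this function has returned.
inline std::function<float(unsigned int, float, float)> makePredictor(std::size_t nSlots)
{
    auto readers = std::make_shared<std::vector<MockReader>>(nSlots);
    return [readers](unsigned int slot, float x, float y) {
        assert(slot < readers->size());
        return (*readers)[slot].Evaluate(x, y);
    };
}
```

A callable built this way can be created inside a helper function and still be valid when the event loop eventually runs.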
