
RDataFrame and TMVA in pyroot

Dear experts,

I was wondering if you could provide a PyROOT implementation of the experimental RReader example provided in RDataFrame & TMVA.

In particular, it is not clear to me how to pythonize the Define command in https://github.com/root-project/root/blob/master/tutorials/tmva/tmva003_RReader.C#L82-L86


@swunsch @etejedor Can you help?


Happy to see that you found the new feature :) Though the disclaimer comes first: you are on experimental terrain!

Now to the solution to your issue. Unfortunately, PyROOT (the Python bindings for C++) is not (yet) able to parse the template we use in C++ (see here). However, you can play a little trick to make it work in Python anyway! Because PyROOT sees all objects created by the C++ interpreter cling, you can just make that call in C++ and pick up the result from Python. Please note that the snippet assumes that you have run the tutorial tmva003_RReader.C beforehand:

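import ROOT

# Create the model and the compute functor on the C++ side via cling
ROOT.gInterpreter.ProcessLine('''
TMVA::Experimental::RReader model("tmva003_BDT/weights/tmva003_BDT.weights.xml");
auto computeModel = TMVA::Experimental::Compute<4, float>(model);
''')

# Both C++ objects are now visible from Python as ROOT.model and ROOT.computeModel
df = ROOT.RDataFrame('TreeS', 'http://root.cern.ch/files/tmva_class_example.root')
df = df.Define('y', ROOT.computeModel, ROOT.model.GetVariableNames())
h = df.Histo1D('y')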




Hi Stefan,

This thread proved very helpful indeed. However, I have a related question and was not sure if I should open a new thread, so I shall reply here.

I tried this same approach, and if I understand correctly, this particular example works with the tutorial tmva003_RReader.C, where we use 4 floats as input. I was wondering whether the same approach can be used with a BDT that was trained on inputs of different data types. For example, one of my inputs is a double (the rest are floats) and I see:

Error in <TTreeReaderValueBase::CreateProxy()>: The branch score contains data of type double. It cannot be accessed by a TTreeReaderValue<float>

after execution of the RDF event loop. Is there a workaround for this?



Unfortunately, the current implementation supports only floats. For most models, however, float precision should be fully sufficient, so just cast your inputs down to float and you should be fine!
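
In case it helps, here is a minimal sketch of how the downcast could be done with RDataFrame's string-based Define, which accepts a C++ expression. The helper names (float_cast_defines, apply_float_casts) and the "_f" suffix are made up for illustration, and "score" is simply the branch name from the error message above, so adapt them to your tree:

```python
def float_cast_defines(columns, suffix="_f"):
    """Return (new_column, expression) pairs that cast each column to float.

    The expression is a C++ snippet meant to be handed to RDataFrame.Define.
    """
    return [
        (name + suffix, "static_cast<float>({})".format(name))
        for name in columns
    ]


def apply_float_casts(df, columns, suffix="_f"):
    """Chain one Define per column and return the resulting dataframe node.

    `df` is expected to behave like a ROOT.RDataFrame, i.e. Define(name, expr)
    returns a new node; nothing here imports ROOT itself.
    """
    for new_name, expression in float_cast_defines(columns, suffix):
        df = df.Define(new_name, expression)
    return df


# Hypothetical usage (assumes ROOT is installed and the tree has a double
# branch called "score"; not executed as part of this sketch):
#   df = ROOT.RDataFrame("TreeS", "my_file.root")
#   df = apply_float_casts(df, ["score"])
#   # ...then use "score_f" instead of "score" in the list of input columns
```

The new float column then takes the place of the double one in the list of column names passed to the Define that evaluates the model, so that list has to be built by hand instead of coming from model.GetVariableNames().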