RDataFrame and TMVA in pyroot

dertexaner · April 16, 2020, 7:54pm

Dear experts,

I was wondering if you could provide a pyROOT implementation of the experimental RReader example provided in RDataFrame & TMVA.

In particular, it is not clear to me how to pythonize the Define command in https://github.com/root-project/root/blob/master/tutorials/tmva/tmva003_RReader.C#L82-L86

Thanks!

jblomer · April 17, 2020, 8:46am

@swunsch @etejedor Can you help?

swunsch · April 17, 2020, 8:52am

Hi!

Happy to see that you found the new feature! Though the disclaimer comes first: you are on experimental terrain!

Now the solution to your issue. Unfortunately, PyROOT (the Python bindings for ROOT's C++ classes) is not (yet) able to handle the template we use in C++, TMVA::Experimental::Compute. However, there is a small trick that makes it work in Python anyway. Because PyROOT sees all objects created by the C++ interpreter cling, you can create the objects in C++ and then access them from Python. Please note that the snippet assumes that you have run the tutorial tmva003_RReader.C beforehand, so that the weights file exists:

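import ROOT

# Create the reader and the compute functor in C++ via cling; PyROOT can then
# access both as ROOT.model and ROOT.computeModel.
ROOT.gInterpreter.ProcessLine('''
TMVA::Experimental::RReader model("tmva003_BDT/weights/tmva003_BDT.weights.xml");
auto computeModel = TMVA::Experimental::Compute<4, float>(model);
''')

# Signal tree of the TMVA classification example dataset
df = ROOT.RDataFrame('TreeS', 'http://root.cern.ch/files/tmva_class_example.root')
# Evaluate the model for each event, using the model's variable names as input columns
df = df.Define('y', ROOT.computeModel, ROOT.model.GetVariableNames())
h = df.Histo1D('y')

h.Draw()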
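
The 4 in Compute<4, float> is the number of input variables of the model, so it has to change for a model trained on a different number of inputs. A minimal sketch of building the same C++ statements for another model could look like the following; the helper names reader_declaration and declare_reader, as well as the default object names, are made up for illustration and are not part of ROOT.

```python
def reader_declaration(weights_path, n_inputs,
                       model_name="model", functor_name="computeModel"):
    """Return the C++ statements that create an RReader and its Compute functor.

    Hypothetical helper: it only builds strings and does not need ROOT.
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be a positive integer")
    return [
        f'TMVA::Experimental::RReader {model_name}("{weights_path}");',
        f"auto {functor_name} = "
        f"TMVA::Experimental::Compute<{n_inputs}, float>({model_name});",
    ]


def declare_reader(weights_path, n_inputs, **names):
    """Run the statements in cling; needs a ROOT build with TMVA and PyROOT."""
    import ROOT  # imported lazily so the string helper works without ROOT

    # One ProcessLine call per statement, mirroring the trick shown above
    for statement in reader_declaration(weights_path, n_inputs, **names):
        ROOT.gInterpreter.ProcessLine(statement)
```

For the tutorial model this would be declare_reader("tmva003_BDT/weights/tmva003_BDT.weights.xml", 4), after which ROOT.model and ROOT.computeModel can be used in Define as before.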

Best
Stefan

dan_m · April 21, 2020, 4:50pm

Hi Stefan,

This thread proved very helpful indeed. However, I have a related question and was not sure if I should open a new thread, so I shall reply here.

I tried this same approach, and if I understand correctly, this particular example works with the tutorial tmva003_RReader.C, where the model takes 4 floats as input. I was wondering whether the same approach can be used with a BDT that was trained on inputs of other data types. For example, one of my inputs is a double (the others are floats) and I see:

Error in <TTreeReaderValueBase::CreateProxy()>: The branch score contains data of type double. It cannot be accessed by a TTreeReaderValue<float>

after execution of the RDF event loop. Is there a workaround for this?

Thanks,
Spandan

swunsch · April 21, 2020, 9:08pm

Hi!

Unfortunately, the current implementation supports only floats. For most models, single precision should be fully sufficient, so just cast your inputs down to float and you should be fine!
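
As a minimal sketch of such a cast, one option is to define a float copy of each double column with a JIT-compiled Define expression and feed the copy to the model. The helper names and the "_f" suffix below are hypothetical; "score" is the branch name from the error message above.

```python
def float_cast_definitions(columns, suffix="_f"):
    """Return (new_column, expression) pairs that cast each column to float."""
    return [(f"{col}{suffix}", f"static_cast<float>({col})") for col in columns]


def with_float_columns(df, columns, suffix="_f"):
    """Add a float copy of every listed column to an RDataFrame node.

    Hypothetical helper: df can be any object with an RDataFrame-like
    Define(name, expression) method that returns the new node.
    """
    for name, expression in float_cast_definitions(columns, suffix):
        df = df.Define(name, expression)
    return df


# Usage with PyROOT (assumes df is an RDataFrame with a double column "score"):
#   df = with_float_columns(df, ["score"])
# Then pass "score_f" instead of "score" in the list of input columns
# given to the Define call that evaluates the model.
```

The original double column stays in the dataframe untouched, so other parts of the analysis can keep using it at full precision.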

Best
Stefan