Facing error when using RDF to deal with TMVA .xml file via PyROOT

Dear experts,
When I use RDF in PyROOT to evaluate a TMVA .xml file, I get a strange error saying that computeModel cannot be found in ROOT. Other answers in the forum use almost the same script as mine and do not hit this error, so I am really confused. Here are the error message and my script.

In addition, I am sure the BDT.xml file is read successfully, because the variables are defined correctly.

raise AttributeError("Failed to get attribute {} from ROOT".format(name))
AttributeError: Failed to get attribute computeModel from ROOT

import ROOT 
ROOT.gInterpreter.ProcessLine('''
TMVA::Experimental::RReader model("BDT.xml");
auto computeModel = TMVA::Experimental::Compute<21, float>(model);
''')
variables = ROOT.model.GetVariableNames()
rdf = ROOT.RDataFrame('DecayTree', 'data.root')
c1 = ROOT.TCanvas()
df = rdf.Define('y', ROOT.computeModel, variables)

Thank you so much for considering my problem!

Best regards,
Linnuo

Hi @Linnuo!

Is there a reason why you use the gInterpreter? Why not use PyROOT directly for everything? Maybe that works. Let us know!

import ROOT 

model = ROOT.TMVA.Experimental.RReader("BDT.xml")
variables = ROOT.model.GetVariableNames()
rdf = ROOT.RDataFrame('DecayTree', 'data.root')
df = rdf.Define('y',
                ROOT.TMVA.Experimental.Compute[21, "float"](model),
                variables)

Dear @jonas ,
I use gInterpreter because I ran into problems when using PyROOT directly, and I followed the instructions in these topics: “~rdataframe-and-tmva-in-pyroot/38930” and “~/t/evaluating-mva-within-root-data-frame/49238/4”.
Sorry for not including the full links; I could not post my reply with them.

I tried your suggestion, but I still get the same kind of error.

Traceback (most recent call last):
  File "TMVA_python.py", line 12, in <module>
    variables = ROOT.model.GetVariableNames()
                ^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/conda/envs/default/2024-07-10_13-01/linux-64/lib/python3.12/site-packages/ROOT/_facade.py", line 236, in _getattr
    return getattr(self, name)
           ^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/lhcbdev.cern.ch/conda/envs/default/2024-07-10_13-01/linux-64/lib/python3.12/site-packages/ROOT/_facade.py", line 164, in _fallback_getattr
    raise AttributeError("Failed to get attribute {} from ROOT".format(name))
AttributeError: Failed to get attribute model from ROOT. Did you mean: 'module'?

import ROOT 
model = ROOT.TMVA.Experimental.RReader("BDT.xml")
variables = ROOT.model.GetVariableNames()
rdf = ROOT.RDataFrame('DecayTree', 'data.root')
df = rdf.Define('y',
                ROOT.TMVA.Experimental.Compute[21, "float"](model),
                variables)

Best,
Linnuo

At this point, model is a Python variable, so it cannot be looked up as ROOT.model. Did you try:

variables = model.GetVariableNames()

?
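Putting the fix together, here is a minimal sketch of the whole script with model kept as a Python variable. add_model_column is a hypothetical helper name; the tree name, file names and the "float" input type are taken from the thread and are assumptions about your setup. The template size is derived from the variable list instead of hard-coding 21, and model is returned together with the dataframe because the object built by Compute may only hold a reference to it.

```python
def add_model_column(xml_path, tree_name, data_file, column="y"):
    """Hypothetical helper: add the TMVA response of `xml_path` as `column`.

    Returns (df, model). Keep `model` alive while `df` is used, because the
    object built by Compute may only hold a reference to it.
    """
    import ROOT  # imported here so this file can be loaded without ROOT

    # `model` is a Python variable, so call its methods directly;
    # ROOT.model fails because ROOT.<name> looks in ROOT/C++, not in Python.
    model = ROOT.TMVA.Experimental.RReader(xml_path)
    variables = [str(v) for v in model.GetVariableNames()]

    rdf = ROOT.RDataFrame(tree_name, data_file)
    # Template size taken from the variable list instead of hard-coding 21;
    # "float" assumes the input columns are stored as float.
    compute = ROOT.TMVA.Experimental.Compute[len(variables), "float"](model)
    df = rdf.Define(column, compute, variables)
    return df, model


# Usage with the names from the thread:
# df, model = add_model_column("BDT.xml", "DecayTree", "data.root")
# h = df.Histo1D("y")
```

The helper is only a convenience; writing the same lines at the top level of the script works just as well, as long as model stays in scope until the event loop has run.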

Dear @jonas ,
Thank you so much! It works now.
However, I now have a new problem: if the variables contain a formula such as “log(IPCHI2)”, it does not work (plain IPCHI2 works).

variables = ["log(IPCHI2)", "IPCHI2"]

The error message is

runtime_error: Unknown column: "log(IPCHI2)"

log(IPCHI2) is not recognized as a column. How can I use a formula as a variable name?

Best,
Linnuo

Hello! Where did you read that using a formula in the variable name is supported? I don't think it actually is.
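Since RDataFrame only accepts real column names, one workaround (the one Linnuo ends up using) is to Define a new column for each formula and pass those names to the model instead. A minimal pure-Python sketch of that mapping, assuming the formulas are valid C++ expressions; model_input_columns is a hypothetical helper, and the static_cast<float> assumes the model is evaluated with "float" inputs as elsewhere in this thread.

```python
import re


def model_input_columns(expressions):
    """Map TMVA input expressions to RDataFrame column names (hypothetical).

    Returns (columns, defines): `columns` is the list to pass to the Define
    that evaluates the model; `defines` maps each new column name to the C++
    expression it should be defined from. Plain identifiers are kept as-is.
    """
    columns, defines = [], {}
    for expr in expressions:
        if re.fullmatch(r"[A-Za-z_]\w*", expr):
            columns.append(expr)  # already a valid column name, e.g. "IPCHI2"
            continue
        name = re.sub(r"\W", "_", expr)  # "log(IPCHI2)" -> "log_IPCHI2_"
        if not name or name[0].isdigit():
            name = "_" + name  # C++ identifiers cannot start with a digit
        # Cast so the new column type matches Compute[N, "float"] (assumption)
        defines[name] = f"static_cast<float>({expr})"
        columns.append(name)
    return columns, defines


cols, defs = model_input_columns(["log(IPCHI2)", "IPCHI2"])
print(cols)  # ['log_IPCHI2_', 'IPCHI2']
print(defs)  # {'log_IPCHI2_': 'static_cast<float>(log(IPCHI2))'}
```

In the ROOT script one would then call rdf = rdf.Define(name, expr) for each entry of defines and pass columns, rather than the raw formula strings, to the Define that evaluates the model. The sketch does not handle two formulas that map to the same sanitized name.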

Dear @jonas ,
I thought it was supported when using ROOT from C++, but that is not really important: I now define new columns in my script.
However, I now get a strange error with an MLP model, and I am really confused by it. It works with the .xml file from the BDT model, but with the MLP one it fails.

model = ROOT.TMVA.Experimental.RReader("MLP.xml")
variables = ["…"]
df = rdf.Define('MLP_value', ROOT.TMVA.Experimental.Compute[20, "float"](model), variables)
h = df.Histo1D('MLP_value')
h.Draw()

And the error message is

RDataFrame::Run: event loop was interrupted
Traceback (most recent call last):
  File "TMVA.py", line 46, in <module>
    h.Draw()
    ^^^^^^
cppyy.gbl.std.runtime_error: TH1D& ROOT::RDF::RResultPtr<TH1D>::operator*() =>
    runtime_error: Size of input vector is not equal to number of variables.

Best,
Linnuo

Hi! I can help you with the Python-related problems, but I know too little about TMVA to answer this question.

Maybe our TMVA expert, @moneta, has some advice?

Hi,
It looks like the number of variables you are passing to the Compute function (20) does not match the number of variables you pass to the Define function.
If this is not the case, can you please post your running code, including the input data?

Best,

Lorenzo
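The count mismatch Lorenzo describes can be caught before the event loop starts. A minimal pure-Python sketch (check_input_count is a hypothetical helper) comparing the three counts that have to agree: the template size N in Compute[N, "float"], the number of columns passed to Define, and the number of input variables stored in the .xml file (in PyROOT, model.GetVariableNames()).

```python
def check_input_count(n_template, columns, model_variables):
    """Raise early if Compute[N, ...], the Define columns and the model disagree.

    Hypothetical helper: TMVA reports "Size of input vector is not equal to
    number of variables" only once the event loop runs; this checks up front.
    """
    n_cols = len(columns)
    n_model = len(model_variables)
    if not (n_template == n_cols == n_model):
        raise ValueError(
            f"Compute template size ({n_template}), number of Define columns "
            f"({n_cols}) and number of model variables ({n_model}) must be equal"
        )


check_input_count(2, ["a", "b"], ["a", "b"])  # consistent: no exception
try:
    check_input_count(20, ["x"] * 20, ["x"] * 21)
except ValueError as err:
    print(err)
```

Called as check_input_count(20, variables, model.GetVariableNames()) just before the Define, it turns the late event-loop error into an immediate message. Deriving N from len(variables) instead of typing it by hand avoids one of the three mismatches altogether.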