XGBoost to TMVA conversion and TMVA inputs for RDataFrame operation

Dear experts, I am using ROOT 6.34.04 and I am loading a pickle file containing a trained XGBoost model.

I see that for XGBoost there is a converter (from xgboost2tmva import convert_model) which should convert the model to XML, but I also see that ROOT itself has tutorials covering this conversion step.

I was wondering which approach is recommended nowadays, and whether anyone has recent pointers on how to convert an XGBoost model to TMVA, together with an example of using the TMVA reader to define variables in an RDataFrame operation.
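For the XML route, one possible way to use a TMVA reader inside RDataFrame is sketched below, under several assumptions: the model is available as a TMVA XML weights file (for example produced by a converter such as the one mentioned above), TMVA::Experimental::RReader can read that file, and the event loop runs single-threaded, because a single shared reader object is not safe to call from several threads at once. The file name "weights.xml", the global name g_reader and both helper functions are hypothetical.

```python
def reader_expression(reader_name, columns):
    """Build a C++ expression for RDataFrame.Define that feeds `columns`
    (each cast to float) to `<reader_name>.Compute(...)` and keeps the
    first output, e.g. the classifier score."""
    if not columns:
        raise ValueError("at least one input column is required")
    args = ", ".join(f"(float){c}" for c in columns)
    return f"{reader_name}.Compute({{{args}}})[0]"


def book_bdt_column(df, weights_xml, columns, reader_name="g_reader", out_name="BDT"):
    """Declare a global RReader (call once per reader_name) and Define
    `out_name` on `df` from it. Assumes a single-threaded event loop."""
    import ROOT  # imported here so reader_expression() works without ROOT

    declared = ROOT.gInterpreter.Declare(
        '#include "TMVA/RReader.hxx"\n'
        f'TMVA::Experimental::RReader {reader_name}("{weights_xml}");'
    )
    if not declared:
        raise RuntimeError(f"could not declare the TMVA reader {reader_name}")
    return df.Define(out_name, reader_expression(reader_name, columns))
```

For example, book_bdt_column(ROOT.RDataFrame("DecayTree", "test.root"), "weights.xml", ["var1", "var2"]) would add a BDT column; the column list here is hypothetical and must follow the training order. RReader also provides GetVariableNames(), which can be used to compare against the names stored in the XML file.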

Thanks in advance,
Renato

PS: In the meantime I will keep experimenting and report back if I find a working setup.

Hi @rquaglia,
Thank you for your question.
@moneta could you please take a look?

So, if the training variables are known in advance, the following seems to work on ROOT 6.34:

```python
import pickle

import ROOT
import xgboost  # noqa: F401  (must be importable to unpickle the model)

# Load the trained XGBoost model
with open("BDTS_block7.pickle", "rb") as file:
    xgb_model = pickle.load(file)

# Training variables, in training order: the BDT receives its inputs by position
features_expected = ["B_BPVIP",
                     "B_BPVIPCHI2",
                     "B_END_VCHI2DOF",
                     "B_BPVDIRA",
                     "B_DOCA12",
                     "H_MINIP"]
if len(features_expected) != xgb_model.n_features_in_:
    raise ValueError("Number of expected features does not match the model's number of inputs")

# Convert the model to ROOT's RBDT format and load it back
ROOT.TMVA.Experimental.SaveXGBoost(xgb_model, "myModel", "output_model.root",
                                   num_inputs=len(features_expected))
bdt = ROOT.TMVA.Experimental.RBDT("myModel", "output_model.root")

df = ROOT.RDataFrame("DecayTree", "test.root")
# Define() with a string takes one C++ expression; H_MINIP is assumed here
# to be the smaller of the two hadron impact parameters
node = df.Define("H_MINIP", "std::min<double>(H1_BPVIP, H2_BPVIP)")

# The BDT works on floats: cast every input column to float
cols_input = []
for idx, feature in enumerate(features_expected):
    col = f"BDTs_Input{idx}"
    node = node.Define(col, f"(float){feature}")
    cols_input.append(col)

# Compute<N, float> wraps the RBDT as a per-event callable over N float columns
# and returns a vector of outputs. The type is passed as the string "float",
# since a Python float would be instantiated as C++ double.
node = (node.Define("BDTv", ROOT.TMVA.Experimental.Compute[len(features_expected), "float"](bdt), cols_input)
            .Define("BDTs", "BDTv[0]"))

c = ROOT.TCanvas()
h = node.Filter("BDTs > 0.1").Histo1D("B_M")
h.Draw()
c.Draw()
c.SaveAs("BDTs.pdf")
```
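The script only checks that the number of features matches. Since Compute feeds the columns to the BDT in the order of the column list, that order must also match the training order. A minimal sketch of an extra check, assuming the pickle holds a fitted sklearn-style XGBoost estimator such as XGBClassifier (the script already relies on n_features_in_); check_feature_order is a hypothetical helper, and feature_names_in_ is only available in recent XGBoost versions when the model was fit on data with string column names, for example a pandas DataFrame.

```python
def check_feature_order(model, expected):
    """Check `expected` against the inputs of a fitted, sklearn-style model.

    Returns True if the model stores feature names and they match `expected`
    in order, False if it stores no names (only the count could be checked),
    and raises ValueError on a count or order mismatch.
    """
    expected = list(expected)
    n_inputs = getattr(model, "n_features_in_", None)
    if n_inputs is not None and n_inputs != len(expected):
        raise ValueError(f"model has {n_inputs} inputs, {len(expected)} names given")
    # getattr's default also covers the AttributeError XGBoost raises when
    # the model was fit without string feature names
    stored = getattr(model, "feature_names_in_", None)
    if stored is None:
        return False
    stored = [str(name) for name in stored]
    if stored != expected:
        raise ValueError(f"feature order differs: model {stored} vs given {expected}")
    return True
```

Calling check_feature_order(xgb_model, features_expected) right after loading the pickle would then catch a reordered list, not just a wrong length, whenever the model carries its feature names.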

But when I run it, I get many messages like these:

```
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental::Internal
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental::Internal
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental
cling::DynamicLibraryManager::loadLibrary(): libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libTMVA.so for TMVA::Experimental::Internal
```
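The messages indicate that when cling tries to autoload libTMVA.so, the dynamic loader cannot find libmkl_intel_lp64.so.2 (an Intel MKL library that this libTMVA.so apparently needs), so TMVA::Experimental never gets loaded. A quick, stdlib-only way to check whether that library is reachable from the same Python environment is sketched below, assuming a Linux system; the helper names are illustrative, and neither check takes the RPATH/RUNPATH recorded in libTMVA.so into account, so the result can differ from what happens during autoloading.

```python
import ctypes
import os


def find_in_library_path(soname, extra_dirs=()):
    """Return the first file named `soname` in `extra_dirs` or in the
    directories of LD_LIBRARY_PATH, or None if it is not found there.

    This is only a file-system search: it ignores ld.so.cache and the
    default system library directories.
    """
    env_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    for directory in list(extra_dirs) + [d for d in env_dirs if d]:
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            return candidate
    return None


def can_dlopen(soname):
    """Return True if the dynamic loader can open `soname` from this process."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True
```

Running print(can_dlopen("libmkl_intel_lp64.so.2"), find_in_library_path("libmkl_intel_lp64.so.2")) in the environment used for the script shows whether the library is visible at all. Note that glibc reads LD_LIBRARY_PATH when the process starts, so changing os.environ from inside Python does not affect later library loading; the variable has to be set before Python is launched.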

An additional question: do any of the readers or converters provided by ROOT actually work with GBReweighter from hep_ml?
Is there any compatible way to convert it and load it within an RDataFrame operation (or can one convert it to another, equivalent model that TMVA::Experimental can read)?