Convert an XGBoost model to a TMVA .xml file for application

Hello experts,
I recently trained an XGBoost model with the sklearn API, and I want to convert it to an .xml file so that I can apply it in TMVA for the evaluation. I use the conversion script HHbbgg_ETH/convert_pkl2xml.py at common_training_March2019 · chernyavskaya/HHbbgg_ETH · GitHub, which in turn uses HHbbgg_ETH/tree_convert_pkl2xml.py at common_training_March2019 · chernyavskaya/HHbbgg_ETH · GitHub. The result looks good, but a little strange.

As you can see in the plots, the first one is the XGBoost evaluation with the predict_proba method (I use 2*proba - 1 to rescale it to (-1, 1)), and the second one is the TMVA evaluation after converting the .pkl file to an .xml file. The blue stacked distribution is quite different in the two plots: in the first one (XGBoost) the blue peak is near 0.9, while in the second one (TMVA) it is near 1. By the way, the peak near 0.9 is the more reasonable one physically. So I really need your help to convert the XGBoost model to a TMVA .xml file correctly. Thank you so much!
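To make the comparison concrete, here is a minimal sketch of the two score definitions written as functions of the same raw XGBoost margin F (the sum of the leaf values). It assumes, rather than confirms, that the converted trees reproduce F exactly, that TMVA's gradient-boost response is 2/(1+exp(-2F)) - 1, and that the model uses binary:logistic with base_score 0.5 (so there is no extra offset). The function names and margin values are hypothetical, for illustration only.

```python
import math


def xgb_scaled(margin: float) -> float:
    """XGBoost predict_proba for binary:logistic (sigmoid of the raw margin),
    rescaled to (-1, 1) with 2*proba - 1 as in the post."""
    proba = 1.0 / (1.0 + math.exp(-margin))
    return 2.0 * proba - 1.0


def tmva_gradboost(margin: float) -> float:
    """Assumed TMVA gradient-boost response for the same summed margin F:
    2/(1+exp(-2F)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * margin)) - 1.0


# Hypothetical margins, for illustration only.
for f in [0.5, 1.5, 3.0]:
    print(f"F={f:4.1f}  xgboost 2p-1={xgb_scaled(f):.3f}  TMVA={tmva_gradboost(f):.3f}")
```

Algebraically, 2*proba - 1 equals tanh(F/2) while the TMVA expression equals tanh(F). Under these assumptions, a margin that gives about 0.905 in XGBoost (F = 3) gives about 0.995 in TMVA, which would push the TMVA peak toward 1.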


@moneta can you help here, please?

Hi,

I am not aware of that script, and it is not part of ROOT. We are planning to add this conversion to ROOT if needed, but it is not available yet. What you can do now is convert the XGBoost model to a ROOT format that TMVA understands and evaluate it with the RBDT class.
See the tutorial tmva/tmva101_Training.py for converting a trained XGBoost model to the ROOT format, and the tutorial tmva/tmva102_Testing.py for evaluating the trained model in TMVA with the RBDT class.
If you have any further questions, please let me know.

Best regards

Lorenzo

Hi Lorenzo,
First of all, thank you so much for the reply. I know the tutorial you mentioned; I have already tried it, and it indeed works very well when converting an XGBoost model to the ROOT format. However, I need to convert the model to an .xml file for other reasons. It’s okay if that can’t be done yet; I will find another way.
Best regards,
Zhenxuan

Hi,

If there is a need, we can develop this converter in ROOT. For the time being, you can use the Python converter you linked above. If you have problems with it, I suggest contacting its author.
If the author actively maintains the converter, we can think of integrating it into ROOT.

Lorenzo

Hi,

I have found this converter from xgboost to TMVA xml,

Please let me know if you have problems using that one too.

Lorenzo

Hi,
I tried this script and got the same problem I asked about at the very beginning.
Zhenxuan

Hi,
If you obtain the same result with another conversion tool too, the problem may be in the TMVA evaluation of the model. Can you please share the XGBoost model, the TMVA .xml file, the input dataset used for the evaluation, and the code that produces the plots above, so that I can investigate?

Best regards

Lorenzo

Hi,
Here are the basic files you need: Public/ForXGboostDebug at master · ZhenxuanZhang-Jensen/Public · GitHub. I agree with you: it may be something that happens when the score is evaluated, maybe some mathematical transformation that I don’t know about. The reason I say this is that when I apply a transform function to the converted TMVA scores (plot 2), they turn into plot 1. The function is in flashgg/DiPhotonMVAResult.h at 1453740b1e4adc7184d5d8aa8a981bdb6b2e5f8e · cms-analysis/flashgg · GitHub. So my guess is that either the converted code or some mathematical step inside the TMVA evaluation changes the final score and the shape of the distribution.
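A transformation of this kind can be sketched as below. This is not a copy of the flashgg function; it is a hypothetical helper derived under the assumption that the TMVA score is r = 2/(1+exp(-2F)) - 1 for the raw XGBoost margin F. Inverting gives F = 0.5*log((1+r)/(1-r)), and applying the logistic sigmoid to F recovers the binary:logistic probability that predict_proba would return.

```python
import math


def tmva_to_xgb_proba(r: float) -> float:
    """Hypothetical helper: map a TMVA gradient-boost score r in (-1, 1) to
    the XGBoost binary:logistic probability, assuming r = 2/(1+exp(-2F)) - 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("TMVA score must lie strictly inside (-1, 1)")
    # 2/(r+1) - 1 = exp(-2F), so 0.5*log(...) = -F and the result is sigmoid(F).
    return 1.0 / (1.0 + math.exp(0.5 * math.log(2.0 / (r + 1.0) - 1.0)))


def tmva_to_xgb_scaled(r: float) -> float:
    """Same mapping, rescaled with 2*p - 1 to compare with the (-1, 1) XGBoost plot."""
    return 2.0 * tmva_to_xgb_proba(r) - 1.0


# A TMVA score of 0.995 maps to an XGBoost 2p-1 of about 0.905.
print(round(tmva_to_xgb_scaled(0.995), 3))
```

Writing the helper in the log form rather than as sigmoid(atanh(r)) keeps it close to the expression structure; both are mathematically equivalent for r strictly inside (-1, 1).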
Best,
Zhenxuan

Hi,
That is correct: the score returned by TMVA is defined differently from the probability predicted by XGBoost. The transformation described in the file you linked needs to be applied to the TMVA score to recover the XGBoost prediction.
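For reference, the relation can be written out under an assumption that is not confirmed in this thread: the converted trees reproduce XGBoost's raw margin F, TMVA's gradient-boost response is 2/(1+e^{-2F}) - 1, and the model uses binary:logistic with base_score 0.5 (zero offset).

```latex
\begin{aligned}
p_{\text{XGB}} &= \frac{1}{1+e^{-F}}, \qquad 2p_{\text{XGB}}-1 = \tanh\!\left(\frac{F}{2}\right),\\
r_{\text{TMVA}} &= \frac{2}{1+e^{-2F}}-1 = \tanh(F),\\
F &= \tfrac{1}{2}\ln\frac{1+r_{\text{TMVA}}}{1-r_{\text{TMVA}}} = \operatorname{artanh}(r_{\text{TMVA}}),\\
p_{\text{XGB}} &= \frac{1}{1+\exp\!\left(\tfrac{1}{2}\ln\!\left(\frac{2}{r_{\text{TMVA}}+1}-1\right)\right)}.
\end{aligned}
```

If the model was trained with a base_score other than 0.5, the XGBoost margin also contains the offset ln(base_score/(1-base_score)), which a converter would have to include in F; whether the converters discussed here handle that is an open assumption.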

Best,

Lorenzo