XGBoost Multiclass Predictions Inconsistent Between ROOT/Python

I have trained an XGBClassifier() in Python. It is a multi-class classifier and is defined as:

model = XGBClassifier(
    max_depth=7,
    reg_alpha=2,
    reg_lambda=4,
    gamma=0.1,
    objective='multi:softprob',
    num_class=4,
    learning_rate=0.1,
    n_estimators=100,
    eval_metric=['mlogloss'],
    missing=99999.0,
    early_stopping_rounds=10, 
    seed=0,
    nthread=8
)

However, when I save this model to .root format via ROOT.TMVA.Experimental.SaveXGBoost(model, MODEL_NAME, f"{PATH_TO_MODEL[:-5]}.root", num_inputs=N_FEATURES) and then write a simple ROOT script that prints the model's predictions, they are inconsistent with the Python model. For a random sample from the training data I find:

Python predictions = [0.98837775, 0.00120619, 0.00707766, 0.00333839]
ROOT Predictions = [0.89028, 0.032496, 0.0263328, 0.0508909]

(The input sample is identical in both cases - I have even printed out the numpy array and pasted it into my ROOT script.)

Maybe @moneta will be able to help once he’s back in a couple of days


So I think I have found the solution to the problem: the "missing" parameter is supported neither by the encoding in SaveXGBoost nor by TMVA::Experimental::RBDT<>. If you re-train the model in Python with the missing parameter removed, the ROOT model generates the same predictions.

The caveat, of course, is that the missing parameter does aid in training the models. Maybe this feature should be added/supported?
