Deviations in classification output when converting XGBoost to .root

Hello ROOT enthusiasts :)

I encountered an issue when converting a classifier from the XGBoost Python package to .root format:
I trained and optimised the classifier using XGBoost. I need to apply it to a lot of data in .root format and therefore decided to use ROOT.TMVA.Experimental.SaveXGBoost to convert the classifier to ROOT TMVA format and then load it with ROOT.TMVA.Experimental.RBDT. I can then apply the classifier using the Compute method of the RBDT object.
However, the response of the converted ROOT TMVA classifier deviates from what I get when using XGBoost directly.
I traced the issue down to the early stopping functionality that I use when training the XGBoost classifier; the L1 regularisation parameter (alpha) seems to amplify the deviations.

I have included a minimal working example below that recreates this behaviour.
I am using ROOT 6.32.02 and XGBoost 2.1.3.

It would be nice to know whether there is another way to use the (XGBoost) classifier in ROOT, or how to avoid this bug.

Cheers,
Lukas

Here is the code, since I can't attach it as a file (new account):

import numpy as np
import pandas as pd
import ROOT
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost import XGBClassifier

# random state set for reproducibility
random_state = 42
np.random.seed(random_state)

n_signal = 5000
n_background = 5000
n_features = 5

xgb_params = {
    "n_estimators": 150,
    "max_depth": 2,
    "learning_rate": 0.5,
    "lambda": 30,
    "early_stopping_rounds": 30,  # This parameter causes issues when converting the classifier to .root format
    "alpha": 1,  # This parameter can increase the deviations
    "subsample": 0.7,
}

# ----------------------------------------------------------------------------------------------------------------------
# Generate toy data
# ----------------------------------------------------------------------------------------------------------------------
signal = np.random.normal(loc=1.0, scale=1.0, size=(n_signal, n_features))
background = np.random.normal(loc=-1.0, scale=1.0, size=(n_background, n_features))
signal_labels = np.ones(n_signal, dtype=np.int32)
background_labels = np.zeros(n_background, dtype=np.int32)

columns = [f"feature_{i}" for i in range(n_features)]
df_signal = pd.DataFrame(signal, columns=columns)
df_signal["label"] = signal_labels
df_background = pd.DataFrame(background, columns=columns)
df_background["label"] = background_labels
df = pd.concat([df_signal, df_background], ignore_index=True)

# ----------------------------------------------------------------------------------------------------------------------
# Train classifier
# ----------------------------------------------------------------------------------------------------------------------
# Train test split
X_train, X_test, y_train, y_test = train_test_split(
    df[columns], df["label"], test_size=0.3, stratify=df["label"], random_state=random_state
)

clf = XGBClassifier(
    objective="binary:logistic",
    random_state=random_state,
    **xgb_params,  # pass parameters
)

# fit to training data
clf.fit(
    X_train.to_numpy(), y_train,
    eval_set=[(X_train.to_numpy(), y_train), (X_test.to_numpy(), y_test)],
    verbose=False,
)

# ----------------------------------------------------------------------------------------------------------------------
# Convert classifier to ROOT
# ----------------------------------------------------------------------------------------------------------------------
ROOT.TMVA.Experimental.SaveXGBoost(
    clf,
    "classifier",
    "classifier.root",
    num_inputs=n_features,
)
clf_root = ROOT.TMVA.Experimental.RBDT("classifier", "classifier.root")

# Apply classifier directly using the XGBoost interface and using ROOT.TMVA.Experimental
data = np.array(X_test, dtype="float32")  # ensure datatype is float32
y_pred_xgb = clf.predict_proba(data)[:, 1].flatten()
y_pred_root = clf_root.Compute(data).flatten()

print("----------------------------------------")
print(f"Max. absolute deviation: {max(abs(y_pred_root - y_pred_xgb)):.2f}")
print(f"Max. relative deviation: {max(abs(y_pred_root - y_pred_xgb) / y_pred_xgb) * 100:.2f}%")
print("----------------------------------------")

Welcome to the ROOT Forum!
Let’s see if @moneta can help

For anybody who stumbles into similar issues: I found a workaround for the problem.

The underlying problem is that XGBoost still saves all boosted trees when early stopping occurs, but knows that it should only use the trees up to the best iteration when making predictions. This information is lost in the conversion to ROOT with TMVA.Experimental.SaveXGBoost, and therefore all trees are used in the converted ROOT MVA.
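One way to check this (a quick sketch reusing clf, data and y_pred_root from the example above): if early stopping is really the only source of the discrepancy, evaluating the XGBoost model with all boosted rounds should roughly reproduce the ROOT output.

# Evaluate the XGBoost model with *all* trained trees, ignoring the
# best_iteration information that early stopping normally applies.
booster = clf.get_booster()
n_rounds = booster.num_boosted_rounds()  # total number of trees, including those after the best iteration
y_pred_all_trees = clf.predict_proba(data, iteration_range=(0, n_rounds))[:, 1]
print(f"Max. deviation of ROOT output vs. all-trees XGBoost: {np.max(np.abs(y_pred_root - y_pred_all_trees)):.2e}")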
To avoid the problem, one can retrain the BDT without early stopping while forcing the number of estimators to equal best_iteration + 1 of the previously trained classifier (i.e. set the parameter n_estimators=clf.get_booster().best_iteration + 1).
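In code, the workaround could look like this (a sketch based on the example above; clf_fixed and params_no_es are just illustrative names, and it relies on using the same random_state and training data so that the first trees come out identical):

# Drop early stopping and fix the number of trees to the best iteration
# found in the previous training, then convert as before.
params_no_es = {k: v for k, v in xgb_params.items() if k != "early_stopping_rounds"}
params_no_es["n_estimators"] = clf.get_booster().best_iteration + 1

clf_fixed = XGBClassifier(
    objective="binary:logistic",
    random_state=random_state,
    **params_no_es,
)
clf_fixed.fit(X_train.to_numpy(), y_train, verbose=False)

# Convert and load as before; the RBDT response should now agree with predict_proba.
ROOT.TMVA.Experimental.SaveXGBoost(clf_fixed, "classifier_fixed", "classifier_fixed.root", num_inputs=n_features)
clf_fixed_root = ROOT.TMVA.Experimental.RBDT("classifier_fixed", "classifier_fixed.root")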

However, it would be nice if this could be handled directly in ROOT.TMVA.Experimental.SaveXGBoost.
