How to dump an XGBoost model to a cpp file

I’d like to use the output of an XGBoost BDT model in a code base without an explicit dependency on xgboost or any other library. Using a modified version of this script, xgb2cpp, I am able to generate a C++ function which, to my eye, takes exactly the trained model dumped to a txt file and converts it to a C++ function. However, when running this function with the exact same inputs, I cannot reproduce the probabilities output by xgboost itself.

Here’s a minimal working example of the code used to make the BDT:

import os

import numpy as np
import xgboost as xgb

# Create dummy data for training and testing
X_train = np.random.rand(100, 4)
y_train = np.random.randint(0, 2, size=100)

X_test = np.random.rand(20, 4)

# Train a small BDT (binary:logistic objective by default)
clf = xgb.XGBClassifier(n_estimators=2, min_child_weight=5, gamma=0.5, max_depth=2)
clf.fit(X_train, y_train)
y_pred_proba = clf.predict_proba(X_test)

# Dump the model to a txt file (the target directory must exist)
os.makedirs('test_minimal', exist_ok=True)
clf.get_booster().dump_model('test_minimal/dump.raw.txt')

print(X_test)
print(y_pred_proba)

The dumped model looks like this:

booster[0]:
0:[f0<0.460618258] yes=1,no=2,missing=2
    1:leaf=-0.10681767
    2:[f2<0.684740245] yes=3,no=4,missing=4
        3:leaf=0.000418715936
        4:leaf=0.176613688
booster[1]:
0:[f3<0.255429506] yes=1,no=2,missing=2
    1:leaf=-0.136423871
    2:[f0<0.607831895] yes=3,no=4,missing=4
        3:leaf=-0.00219070958
        4:leaf=0.125992939

with the corresponding C++ function:

#include &lt;vector&gt;

float classify(std::vector&lt;float&gt; &amp;sample) {

    float sum = 0.0f;

    // booster[0]
    if (sample[0] &lt; 0.460618258f) {
        sum += -0.10681767f;
    } else {
        if (sample[2] &lt; 0.684740245f) {
            sum += 0.000418715936f;
        } else {
            sum += 0.176613688f;
        }
    }

    // booster[1]
    if (sample[3] &lt; 0.255429506f) {
        sum += -0.136423871f;
    } else {
        if (sample[0] &lt; 0.607831895f) {
            sum += -0.00219070958f;
        } else {
            sum += 0.125992939f;
        }
    }

    return sum;
}

To convert the output of the C++ function to a probability, I pass it through a sigmoid:

#include &lt;cmath&gt;

float sigmoid(float sum) {
    return 1.0f / (1.0f + std::exp(-sum));
}

But this result does not match the result from xgboost (more specifically, the probabilities I get from clf.predict_proba(X_test)). So what gives? Does xgboost do something I am not realizing? Any help is appreciated!
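
For concreteness, here is a rough Python transcription of the generated function that I use for a side-by-side check (just a sketch, with the thresholds taken from the dump above):

import numpy as np

def classify(sample):
    # Python transcription of the generated C++ function above
    s = 0.0
    if sample[0] < 0.460618258:
        s += -0.10681767
    else:
        s += 0.000418715936 if sample[2] < 0.684740245 else 0.176613688
    if sample[3] < 0.255429506:
        s += -0.136423871
    else:
        s += -0.00219070958 if sample[0] < 0.607831895 else 0.125992939
    return s

manual = np.array([1.0 / (1.0 + np.exp(-classify(x))) for x in X_test])
# compare against the class-1 probabilities from xgboost, row by row
print(np.column_stack([manual, y_pred_proba[:, 1]]))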

Dear @agarabag,

Thanks for reaching out to the forum! The xgb2cpp program you mention is not related to ROOT; nonetheless, I will try to help you with your issue. Its README says it only works if the XGBoost model was trained in 'multi:softprob' mode, so I am guessing you made sure this was the case.
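
If it was not, here is a sketch of how the model from your script could be retrained with that objective (same xgboost Python API as your example; note that with multi:softprob each boosting round grows one tree per class, and the probabilities come from a softmax over the per-class sums rather than a single sigmoid):

import xgboost as xgb

# Same model as in the original script, but with the objective
# that the xgb2cpp README expects; num_class is inferred from y_train
clf = xgb.XGBClassifier(objective='multi:softprob',
                        n_estimators=2, min_child_weight=5,
                        gamma=0.5, max_depth=2)
clf.fit(X_train, y_train)
clf.get_booster().dump_model('test_minimal/dump.raw.txt')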

I am also curious about the difference between the outputs of the Python xgboost classifier and the simple C++ function: how different are those outputs?
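
One way to narrow this down, sketched below with the clf object from your script, is to compare the raw margins (the leaf sums before the link function is applied) against what the C++ function returns:

import xgboost as xgb

# Raw margins: the additive tree scores before the sigmoid/softmax.
# Note that xgboost also folds its global bias (base_score) into the
# margin, which a plain sum over the dumped leaves does not include.
margins = clf.get_booster().predict(xgb.DMatrix(X_test), output_margin=True)
print(margins)

If the margins already disagree, the problem is in the tree translation; if they agree and only the probabilities differ, the link function or the global bias is the likely culprit.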

Finally, I would also suggest contacting the author of the program you are using, who might be quicker to understand your case and give you support.

Cheers,
Vincenzo