What is the correct input format for xgboost trained on pandas dataframes

David_Marckx · December 22, 2022, 10:06pm

I am a newbie at integrating external ML libraries into TMVA. I trained an XGBoost model with the sklearn wrapper and saved it:

import ROOT
from xgboost import XGBClassifier

bst = XGBClassifier(...)
# X_train and y_train are pandas DataFrames holding float or integer values
bst.fit(X_train, y_train, sample_weight=weight_train_balanced)
ROOT.TMVA.Experimental.SaveXGBoost(bst, "XGB", "location/XGBtest.root", num_inputs=len(X_train.columns))

For SaveXGBoost to work, I renamed the columns of my dataframes to f1, f2, … as would happen in native xgboost.

I can then access it in my C++ framework via:

TMVA::Experimental::RBDT<> bdt("myBDT", "location/XGBtest.root");
varmap["_eventBDT"] = bdt.Compute(featurevec);

However, I can’t figure out what input the Compute function of this BDT expects in order to give a prediction for one event. I followed the ROOT TMVA tmva103_Application.C tutorial (as a new user I can’t post links), which indicates that an ordered vector of the input variables is the way to go.

But this results in the following error (it seems to want a single double…):
error: cannot convert ‘std::vector<float, std::allocator<float> >’ to ‘std::map<std::__cxx11::basic_string<char>, double>::mapped_type’ {aka ‘double’}

A map from the "fi" names to their values doesn’t work either. Any ideas what input format would make my little BDT happy?

couet · January 5, 2023, 10:19am

Welcome to the ROOT forum

Maybe @moneta can help you.

moneta · January 5, 2023, 4:26pm

Hi,

Apologies for the late reply. If you are using C++, the type required by RBDT::Compute is a std::vector<T> for a single event, with size equal to the number of features, or an RTensor<T> with shape (n_events, n_features) for evaluating many events at the same time.
The output is likewise a std::vector or an RTensor, depending on which signature you use.
See the reference documentation.
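If your framework keeps the variables by name, one way to get the ordered std::vector for a single event is a small helper like the one below. This is only a sketch in plain C++: makeFeatureVector and its arguments are hypothetical names, not part of ROOT, and it assumes you know the column order that was used for X_train.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a ROOT function): flatten named values into a
// std::vector<float> that follows the column order used at training time.
std::vector<float> makeFeatureVector(const std::vector<std::string>& trainingOrder,
                                     const std::map<std::string, double>& values)
{
    std::vector<float> features;
    features.reserve(trainingOrder.size());
    for (const auto& name : trainingOrder) {
        const auto it = values.find(name);
        if (it == values.end()) {
            // Fail loudly rather than silently shifting the remaining features.
            throw std::out_of_range("missing feature: " + name);
        }
        features.push_back(static_cast<float>(it->second));
    }
    return features;
}
```

The resulting vector has one entry per feature and can then be passed to Compute as the single-event input.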

If you are using Python you can use a Numpy array of shape (n_events, n_features).

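In your case, it looks like you are assigning the output directly to an entry of a std::map whose values are double. That seems to be the cause of the error: for a single event, Compute returns a std::vector, not a single number. Store the result in its own variable first, and take the element you need from it: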

auto varEvent = bdt.Compute(featurevec);
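If you then want the score in your map, a minimal sketch could look like this. MockBDT is a stand-in so that the snippet compiles without ROOT; it is not the real RBDT class, and storeScore is a hypothetical helper. It also assumes that the first element of the returned vector is the score you want, which is what I would expect for a binary classifier.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for TMVA::Experimental::RBDT<>, NOT the real class. It only mimics
// a Compute() that takes one event and returns a std::vector<float>.
struct MockBDT {
    std::vector<float> Compute(const std::vector<float>& x) {
        float sum = 0.f;
        for (float v : x) sum += v;
        return {sum};  // dummy "score": the sum of the inputs
    }
};

// Hypothetical helper: evaluate one event and store the first score as a double.
template <typename Model>
void storeScore(Model& bdt, const std::vector<float>& featurevec,
                std::map<std::string, double>& varmap, const std::string& key)
{
    const auto scores = bdt.Compute(featurevec);  // a vector, not a double
    if (scores.empty()) {
        throw std::runtime_error("model returned no score");
    }
    varmap[key] = scores[0];  // float to double conversion is implicit
}
```

With the real model you would pass your bdt object instead of a MockBDT, for example storeScore(bdt, featurevec, varmap, "_eventBDT").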

Cheers

Lorenzo