SeparationType option ignored for Gradient Boosted Decision trees?

Hello,

I tried to reproduce the calculation of variable importances for gradient boosted decision trees
using GiniIndex as separation type with ROOT 6.08.06.

After some debugging, it looks like the SeparationType option is essentially ignored for gradient boosted decision trees:
the corresponding field MethodBDT::fSepType is reset to NULL in MethodBDT::InitGradBoost() and not set again (see https://github.com/root-project/root/blob/3c842ce20edc9bd72dbd40f1e7b071d6f49e4170/tmva/tmva/src/MethodBDT.cxx#L1536 ), so it is still NULL when instances of DecisionTree are created.

The DecisionTree objects then effectively have a regression (square of residual) loss (see https://github.com/root-project/root/blob/3c842ce20edc9bd72dbd40f1e7b071d6f49e4170/tmva/tmva/src/DecisionTree.cxx#L180 ) and, correspondingly, the separation gain for a node split is calculated using RegressionVariance rather than the metric specified by the SeparationType option.
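
To make the difference concrete, here is a minimal, self-contained sketch of the two kinds of split gain. It is not the TMVA code: the names giniIndex, targetVariance, giniGain, varianceGain and NodeSums are hypothetical, event weights are folded into the sums, and the normalisation used by RegressionVariance in TMVA may differ from the plain "parent minus children" form assumed here.

```cpp
#include <cassert>
#include <cmath>

// Gini index of a node holding signal weight s and background weight b:
// p * (1 - p), with purity p = s / (s + b).
double giniIndex(double s, double b) {
    assert(s >= 0.0 && b >= 0.0);
    const double n = s + b;
    if (n <= 0.0) return 0.0;
    const double p = s / n;
    return p * (1.0 - p);
}

// Running sums over the regression targets of the events in a node
// (hypothetical helper type, not a TMVA class).
struct NodeSums {
    double n;      // number (or summed weight) of events
    double sumT;   // sum of targets
    double sumT2;  // sum of squared targets
};

// Variance of the targets in a node: <t^2> - <t>^2.
double targetVariance(const NodeSums& node) {
    if (node.n <= 0.0) return 0.0;
    const double mean = node.sumT / node.n;
    return node.sumT2 / node.n - mean * mean;
}

// Gain of a split measured with the Gini index (class labels only).
double giniGain(double sLeft, double bLeft, double sRight, double bRight) {
    const double nLeft = sLeft + bLeft;
    const double nRight = sRight + bRight;
    return (nLeft + nRight) * giniIndex(sLeft + sRight, bLeft + bRight)
         - nLeft * giniIndex(sLeft, bLeft)
         - nRight * giniIndex(sRight, bRight);
}

// Gain of a split measured with the target variance (regression targets).
double varianceGain(const NodeSums& left, const NodeSums& right) {
    const NodeSums parent{left.n + right.n,
                          left.sumT + right.sumT,
                          left.sumT2 + right.sumT2};
    return parent.n * targetVariance(parent)
         - left.n * targetVariance(left)
         - right.n * targetVariance(right);
}
```

For targets that are exactly 0 or 1 the variance equals p(1-p), so both gains agree. With gradient boosting the targets are residuals that change in every iteration, so the variance-based gain (and any variable importance derived from it) will in general not match what GiniIndex would give.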

Is this on purpose?

Best regards,

Andre

Maybe @moneta can help…

I don’t know. I will forward your question to the author,

Cheers

Lorenzo

Yes, this is on purpose. The separation type is used by other boosting methods to calculate the response of the decision tree leaves. Gradient boosting uses a specific equation for the leaf response, which can be found in TMVA::MethodBDT::GradBoost. (This then overwrites what was already in the leaf.)
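
As an illustration of what such a leaf response looks like, below is a rough sketch of the textbook one-step Newton estimate for a binomial log-likelihood loss. It is not copied from TMVA::MethodBDT::GradBoost, and the exact expression there may differ; the names Event, sigmoid and leafResponse, the 0/1 label convention and the shrinkage parameter are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One training event as seen by a leaf (hypothetical type, not a TMVA class).
struct Event {
    double weight;  // event weight
    int label;      // 1 for signal, 0 for background
    double score;   // current output F(x) of the boosted forest
};

// Logistic function mapping the forest score to a signal probability.
double sigmoid(double f) { return 1.0 / (1.0 + std::exp(-f)); }

// Leaf response for all events that ended up in one leaf:
//   shrinkage * sum_i w_i (y_i - p_i) / sum_i w_i p_i (1 - p_i)
// The numerator is the summed residual (negative gradient of the loss),
// the denominator the summed second derivative of the loss.
double leafResponse(const std::vector<Event>& leafEvents, double shrinkage) {
    assert(shrinkage > 0.0);
    double sumResidual = 0.0;
    double sumHessian = 0.0;
    for (const Event& e : leafEvents) {
        const double p = sigmoid(e.score);
        sumResidual += e.weight * (e.label - p);
        sumHessian += e.weight * p * (1.0 - p);
    }
    // Guard against division by zero in empty or fully saturated leaves.
    if (sumHessian < 1e-30) sumHessian = 1e-30;
    return shrinkage * sumResidual / sumHessian;
}
```

The point of the sketch is that the value depends only on the residuals of the events in the leaf and not on a purity or separation index, which is why whatever the tree builder stored in the leaf can simply be overwritten.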

The start of the call chain that leads to GradBoost is at line 1335 of MethodBDT.cxx, which reads Double_t bw = this->Boost(*fTrainSample, fForest.back());.

Cheers,
Kim