SeparationType option ignored for Gradient Boosted Decision trees?

Andre_Holzner · July 1, 2017, 10:03pm

Hello,

I tried to reproduce the calculation of variable importances for gradient boosted decision trees
using GiniIndex as the separation type with ROOT 6.08.06.

After some debugging, it looks like the SeparationType option is essentially ignored for gradient boosted decision trees.
The corresponding field MethodBDT::fSepType is reset to NULL in MethodBDT::InitGradBoost() and not set again (see https://github.com/root-project/root/blob/3c842ce20edc9bd72dbd40f1e7b071d6f49e4170/tmva/tmva/src/MethodBDT.cxx#L1536 ), so it is still NULL when instances of DecisionTree are created.

The DecisionTree objects then effectively have a regression (square of residual) loss (see https://github.com/root-project/root/blob/3c842ce20edc9bd72dbd40f1e7b071d6f49e4170/tmva/tmva/src/DecisionTree.cxx#L180 ). As a result, the separation gain for a node split is calculated using RegressionVariance rather than the metric specified by the SeparationType option.
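
To illustrate the difference between the two criteria, here is a minimal, self-contained sketch that compares a Gini-based gain with a variance-based gain for the same split. It is not the TMVA code: the function names (giniIndex, giniGain, variance, varianceGain) are hypothetical, and any normalization applied inside TMVA's GiniIndex or RegressionVariance classes is left out.

```cpp
#include <cassert>

// Gini index p*(1-p) of a node holding signal weight s and background weight b.
double giniIndex(double s, double b) {
    const double n = s + b;
    if (n <= 0.0) return 0.0;          // empty node: no impurity
    const double p = s / n;
    return p * (1.0 - p);
}

// Unnormalized Gini gain: weighted parent impurity minus weighted child impurities.
// (sP, bP) describe the parent, (sL, bL) the left daughter; the right daughter is the rest.
double giniGain(double sP, double bP, double sL, double bL) {
    const double nP = sP + bP;
    const double nL = sL + bL;
    assert(nL <= nP);
    return nP * giniIndex(sP, bP)
         - nL * giniIndex(sL, bL)
         - (nP - nL) * giniIndex(sP - sL, bP - bL);
}

// Variance of the targets in a node, from the running sums of t and t^2.
double variance(double n, double sumT, double sumT2) {
    if (n <= 0.0) return 0.0;          // empty node: no spread
    return (sumT2 - sumT * sumT / n) / n;
}

// Unnormalized variance gain: reduction of the summed squared residuals
// achieved by splitting the parent into a left daughter and the rest.
double varianceGain(double n, double sumT, double sumT2,
                    double nL, double sumTL, double sumT2L) {
    assert(nL <= n);
    return n * variance(n, sumT, sumT2)
         - nL * variance(nL, sumTL, sumT2L)
         - (n - nL) * variance(n - nL, sumT - sumTL, sumT2 - sumT2L);
}
```

For a node with two events of target +1 and two of target -1 that a split separates perfectly, varianceGain(4, 0, 4, 2, 2, 2) removes all of the summed squared residual, while giniGain(2, 2, 2, 0) removes all of the Gini impurity. The two criteria agree that this split is ideal but measure it on different scales, so variable importances derived from them need not match.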

Is this on purpose?

best regards,

Andre

bellenot · July 3, 2017, 9:22am

Maybe @moneta can help…

moneta · July 3, 2017, 10:45am

I don’t know. I will forward your question to the author,

Cheers

Lorenzo

kialbert · July 3, 2017, 12:50pm

Yes, this is on purpose. The separation type is used by the other boosting methods to calculate the response of the decision tree leaves. Gradient boosting uses a specific equation for the leaf response, which can be found in TMVA::MethodBDT::GradBoost. (This then overwrites what was already in the leaf.)

The start of the call chain that leads to GradBoost is at MethodBDT::1335, which reads Double_t bw = this->Boost(*fTrainSample, fForest.back());.
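
For readers who want a feel for what such a leaf-response equation looks like, the following is a sketch of the textbook single-Newton-step leaf update for two-class gradient boosting with a binomial log-likelihood loss. It is an assumption that this matches the spirit of MethodBDT::GradBoost; the struct Event, the functions probability and leafResponse, and the constant factor 0.5 are illustrative and should be checked against the TMVA source.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-event record: y is 1 for signal and 0 for background,
// F is the current output of the boosted model, weight is the event weight.
struct Event {
    double y;
    double F;
    double weight;
};

// Signal probability implied by the model output F (assumed convention).
double probability(double F) {
    return 1.0 / (1.0 + std::exp(-2.0 * F));
}

// One Newton step for the events that ended up in a single leaf:
// weighted sum of residuals divided by weighted sum of |r| * (1 - |r|),
// scaled by the shrinkage (learning rate).
double leafResponse(const std::vector<Event>& leafEvents, double shrinkage) {
    assert(shrinkage > 0.0);
    double sumResidual = 0.0;
    double sumCurvature = 0.0;
    for (const Event& e : leafEvents) {
        const double r = e.y - probability(e.F);   // residual in (-1, 1)
        sumResidual += e.weight * r;
        sumCurvature += e.weight * std::fabs(r) * (1.0 - std::fabs(r));
    }
    if (sumCurvature <= 0.0) return 0.0;           // empty or degenerate leaf
    return shrinkage * 0.5 * sumResidual / sumCurvature;
}
```

Nothing in this update depends on a separation criterion such as GiniIndex: the leaf value is computed from the residuals of the events in the leaf alone, which is consistent with the leaf content being overwritten after the tree is built.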

Cheers,
Kim