How is gradient descent implemented in the BDTG model?

Dear all TMVA experts,

I am a bit confused about how gradient-descent optimization is implemented in BDTG (Gradient Boosted Decision Trees). The goal of the classifier is to minimize the misclassification rate, but that quantity is discrete rather than differentiable, so it cannot be differentiated directly as a cost function for gradient descent. Does anyone know how the implementation handles this?

Many thanks!

@moneta can you point Hai to e.g. doc or the sources?


The loss function used for classification (both binary and multi-class) by BDTG in TMVA is cross-entropy. The original description of the algorithm can be found here, section 4.6.
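For reference, the binary cross-entropy loss and its gradient with respect to the raw model output F (standard definitions, written here for completeness rather than copied from the linked paper):

```latex
L(y, F) = -\bigl[\, y \ln p + (1 - y)\ln(1 - p) \,\bigr],
\qquad p = \frac{1}{1 + e^{-F}}
```

```latex
\frac{\partial L}{\partial F} = p - y
```

Each boosting iteration fits the next tree to the pseudo-residuals y - p, i.e. the negative gradient of this loss, which is why no derivative of the misclassification rate is ever needed.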

This loss function is differentiable.
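To make the idea concrete, here is a minimal sketch of gradient boosting on the cross-entropy loss, using depth-1 regression trees (stumps) on 1-D data. This is an illustrative toy in plain NumPy, not TMVA's actual code; all function names (`fit_stump`, `gradient_boost`, etc.) are made up for this example.

```python
import numpy as np

def sigmoid(f):
    """Logistic function mapping a raw score F to a probability p."""
    return 1.0 / (1.0 + np.exp(-f))

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r by
    minimizing squared error over all candidate thresholds."""
    best_t, best_left, best_right, best_sse = None, 0.0, 0.0, np.inf
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if left.size == 0 or right.size == 0:
            continue
        lm, rm = left.mean(), right.mean()
        sse = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if sse < best_sse:
            best_t, best_left, best_right, best_sse = t, lm, rm, sse
    return best_t, best_left, best_right

def predict_stump(stump, x):
    t, lm, rm = stump
    return np.where(x <= t, lm, rm)

def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Boost stumps on the cross-entropy loss
    L(y, F) = -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(F).
    Its negative gradient w.r.t. F is simply y - p, so each tree is
    fit to these pseudo-residuals -- no derivative of the (discrete)
    misclassification rate is ever taken."""
    F = np.zeros_like(x, dtype=float)
    stumps = []
    for _ in range(n_trees):
        residual = y - sigmoid(F)        # negative gradient of the loss
        stump = fit_stump(x, residual)   # fit a tree to the residuals
        F += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return F, stumps
```

On a toy separable dataset such as `x = [0..5]` with labels `[0,0,0,1,1,1]`, the boosted score `F` pushes `sigmoid(F)` below 0.5 for the background points and above 0.5 for the signal points.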

NOTE: The TMVA Users' Guide does not describe the process and properties of GBDT in detail, which is why these other sources are linked. The actual implementation closely follows the one in the linked paper.