Multiclass training of a BDT - how does it work?

I was looking through the ROOT TMVA documentation to try to sort out how the Multiclass mode works for TMVA training of a BDT. While it clearly supports it, I was not able to find a discussion of the algorithm that is used.


BDT multiclass classification is currently only supported by the gradient-boosted variant of BDTs. As far as I know, the TMVA implementation is a straightforward translation of the algorithm described in this paper [1], where the method was first introduced.

The basic idea is to approximate a target function in a stagewise manner: at each stage a new regression tree is fitted to the negative gradient of the loss, so the result improves most where the current error is largest. The multiclass application uses one of the loss functions defined in the paper.
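
To make the stagewise idea concrete, here is a minimal sketch in Python/NumPy. It is not the TMVA code. It assumes the K-class logistic (softmax) loss from the paper, uses one-split regression stumps instead of full trees, and sets leaf values by plain least squares with a fixed shrinkage rather than the Newton-style leaf update the paper describes. All names (`fit_multiclass_gb`, `fit_stump`, `predict_proba`, `shrinkage`, and so on) are made up for illustration.

```python
import numpy as np

def softmax(F):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    Z = np.exp(F - F.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def fit_stump(X, r):
    # Least-squares regression stump on residuals r.
    # Returns (feature index, threshold, left value, right value).
    # Fallback: a constant prediction if no split is possible.
    best = (((r - r.mean()) ** 2).sum(), 0, np.inf, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_multiclass_gb(X, y, n_classes, n_stages=50, shrinkage=0.3):
    Y = np.eye(n_classes)[y]              # one-hot class labels
    F = np.zeros((len(y), n_classes))     # one score function per class
    stages = []
    for _ in range(n_stages):
        # Negative gradient of the softmax loss with respect to the scores.
        residual = Y - softmax(F)
        # One regression stump per class, fitted to that class's residuals.
        stumps = [fit_stump(X, residual[:, k]) for k in range(n_classes)]
        for k, stump in enumerate(stumps):
            F[:, k] += shrinkage * predict_stump(stump, X)
        stages.append(stumps)
    return {"stages": stages, "shrinkage": shrinkage, "n_classes": n_classes}

def predict_proba(model, X):
    # Sum the per-class stump outputs over all stages, then apply softmax.
    F = np.zeros((len(X), model["n_classes"]))
    for stumps in model["stages"]:
        for k, stump in enumerate(stumps):
            F[:, k] += model["shrinkage"] * predict_stump(stump, X)
    return softmax(F)

# Toy data: three well-separated clusters along one feature.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2], [2.0], [2.1], [2.2]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

model = fit_multiclass_gb(X, y, n_classes=3)
proba = predict_proba(model, X)
print(proba.argmax(axis=1))
```

The point to take away is the structure: there is one score function per class, each stage adds one small tree per class fitted to "label minus current probability", and the class probabilities come from a softmax over the summed scores.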

If you have more questions I’d be happy to answer them.

[1]: Jerome H. Friedman, "Greedy function approximation: A gradient boosting machine", Annals of Statistics 29(5), 2001.

Thanks, I will take a close look at the paper!