TMVA BDT specific ranking unreliable

Dear TMVA,
I have noticed very strange ranking numbers in a simple BDTG setup. As a test I tried a classification problem (signal and background) using two variables: the phi of the lepton (events were required to have exactly one lepton) and the sum-ET in the event. This was in ATLAS simulations, where the phi symmetry, while not perfect, is very good, so we can be confident that the sum-ET is by far the more powerful variable. This is confirmed by looking at the acceptance of events passing a BDT score cut: almost flat in phi, with a sharp turn-on in sum-ET. But the variable ranking was as follows:

                     : Ranking input variables (method specific)...

BDTG : Ranking result (top variable is best ranked)
: --------------------------------------------------
: Rank : Variable : Variable Importance
: --------------------------------------------------
: 1 : lepton_phi_NOSYS : 5.403e-01
: 2 : met_sumet_NOSYS : 4.597e-01

The ranking claims the phi variable is more powerful, which is clearly highly implausible, and indeed by training with/without phi I can see it has essentially no impact on the ROC integral.

So how can I interpret these ranking scores?
PS: I could post the 75GB of training data, but I think the problem is simple enough that this would be overkill.

Dear Bill,

Thanks for posting. I am adding @moneta to the loop, who can comment on these rankings.


Hi Bill,
I think one should be careful in the interpretation of the BDT ranking.
The importance is derived by counting how often the variables are used to split decision tree nodes, and by weighting each split occurrence by the separation gain-squared it has achieved and by the number of events in the node.
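
To make that definition concrete, here is a minimal Python sketch of the weighting as described in words above. It is not TMVA's actual code: the function name `split_importance`, the `(variable, gain, n_events)` tuple layout, and all the toy numbers are hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def split_importance(splits):
    """Normalised importance per variable from a list of node splits.

    Each split is a (variable, separation_gain, n_events) tuple.
    This layout is an assumption made for this sketch only.
    """
    raw = defaultdict(float)
    for variable, gain, n_events in splits:
        # Each use of a variable adds (gain squared) * (events in the node).
        raw[variable] += gain ** 2 * n_events
    total = sum(raw.values())
    if total <= 0.0:
        return {variable: 0.0 for variable in raw}
    # Normalise so that the importances sum to one.
    return {variable: value / total for variable, value in raw.items()}


# Toy input (invented numbers): one variable is picked for many splits
# with a small gain each, the other for a single split with a larger gain.
toy_splits = [("lepton_phi", 0.02, 1000)] * 50 + [("met_sumet", 0.1, 1000)]
toy_ranking = split_importance(toy_splits)

for variable, value in sorted(toy_ranking.items(), key=lambda kv: -kv[1]):
    print(f"{variable}: {value:.3f}")
```

With these invented numbers the frequently used variable ends up with roughly two thirds of the total, even though each of its splits has a much smaller gain. So under this definition the score depends on how often a variable is chosen for a split as well as on the gain per split, and it is not a direct measure of the impact on the ROC integral.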


Hi Lorenzo,
Well, OK, that sounds like an English version of the definition. But I have a case where it seems clear the ranking is misleading… how do I extrapolate to more general cases?