Dear TMVA,
I have noticed very strange ranking numbers in a simple BDTG setup. As a test I tried a classification problem (signal vs. background) using two variables: the phi of the lepton (events were required to have exactly one lepton) and the sum-ET in the event. This was in ATLAS simulation, where the phi symmetry, while not perfect, is very good, so we can be confident that the sum-ET is by far the more powerful variable. This is confirmed by looking at the acceptance of events passing a BDT score cut: almost flat in phi, with a sharp turn-on in sum-ET. But the variable ranking was as follows:
: Ranking input variables (method specific)...
BDTG : Ranking result (top variable is best ranked)
: --------------------------------------------------
: Rank : Variable : Variable Importance
: --------------------------------------------------
: 1 : lepton_phi_NOSYS : 5.403e-01
: 2 : met_sumet_NOSYS : 4.597e-01
The ranking claims the phi variable is the more powerful one, which is highly implausible. Indeed, by training with and without phi I can see that it has essentially no impact on the ROC integral.
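A related sanity check, independent of any BDT, is to compute the ROC integral of each variable on its own. The sketch below does this on toy data in plain Python. It is only an illustration: the distributions, sample size, and seed are invented stand-ins (not the ATLAS samples), and `roc_integral` is a hypothetical helper, not a TMVA function.

```python
import math
import random


def roc_integral(signal, background):
    """ROC integral of a single variable via the rank-sum (Mann-Whitney) statistic.

    Equals the probability that a random signal value is larger than a
    random background value. Ties are not averaged, which is fine for the
    continuous toy values used here.
    """
    labelled = sorted([(v, 1) for v in signal] + [(v, 0) for v in background])
    rank_sum = sum(rank for rank, (_, is_sig) in enumerate(labelled, start=1) if is_sig)
    n_s, n_b = len(signal), len(background)
    u_stat = rank_sum - n_s * (n_s + 1) / 2
    return u_stat / (n_s * n_b)


# Toy inputs (assumptions for illustration only):
#   phi    : uniform in [-pi, pi] for both classes, i.e. no separation
#   sum-ET : Gaussian, shifted between signal and background
rng = random.Random(12345)
n_events = 2000
sig_phi = [rng.uniform(-math.pi, math.pi) for _ in range(n_events)]
bkg_phi = [rng.uniform(-math.pi, math.pi) for _ in range(n_events)]
sig_sumet = [rng.gauss(600.0, 100.0) for _ in range(n_events)]
bkg_sumet = [rng.gauss(300.0, 100.0) for _ in range(n_events)]

auc_phi = roc_integral(sig_phi, bkg_phi)
auc_sumet = roc_integral(sig_sumet, bkg_sumet)
print(f"phi alone:    ROC integral = {auc_phi:.3f}")
print(f"sum-ET alone: ROC integral = {auc_sumet:.3f}")
```

In this toy, a variable with no separation power gives a ROC integral close to 0.5, while the shifted one gives a value close to 1, which is the pattern expected for phi and sum-ET here.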
So how can I interpret these ranking scores?
P.S. I could post the 75 GB of training data, but I think the problem is simple enough that this would be overkill.