Some questions on BDT in TMVA

Li_Huang · April 15, 2016, 7:43am

Hi,
I think an advantage of a BDT over other algorithms is that it can show which cuts were applied. But when I use the BDT in TMVA, it is weird that it generates 850 plots. So I have some questions:

Why does the BDT use S/(S+B) rather than S/sqrt(S+B) to calculate purity? S/sqrt(S+B) is the significance, so it seems it should be more efficient.
Why does the BDT produce so many tree plots (850)? More importantly, since I thought the first node of the BDT is the best variable for purifying the data, why do some plots have a different first node? And some of the plots have extremely low efficiency.
Can I output which cuts the BDT applied? In 5a) Classifier Cut Efficiencies, the plot tells me the best significance and the number of signal and background events. Why doesn’t it print which cut it used, given that a BDT can provide this information?

If anyone knows, can you tell me?

hvoss · April 15, 2016, 10:35am

Hi,

Well, purity is defined as S/(S+B) and not S/sqrt(S+B), which is the statistical significance (as you said)…
So please explain why you would want me to calculate the significance if instead I say: purity? I don’t understand…
Why/where do you get 850 plots? Do you somehow plot each tree in the forest? If so… why?
Well, it does give you the MVA-cut value. The x-axis shows “just that” (i.e. the BDT-cut value needed to get the statistical significance, efficiency or purity, according to what is shown on the various y-axes).
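To make the two quantities and the meaning of the x-axis concrete, here is a minimal sketch in plain C++ (not TMVA code). It assumes you already have the classifier output for each signal and background event; the names `CutResult`, `evaluateCut` and `bestSignificance` are made up for this illustration. For each candidate cut on the output it counts the surviving events and computes purity S/(S+B) and significance S/sqrt(S+B).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Figures of merit for one cut on the classifier output (hypothetical helper type).
struct CutResult {
    double cut;          // events with output > cut are kept
    double S;            // number of signal events passing the cut
    double B;            // number of background events passing the cut
    double purity;       // S / (S + B)
    double significance; // S / sqrt(S + B)
};

// Count how many classifier outputs lie strictly above the cut.
double countAbove(const std::vector<double>& outputs, double cut) {
    double n = 0.0;
    for (double v : outputs) {
        if (v > cut) n += 1.0;
    }
    return n;
}

// Evaluate purity and significance for a single cut value.
CutResult evaluateCut(const std::vector<double>& sig,
                      const std::vector<double>& bkg, double cut) {
    CutResult r{cut, countAbove(sig, cut), countAbove(bkg, cut), 0.0, 0.0};
    const double total = r.S + r.B;
    if (total > 0.0) {
        r.purity = r.S / total;
        r.significance = r.S / std::sqrt(total);
    }
    return r;
}

// Scan the candidate cuts and return the one with the largest significance.
CutResult bestSignificance(const std::vector<double>& sig,
                           const std::vector<double>& bkg,
                           const std::vector<double>& cuts) {
    assert(!cuts.empty());
    CutResult best = evaluateCut(sig, bkg, cuts[0]);
    for (std::size_t i = 1; i < cuts.size(); ++i) {
        CutResult r = evaluateCut(sig, bkg, cuts[i]);
        if (r.significance > best.significance) best = r;
    }
    return best;
}
```

Note that the cut with the highest purity is in general not the cut with the highest significance: a very tight cut can reach purity 1 while throwing away most of the signal.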

So maybe I don’t understand your question …

Helge

Li_Huang · April 16, 2016, 11:52am


Hi Helge,
Let me go through them one by one.

I think purity can be defined by many formulas, for example the entropy, so I guess S/(S+B) is just one of them. If my goal is to get the largest significance, can I use the significance as the formula that defines purity? The significance is defined in the TMVA Users Guide, chapter 3.1.10 “Classification performance evaluation”, as S/sqrt(S+B).
I don’t know why I get so many plots. When I run TMVAClassification.cxx and look at the BDT result, it automatically produces 850 plots. This is what confuses me. Sorry, I don’t know the meaning of “forest”.
Yes, it gives me the MVA-cut value: when x = xxx, I get the largest significance. But I think this x value corresponds to the BDT decision function. I thought a decision tree could give the cuts based on the input variables.

Anyway, thank you for trying to help me.

Best,
Li

hvoss · April 19, 2016, 9:08am

Hi Li,

Well, I think ‘purity’ is well defined, and ‘entropy’ is … but maybe I’m wrong, but I’ve never heard purity defined otherwise. Now… but that’s just language. Your point seems to be rather that using a different ‘separation index’ in building the decision trees would give you better performance in the end, right?
Well, there are different ones you can choose from: Gini-index, Cross-Entropy, and even a trial one based on statistical significance… Experience shows that in most cases Gini-Index or Cross-Entropy seem to give the best results… so that’s what we took (Gini-Index is somehow ‘standard’ for decision trees).
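For illustration, the separation indices mentioned above can be written as functions of the node purity p = S/(S+B). The sketch below is plain C++, not the TMVA implementation; the function names are hypothetical and `separationGain` uses the usual textbook form (parent index minus the daughter indices weighted by their event fractions), which is an assumption about what is maximised when a node is split. In TMVA itself the criterion is, as far as I know, chosen with the `SeparationType` option when booking the BDT.

```cpp
#include <cassert>
#include <cmath>

// Gini index of a node with purity p = S/(S+B); maximal for p = 0.5.
double giniIndex(double p) {
    return p * (1.0 - p);
}

// Cross entropy of a node with purity p; defined as 0 for a pure node.
double crossEntropy(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * std::log(p) - (1.0 - p) * std::log(1.0 - p);
}

// Decrease of the separation index when a parent node is split into a
// left daughter (sL signal, bL background) and a right daughter (sR, bR).
// 'index' is one of the functions above. Textbook formulation, labelled
// as an assumption: daughters are weighted by their fraction of events.
double separationGain(double (*index)(double),
                      double sL, double bL, double sR, double bR) {
    const double nL = sL + bL;
    const double nR = sR + bR;
    assert(nL > 0.0 && nR > 0.0);
    const double n = nL + nR;
    const double pParent = (sL + sR) / n;
    return index(pParent)
         - (nL / n) * index(sL / nL)
         - (nR / n) * index(sR / nR);
}
```

A split that separates signal and background perfectly gives the largest gain, while a split that leaves both daughters with the parent's purity gives zero gain; the training picks, node by node, the variable and cut value with the largest gain.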
Which ROOT/TMVA version are you using? I never saw all plots made ‘automatically’. The idea is that you can create them ‘one by one’ for a few examples if you are interested to see what the individual decision trees look like. (The ensemble of all the 850 decision trees we call a ‘forest’, i.e. many trees.)
Well, HOW do you imagine this to look? Every one of your 850 decision trees describes, for each of its leaf nodes, an individual cut sequence. And in the end, none of those cut sequences is used as a hard cut, but … well, you need to look up how ‘Boosted decision trees’ work. I guess the chapter in the Users Guide is better than what I would try to explain here in a few lines.
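As a rough picture of the point above, here is a toy sketch in plain C++ (not TMVA code; `Node`, `collectCutSequences`, `forestResponse` and the tree weights are all hypothetical). Each tree yields one cut sequence per leaf, and the forest output is taken here as a weighted average of the individual tree votes, which is only a simplified stand-in for what boosting does. Because boosting reweights the events between trees, different trees can also start with a different variable at the root.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy tree node: an internal node cuts on x[var] > threshold,
// a leaf (no daughters) carries a class label.
struct Node {
    int var = -1;
    double threshold = 0.0;
    int leafClass = 0;           // +1 signal, -1 background (leaves only)
    std::unique_ptr<Node> left;  // taken when x[var] <= threshold
    std::unique_ptr<Node> right; // taken when x[var] >  threshold
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeLeaf(int leafClass) {
    NodePtr n(new Node());
    n->leafClass = leafClass;
    return n;
}

NodePtr makeSplit(int var, double threshold, NodePtr left, NodePtr right) {
    NodePtr n(new Node());
    n->var = var;
    n->threshold = threshold;
    n->left = std::move(left);
    n->right = std::move(right);
    return n;
}

// Follow the cuts down to a leaf and return its class (+1 or -1).
int classify(const Node& node, const std::vector<double>& x) {
    if (!node.left) return node.leafClass;
    const Node& next = (x[node.var] > node.threshold) ? *node.right : *node.left;
    return classify(next, x);
}

// Write out one cut sequence per leaf, e.g. "x1 > 0.5 && x0 <= 2.0 -> background".
void collectCutSequences(const Node& node, const std::string& prefix,
                         std::vector<std::string>& out) {
    if (!node.left) {
        out.push_back(prefix + " -> " + (node.leafClass > 0 ? "signal" : "background"));
        return;
    }
    const std::string var = "x" + std::to_string(node.var);
    const std::string thr = std::to_string(node.threshold);
    const std::string sep = prefix.empty() ? std::string() : prefix + " && ";
    collectCutSequences(*node.left, sep + var + " <= " + thr, out);
    collectCutSequences(*node.right, sep + var + " > " + thr, out);
}

// Forest output: weighted average of the tree votes, in [-1, 1].
// No single cut sequence acts as a hard cut; one cuts on this value instead.
double forestResponse(const std::vector<NodePtr>& trees,
                      const std::vector<double>& weights,
                      const std::vector<double>& x) {
    assert(!trees.empty() && trees.size() == weights.size());
    double sum = 0.0, norm = 0.0;
    for (std::size_t i = 0; i < trees.size(); ++i) {
        sum += weights[i] * classify(*trees[i], x);
        norm += weights[i];
    }
    return sum / norm;
}
```

With 850 trees of a few leaves each, printing every cut sequence would give thousands of lines, none of which is the selection actually applied; the selection is the single cut on the forest response.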

Cheers,
Helge