BDT Event Weighting

Hi TMVA/BDT Experts,

I’m new to using TMVA and BDTs, and I have a question about the renormalisation of event weights.

I have read online that the BDT algorithms in TMVA renormalise the signal and background events such that the weights given to them, e.g. via signalWeightExpression or the global tree weight, have no effect. I was wondering if someone could tell me why this is the case, or point me towards some literature explaining why the normalisation does not matter?
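In case it helps frame the question, here is a minimal sketch of where the two kinds of weight enter my training setup. All file, tree, and branch names are placeholders, and the option strings are just illustrative:

```cpp
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

void train() {
   // Placeholder input file and trees.
   TFile *input   = TFile::Open("samples.root");
   TTree *sigTree = (TTree*)input->Get("sig_tree");
   TTree *bkgTree = (TTree*)input->Get("bkg_tree");

   TFile *output = TFile::Open("tmva_out.root", "RECREATE");
   TMVA::Factory    factory("TMVAClassification", output, "AnalysisType=Classification");
   TMVA::DataLoader loader("dataset");

   loader.AddVariable("var1", 'F');

   // Global (per-tree) weights, e.g. cross-section * lumi / N_generated.
   loader.AddSignalTree(sigTree, 1.0);
   loader.AddBackgroundTree(bkgTree, 0.37);   // placeholder scale factor

   // Per-event weights, read from a branch in each tree.
   loader.SetSignalWeightExpression("evt_weight");
   loader.SetBackgroundWeightExpression("evt_weight");

   // NormMode is where the factory-level renormalisation is chosen.
   loader.PrepareTrainingAndTestTree("", "NormMode=EqualNumEvents");

   factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT",
                      "NTrees=400:BoostType=Grad");
   factory.TrainAllMethods();
}
```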

One of the links below states the following:

BDT doesn’t care about what kind of ‘normalisation’ you’ve chosen in the factory; it simply scales both signal and background beforehand to the same effective number of events. This is because otherwise the first boosting step would essentially be doing ‘just that’, and knowing this, I can also do it straight away.

This seems to be the answer, but I don’t understand why it is the case. Can anyone point me to something explaining this statement? Naively I would assume that if one of the backgrounds were quite rare, the MVA would not sacrifice as much signal in the classification for it as it would for a much more common background. In my analysis I produce dedicated background samples which then need to be scaled relative to one another, so I find it unusual that the BDT does not consider this scaling.
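For concreteness, here is my reading of what that renormalisation would do, as a toy sketch (my own code, not TMVA internals):

```cpp
#include <numeric>
#include <vector>

// Scale each class so its weights sum to the same effective number
// of events. Function and variable names are purely illustrative.
void renormalise(std::vector<double>& sigW, std::vector<double>& bkgW) {
   const double sumSig = std::accumulate(sigW.begin(), sigW.end(), 0.0);
   const double sumBkg = std::accumulate(bkgW.begin(), bkgW.end(), 0.0);
   const double target = 0.5 * (sumSig + sumBkg);  // common effective size
   for (double& w : sigW) w *= target / sumSig;
   for (double& w : bkgW) w *= target / sumBkg;
   // Afterwards sum(sigW) == sum(bkgW): the overall signal-to-background
   // ratio is gone, but the relative weights *within* each class, e.g.
   // between my dedicated background samples, are preserved.
}
```

If that reading is right, only the overall signal-to-background ratio is discarded, while the relative scaling between my background samples survives, which is the part I would like to confirm.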

Any help would be greatly appreciated.

Cheers
Dom

https://sourceforge.net/p/tmva/mailman/message/36233854/
https://sourceforge.net/p/tmva/mailman/message/32959282/

Hi,

Note: This is speculation on my part. If you, or anybody else, know better or discover more, please correct me.

I think this is a remnant from when AdaBoost was the go-to classifier. In AdaBoost, it is my understanding that the first iteration can be approximated by a classifier that simply predicts the majority class. The subsequent reweighting would then balance the classes, and the algorithm continues from there. Hence, when using AdaBoost, we can safely skip ahead and balance the classes up front.
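One way to make that concrete, using the textbook discrete AdaBoost update rather than TMVA’s exact implementation: misclassified events are reweighted as

$$ w_i \;\to\; w_i\,e^{\alpha}, \qquad \alpha = \ln\frac{1-\varepsilon}{\varepsilon}, $$

where $\varepsilon$ is the weighted error of the current weak learner. If the minority class carries a fraction $p$ of the total weight, the trivial “always predict the majority class” learner has $\varepsilon = p$, so the misclassified minority events get scaled by $(1-p)/p$ and their total weight becomes $p \cdot (1-p)/p = 1-p$, exactly matching the majority class. One boosting step from unbalanced classes thus lands where balanced classes would have started.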

To my understanding, the situation is different when using gradient boosting. That algorithm does care about the relative weights of signal vs. background: you can trade off precision vs. recall by weighting the classes differently.
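A sketch of why, again from intuition rather than TMVA’s code: for the usual binomial log-loss, gradient boosting minimises the weighted loss

$$ \mathcal{L} = \sum_i w_i\,L\bigl(y_i, F(x_i)\bigr), \qquad F_0 = \ln\frac{\sum_{i:\,y_i=1} w_i}{\sum_{i:\,y_i=0} w_i}, $$

where $F_0$, the optimal constant starting score, is the weighted log-odds of the two classes. Rescaling all signal weights by a factor $c$ shifts this score by $\ln c$ and biases the subsequent leaf values in the same direction, which moves the working point along the precision/recall curve.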

I would point to references, but this is off the top of my head and only meant as intuition. Hope it helps.

Cheers,
Kim

Just realised I never thanked you for the response. So thank you. I ended up using gradient boost, and I’ve run it with both normalised and unnormalised weights. The results are fairly different, so it makes for a good study to find out which one is best.

Cheers
Dom