Understanding the normalization setting for and behind the BDT

Dear All,

Setup

I train a BDT where I have only 1 signal and 1 background category. The training options are set as:

...
dataloader->SetSignalWeightExpression(signalWeightExpression.data());
dataloader->SetBackgroundWeightExpression(backgroundWeightExpression.data());
...
dataloader->PrepareTrainingAndTestTree(cut, "nTrain_Signal=0:nTrain_Background=0:nTest_Signal=0:nTest_Background=0:SplitMode=Random:NormMode=NumEvents:!V");
...
factory->BookMethod(dataloader, "BDT", "BDTG", "!H:!V:NTrees=1000:BoostType=Grad:Shrinkage=0.20:UseBaggedBoost:GradBaggingFraction=0.5:SeparationType=GiniIndex:nCuts=500:PruneMethod=NoPruning:MaxDepth=5");
...

Reweighting

The samples are pruned in a way that they have the same shape. Miraculously, this time I have more signal than background events. I have checked the setup where I do not pass extra weights (referred to below as the “non-pre-weighted” case) and the one where I do (meaning that backgroundWeightExpression contains only the constant 1 while signalWeightExpression contains only the constant 0.75; referred to below as the “pre-weighted” case). [3]
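As an aside, a constant per-event weight expression should be equivalent to attaching a global weight to the tree itself. A minimal sketch of the two equivalent setups (the 0.75 factor is from the setup above; `signalTree` is a hypothetical `TTree*`, and this is a fragment of a larger dataloader configuration, not a complete program):

```cpp
// (a) constant per-event weight expression, as in the setup above:
// every signal event gets weight 0.75
dataloader->AddSignalTree(signalTree);
dataloader->SetSignalWeightExpression("0.75");

// (b) equivalent: pass 0.75 as a global signal tree weight instead
dataloader->AddSignalTree(signalTree, 0.75);
```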

Checks

In order to test the impact of different options on the training, I went down to using 0.1% of the events in both of my sets, leaving 12000 signal and 10000 background events. I judged the performance by comparing the values in the final tables “Testing efficiency compared to training efficiency (overtraining check)” and “Evaluation results ranked by best signal efficiency and purity (area)”.

Observation

  1. The “non-pre-weighted” and “pre-weighted” trainings gave the same results.
  2. Switching between None, NumEvents and EqualNumEvents for the dataloader NormMode option gave the same results. (If I understand it right, this is because the BDT’s internal renormalization is on by default (SkipNormalization=False) and overrides the dataloader setting, and the reweighting is done during the training, only on the training part of the data.)
  3. Setting SkipNormalization=True gave the largest ROC integral and signal efficiency, yet the results were still the same with respect to the dataloader NormMode options or the “pre-weighting” of the data.

Question

I have difficulties interpreting why I see what I see, and I hope you could provide me with a bit more insight. I would also like to know whether I am safe going to a setup where I do not provide a global weight, use SkipNormalization=True, and NormMode=EqualNumEvents (though it seems the last one could be any other choice).

Note

I am writing this question after reading the two posts [1], [2], which were very useful, but I’m not 100% sure they answer my question.
[1] Adding trees to dataloader with weights does not work (for me)
[2] https://sourceforge.net/p/tmva/mailman/message/34431489/
[3] I know about global weights but for historical reasons, this is the setup so far.

Hi,

the culprit is probably the BDT method option
SigToBkgFraction (“Sig to Bkg ratio used in Training”), whose default value is 1.

To my understanding, dataloader reweighting has no effect when using the BDT method, since it has an internal
reweighting. What I did in my case was to explicitly select how many signal/background events I wanted with nTrain_Signal, nTrain_Background, nTest_Signal and nTest_Background.
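The explicit selection mentioned above would look something like this (the counts are hypothetical; with the 12000/10000 events from the check above, this call would put 8000/8000 into training and leave the remainder for testing):

```cpp
// Sketch: fix the train/test composition by hand instead of relying on
// any renormalization. Events not requested for training go to testing.
dataloader->PrepareTrainingAndTestTree(cut,
    "nTrain_Signal=8000:nTrain_Background=8000:"
    "nTest_Signal=4000:nTest_Background=2000:"
    "SplitMode=Random:NormMode=NumEvents:!V");
```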

Hoping that this helps you,
Alberto

Dear @a.bragagnolo, thanks for the fast reply. My understanding is also that the reweighting should be done either prior to / with the dataloader, or be controlled by the training itself. What came as a surprise is that disabling the BDT normalization with SkipNormalization=True improved the result, even with no dataloader normalization, which seems to be necessary in this case but has no major impact.

As far as I know, for the BDT the SigToBkgFraction should be left at 1. So your suggestion would be to use nTrain_Signal etc. to move events from the Train to the Test set?

Hi,

As far as I understand, the normalisation done for BDTs (SkipNormalization=False) should be very similar to the dataloader normalisation NormMode=EqualNumEvents when SigToBkgFraction=1.0. Both of these renormalise the training data so the two classes carry equal importance, forcing the classifier to make an “interesting” decision rather than just outputting the class with the largest global probability.

A quick test with the TMVAClassification example shows that the worst result is obtained with SkipNormalization=True and NormMode=None.

To specifically answer your question: disabling the internal BDT normalisation is completely fine.

The internal normalisation is an optimisation that was, to my understanding, developed for AdaBoost. Since you use gradient boosting, you can choose whichever normalisation you want.
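Concretely, disabling the internal normalisation would just mean adding the flag to the booking string from the setup above, after which the dataloader NormMode choice actually takes effect (a sketch only; no comment intended on the other hyperparameters):

```cpp
// Same booking as in the original setup, with the internal BDT
// renormalisation switched off via SkipNormalization=True.
factory->BookMethod(dataloader, "BDT", "BDTG",
    "!H:!V:NTrees=1000:BoostType=Grad:Shrinkage=0.20:"
    "UseBaggedBoost:GradBaggingFraction=0.5:SeparationType=GiniIndex:"
    "nCuts=500:PruneMethod=NoPruning:MaxDepth=5:SkipNormalization=True");
```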

Cheers,
Kim
