Dear All,
Setup
I train a BDT with only one signal and one background category. The training options are set as follows:
...
dataloader->SetSignalWeightExpression(signalWeightExpression.data());
dataloader->SetBackgroundWeightExpression(backgroundlWeightExpression.data());
...
dataloader->PrepareTrainingAndTestTree(cut, "nTrain_Signal=0:nTrain_Background=0:nTest_Signal=0:nTest_Background=0:SplitMode=Random:NormMode=NumEvents:!V");
...
factory->BookMethod(dataloader, "BDT", "BDTG", "!H:!V:NTrees=1000:BoostType=Grad:Shrinkage=0.20:UseBaggedBoost:GradBaggingFraction=0.5:SeparationType=GiniIndex:nCuts=500:PruneMethod=NoPruning:MaxDepth=5");
...
Reweighting
The samples are pruned so that they have the same shape. Miraculously, this time I have more signal than background events. I have checked the setup both without extra weights (referred to below as “non-pre-weighted”) and with them (referred to below as “pre-weighted”; in this case backgroundlWeightExpression contains only 1 while signalWeightExpression contains only 0.75). [3]
Checks
To test the impact of different options on the training, I went down to using 0.1% of the events in both of my sets, leaving 12000 and 10000 events for signal and background respectively. I judged the performance by comparing the values in the final tables “Testing efficiency compared to training efficiency (overtraining check)” and “Evaluation results ranked by best signal efficiency and purity (area)”.
Observation
- The “non-pre-weighted” and “pre-weighted” trainings gave the same results.
- Switching between None, NumEvents and EqualNumEvents for the dataloader NormMode option gave the same results. (If I understand it right, this is because SkipNormalization is set to True by default along with other factory options, and the reweighting is done during the training only on the training part of the data.)
- Setting SkipNormalization=True gave the largest ROC-integral and signal efficiency, yet the results were still the same with respect to the dataloader NormMode options or “pre-weighting” of the data.
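To make the NormMode comparison concrete, here is a minimal sketch of how I understand the three modes to rescale the weights, based on the option description in the TMVA Users Guide. It is plain C++ and not TMVA code: the ClassWeights and Normalized structs and the applyNormMode function are names made up for illustration, and it assumes one constant weight per class, as in my “pre-weighted” case.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper types for illustration only, not part of TMVA.
struct ClassWeights {
    double nEvents;   // number of events in the class
    double perEvent;  // constant per-event weight (assumption: same for all events)
    double sum() const { return nEvents * perEvent; }
};

struct Normalized {
    ClassWeights signal;
    ClassWeights background;
};

// Rescale per-class weights following my reading of the NormMode description:
//   None           : weights are left untouched
//   NumEvents      : average weight of 1 per event, separately for each class
//   EqualNumEvents : average weight of 1 per event for signal, and background
//                    sum of weights set equal to the signal sum of weights
Normalized applyNormMode(const std::string& mode, ClassWeights sig, ClassWeights bkg) {
    assert(sig.nEvents > 0 && bkg.nEvents > 0);
    assert(sig.perEvent > 0 && bkg.perEvent > 0);

    if (mode == "NumEvents") {
        // Any constant per-class weight cancels in these ratios.
        const double sigScale = sig.nEvents / sig.sum();
        const double bkgScale = bkg.nEvents / bkg.sum();
        sig.perEvent *= sigScale;
        bkg.perEvent *= bkgScale;
    } else if (mode == "EqualNumEvents") {
        const double sigScale = sig.nEvents / sig.sum();
        sig.perEvent *= sigScale;
        // Background is scaled so that its sum of weights matches the signal sum.
        const double bkgScale = sig.sum() / bkg.sum();
        bkg.perEvent *= bkgScale;
    } else {
        assert(mode == "None");
    }
    return {sig, bkg};
}
```

Calling applyNormMode with 12000 signal events at weight 0.75 and 10000 background events at weight 1 gives sums of weights of 12000 and 10000 for NumEvents (the same as without the 0.75), 12000 and 12000 for EqualNumEvents, and 9000 and 10000 for None. Under this reading a constant per-class weight should only matter for None, so it would account for part of what I see but not for None giving identical results as well.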
Question
I have difficulties interpreting why I see what I see, and I hope you can give me some more insight. I would also like to know whether I am safe going with a setup where I do not provide a global weight and use SkipNormalization=True and EqualNumEvents (though it seems the last one could be any other choice).
Note
I am writing this question after reading the two posts [1] and [2], which were very useful, but I’m not 100% sure they answer my question.
[1] Adding treess to dataloader with weights does not work (for me)
[2] https://sourceforge.net/p/tmva/mailman/message/34431489/
[3] I know about global weights but for historical reasons, this is the setup so far.