I am doing binary classification with a BDT (AdaBoost) in TMVA.
I have equal numbers of signal (S) and background (B) events for training (say 1M each).
But in the simulated data, the S/B ratio is actually 1/9.
To make the trained model reflect this S/B ratio, would any of the following three methods work?
Method 1: keep equal numbers of S and B events and use tree weights
factory->AddSignalTree(SigTree, 1.0);
factory->AddBackgroundTree(BkgTree, 9.0);
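To check my understanding of method 1, here is a small plain-Python sketch (no ROOT involved) of the arithmetic for weights of 1.0 on the signal tree and 9.0 on the background tree. It assumes that the per-tree weight simply multiplies every event's contribution to that class's sum of weights, and that TMVA applies no further renormalisation (for example through the `NormMode` option of `PrepareTrainingAndTestTree`). Both are my assumptions, and `signal_fraction` is a hypothetical helper, not a TMVA function.

```python
def signal_fraction(n_sig, n_bkg, w_sig=1.0, w_bkg=1.0):
    """Fraction of the total sum of weights carried by signal events.

    Assumes each event contributes exactly its tree weight (no renormalisation).
    """
    sum_sig = n_sig * w_sig
    sum_bkg = n_bkg * w_bkg
    return sum_sig / (sum_sig + sum_bkg)


# Method 1: 1M signal and 1M background events, background tree weighted by 9
frac = signal_fraction(1_000_000, 1_000_000, w_sig=1.0, w_bkg=9.0)
print(f"effective signal fraction: {frac:.3f}")  # 0.100, i.e. S:B = 1:9
```

Under these assumptions the weighted sample would look like S:B = 1:9 while still using all 1M events of each class.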
Method 2:
reselect events from the S and B trees so that they have the correct proportion (1:9), without weights.
But this ends up with an imbalanced training sample, doesn't it?
Method 3:
reselect events from the S and B trees so that they have the correct proportion (1:9),
then also use weights
factory->AddSignalTree(SigTree, 1.0);
factory->AddBackgroundTree(BkgTree, 9.0);
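One thing I am unsure about with method 3 is whether the 1:9 proportion would be applied twice. The plain-Python sketch below (no ROOT involved) compares the effective B/S ratio of the reselected sample without weights (method 2) and with weights (method 3). It assumes the tree weight multiplies each event's contribution to the sum of weights and that no renormalisation is applied afterwards. The event counts and the helper `bkg_to_sig_ratio` are hypothetical, chosen only for illustration.

```python
def bkg_to_sig_ratio(n_sig, n_bkg, w_sig=1.0, w_bkg=1.0):
    """Ratio of background to signal sum of weights (no renormalisation assumed)."""
    return (n_bkg * w_bkg) / (n_sig * w_sig)


# Reselect from 1M events each so that S:B is about 1:9 in raw event counts
n_bkg = 1_000_000
n_sig = n_bkg // 9  # 111_111 signal events kept

ratio_method2 = bkg_to_sig_ratio(n_sig, n_bkg)                        # no weights
ratio_method3 = bkg_to_sig_ratio(n_sig, n_bkg, w_sig=1.0, w_bkg=9.0)  # with weights

print(f"method 2 effective B/S: {ratio_method2:.2f}")  # 9.00
print(f"method 3 effective B/S: {ratio_method3:.2f}")  # 81.00
```

If these assumptions hold, method 3 would give an effective S:B of about 1:81 rather than 1:9, because both the reselection and the weights encode the same factor of 9.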
By the way, I also don't quite understand how these weights are used. Could somebody explain this to me?
Thanks in advance.
ROOT Version: 6.24 (PyROOT via conda)
Platform: CentOS 7
Compiler: gcc 9