Weighting in a TMVA BDT

I’m just starting out with the TMVA package in ROOT. My results seem too good, and I’m wondering whether I need to use weighting.

// Fetch the signal and background trees from the input file
TTree *signal = (TTree*)input->Get("sig");
TTree *background = (TTree*)input->Get("bkg");

// Global per-tree weights, applied to every event in the tree
Double_t signalWeight = 1.0;
Double_t backgroundWeight = 1.0;

factory->AddSignalTree( signal, signalWeight );
factory->AddBackgroundTree( background, backgroundWeight );

The signal and background trees contain the correct number of events for my process (i.e. ~2000 signal, ~200000 background). Do I need to set the weights to reflect this, or will TMVA use all the events provided as they are?

The reason I ask is that, before the first cut is applied in the BDT, it shows an S/(S+B) of 0.5. But maybe there’s something about how the tree is constructed that I don’t understand.


You can use 70-80% of the events provided for training and the remaining ones for testing the trained BDT model.
Depending on where you cut on the BDT output score, you will get different values of S/(S+B).


Hi Lorenzo, I think the example I’m running splits the data 50/50 between training and testing by default. What confuses me is that S/(S+B) is 0.5 at the top of the tree and then reaches 0.98 after the cuts, as expected. The 0.5 at the top seems strange: before any cuts it should be ~0.01.


How are you evaluating this “before the first cut”? And is this evaluated on the training or the test data?

If it is evaluated on the training data, then an S/(S+B) of 0.5 is expected, since by default TMVA reweights the training events. (You would see this, for example, if you evaluate with MaxDepth=0 as an option to the BDT.)


Hi Kim, that makes sense. I hadn’t noticed the reweighting. Does that mean the final S/(S+B) in a filled tree assumes equal amounts of signal and background? I.e., if S/(S+B) is 0.95 in the final branch but the actual signal-to-background ratio is 1 to 10, will the S/(S+B) achieved on data be closer to 0.65?


Unfortunately I can’t answer this, but I can say that when evaluating on the training set, the events can be reweighted. When evaluating on the test data, TMVA uses exactly the event weights you specified when adding the trees and setting the weight expression.

This means that if you have as input physical weights you should be able to directly interpret the result of evaluation on the test data. If your weighting differs from this you would need to compensate.