ROC curve from the BDT and BDTF models

dlin · August 14, 2019, 5:58pm

Dear TMVA experts,

I am learning to use the TMVA package. Here are the models (BDT and BDTF) I booked:
factory->BookMethod(dataloader, Types::kBDT, "BDT", "!H:!V:NTrees=850:MinNodeSize=2.5%:MaxDepth=4:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20");

factory->BookMethod(dataloader, Types::kBDT, "BDTF", "!H:!V:NTrees=50:MinNodeSize=2.5%:UseFisherCuts:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20");

The training was successful; however, I got very strange ROC curves, and the signal efficiency is essentially constant at 100%:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet              MVA              Signal efficiency: from test sample (from training sample)
: Name:                Method:          @B=0.01             @B=0.10            @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: babarTMVA            BDT            : 1.000 (1.000)       1.000 (1.000)      1.000 (1.000)
: babarTMVA            BDTF           : 1.000 (1.000)       1.000 (1.000)      1.000 (1.000)
Does anyone have an idea what I did wrong, or how I can tune them?
Thanks a lot in advance!

kialbert · August 15, 2019, 5:52pm

Hi,

Having perfect separation is usually an indication of an easy dataset.
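
To make the numbers in the table concrete: each "@B=..." column is the signal efficiency at the cut on the classifier output where the given fraction of background still passes. The sketch below shows that idea in plain C++ on lists of per-event scores. It is only an illustration under my own assumptions, not how TMVA computes the table internally, and the function name signalEffAtBkgEff is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Signal efficiency for the cut "score > threshold", where the threshold is
// chosen so that at most a fraction bkgEff of the background events passes.
// Hypothetical helper for illustration; assumes larger score = more signal-like.
double signalEffAtBkgEff(const std::vector<double>& sigScores,
                         std::vector<double> bkgScores,  // copied, sorted locally
                         double bkgEff)
{
    assert(!sigScores.empty() && !bkgScores.empty());
    assert(bkgEff >= 0.0 && bkgEff <= 1.0);

    // Most signal-like background events first.
    std::sort(bkgScores.begin(), bkgScores.end(), std::greater<double>());

    // Number of background events allowed to pass the cut
    // (small epsilon guards against floating-point round-off).
    const std::size_t nAllowed = static_cast<std::size_t>(
        std::floor(bkgEff * bkgScores.size() + 1e-9));
    if (nAllowed >= bkgScores.size()) return 1.0;  // no cut needed

    // The first background event that must be rejected sets the threshold.
    const double threshold = bkgScores[nAllowed];

    const auto nPass = std::count_if(sigScores.begin(), sigScores.end(),
                                     [threshold](double s) { return s > threshold; });
    return static_cast<double>(nPass) / sigScores.size();
}
```

If every signal score lies above every background score, this returns 1 for any bkgEff, which is the pattern in the table above. As soon as the two score distributions overlap, the value drops below 1 for small bkgEff.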

I would suggest you start by checking your inputs using the TMVA GUI (see e.g. the TMVAClassification.C example). This can help you understand whether any single variable is particularly easy for the method to use for discrimination.
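
If you want a quick number per variable in addition to looking at the plots, one option is the separation measure described in the TMVA Users Guide, <S^2> = 1/2 * integral of (s(x) - b(x))^2 / (s(x) + b(x)) dx, where s and b are the normalised signal and background distributions of the variable. It is 0 for identical shapes and 1 for no overlap. Below is a minimal sketch in plain C++ using simple binned estimates. The helper names (binFractions, separation), the binning, and the range are my own choices for illustration, not TMVA code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of events per bin over [lo, hi); values outside the range are
// put into the first or last bin, so the fractions sum to 1.
std::vector<double> binFractions(const std::vector<double>& values,
                                 std::size_t nBins, double lo, double hi)
{
    assert(nBins > 0 && hi > lo && !values.empty());
    std::vector<double> frac(nBins, 0.0);
    for (double v : values) {
        std::size_t bin = 0;
        if (v >= hi) {
            bin = nBins - 1;
        } else if (v > lo) {
            bin = static_cast<std::size_t>((v - lo) / (hi - lo) * nBins);
            if (bin >= nBins) bin = nBins - 1;  // guard against round-off
        }
        frac[bin] += 1.0 / values.size();
    }
    return frac;
}

// Binned estimate of <S^2> = 1/2 * sum_i (s_i - b_i)^2 / (s_i + b_i),
// with s_i and b_i the per-bin fractions of signal and background.
double separation(const std::vector<double>& sig, const std::vector<double>& bkg,
                  std::size_t nBins, double lo, double hi)
{
    const std::vector<double> s = binFractions(sig, nBins, lo, hi);
    const std::vector<double> b = binFractions(bkg, nBins, lo, hi);
    double sum = 0.0;
    for (std::size_t i = 0; i < nBins; ++i) {
        const double total = s[i] + b[i];
        if (total > 0.0) sum += (s[i] - b[i]) * (s[i] - b[i]) / total;
    }
    return 0.5 * sum;
}
```

A variable for which this comes out close to 1 separates signal from background almost on its own, and would be the first thing to look at when the ROC curve is perfect. Keep in mind that with few events and many bins the estimate is biased upwards, so the binning should be coarse enough for the sample size.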

If your current dataset is small, increasing the number of events in your input would be one way of “making it harder”: with few events, the full complexity of the data might not be visible.

Cheers,
Kim

dlin · August 19, 2019, 10:13pm

Hi Kim,

Many thanks for the explanation.
You are right: my dataset might contain variables that are very powerful at separating signal from background.
Thanks a lot!