ROC curve from the BDT and BDTF models

Dear TMVA experts,

I am learning to use the TMVA package. Here are the models (BDT and BDTF) I used:
factory->BookMethod(dataloader, Types::kBDT, "BDT", "!H:!V:NTrees=850:MinNodeSize=2.5%:MaxDepth=4:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20");

factory->BookMethod(dataloader, Types::kBDT, "BDTF", "!H:!V:NTrees=50:MinNodeSize=2.5%:UseFisherCuts:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20");

The training was successful; however, I got very strange ROC curves, and the signal efficiency is almost constant at 100%:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet      MVA        Signal efficiency: from test sample (from training sample)
: Name:        Method:    @B=0.01          @B=0.10          @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: babarTMVA    BDT      : 1.000 (1.000)    1.000 (1.000)    1.000 (1.000)
: babarTMVA    BDTF     : 1.000 (1.000)    1.000 (1.000)    1.000 (1.000)
Does anyone have an idea what I did wrong, or how I can tune them?
Thanks a lot in advance!

Hi,

Having perfect separation is usually an indication of an easy dataset.
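
For reading the table above: each @B column is the signal efficiency at a cut on the classifier output that lets through the stated fraction of background (1%, 10%, 30%). The sketch below illustrates that definition on plain vectors of scores. It is a simplified illustration, not TMVA's own implementation, and the function name `signalEffAtBkgEff` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Signal efficiency for the loosest cut "score > threshold" whose background
// efficiency does not exceed bkgEff. Simplified illustration only; this is
// NOT how TMVA computes the numbers in its table.
double signalEffAtBkgEff(const std::vector<double>& sigScores,
                         std::vector<double> bkgScores, double bkgEff) {
    assert(!sigScores.empty() && !bkgScores.empty());
    assert(bkgEff >= 0.0 && bkgEff < 1.0);

    // Sort background scores from highest to lowest.
    std::sort(bkgScores.begin(), bkgScores.end(), std::greater<double>());

    // Number of background events that are allowed to pass the cut.
    std::size_t allowed = static_cast<std::size_t>(
        std::floor(bkgEff * static_cast<double>(bkgScores.size())));
    allowed = std::min(allowed, bkgScores.size() - 1);

    // The first background event that must be rejected sets the threshold.
    const double threshold = bkgScores[allowed];

    // Count signal events strictly above the threshold.
    std::size_t passed = 0;
    for (double s : sigScores) {
        if (s > threshold) ++passed;
    }
    return static_cast<double>(passed) / static_cast<double>(sigScores.size());
}
```

If every signal score lies above every background score, this returns 1.0 for any working point, which is the pattern in the table: the classifier output separates the two samples completely.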

I would suggest you start by checking your inputs using the TMVA GUI (see e.g. the TMVAClassification.C example). This can help you understand whether any variable is particularly easy for the method to use for discrimination.
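
As a complement to looking at the plots, one could also put a number on how well each input variable separates signal from background on its own. The sketch below computes a single-variable ROC area from two plain vectors of values. It assumes you have already read the variable into `std::vector<double>` for signal and background; the function names are hypothetical and nothing here is part of TMVA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Area under the ROC curve for one input variable, computed as the
// probability that a randomly chosen signal value exceeds a randomly chosen
// background value (ties count one half).
double singleVariableAuc(const std::vector<double>& signal,
                         std::vector<double> background) {
    assert(!signal.empty() && !background.empty());
    std::sort(background.begin(), background.end());

    double wins = 0.0;
    for (double s : signal) {
        // Locate the background values equal to s in the sorted vector.
        auto range = std::equal_range(background.begin(), background.end(), s);
        auto below = range.first - background.begin();  // strictly below s
        auto equal = range.second - range.first;        // tied with s
        wins += static_cast<double>(below) + 0.5 * static_cast<double>(equal);
    }
    return wins / (static_cast<double>(signal.size()) *
                   static_cast<double>(background.size()));
}

// Separation regardless of direction: 0.5 means none, 1.0 means perfect.
double separationPower(const std::vector<double>& signal,
                       const std::vector<double>& background) {
    const double auc = singleVariableAuc(signal, background);
    return std::max(auc, 1.0 - auc);
}
```

A variable whose `separationPower` is at or very near 1.0 can separate the samples with a single cut, which would explain a BDT reaching 100% signal efficiency at every working point. It is then worth asking whether that variable is physically meaningful or an artifact of how the signal and background samples were produced.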

If your current dataset is small, increasing the number of events in your input would be one way of “making it harder”: with few events, the full complexity of the data might not be visible.

Cheers,
Kim

Hi Kim,

Many thanks for the explanation.
You are right, my datasets might contain variables that are very powerful at separating signal from background.
Thanks a lot!