Possible BDT ROC Curve Inaccuracy

Hi experts,

I’ve noticed some interesting behavior in the ROC curves produced by TMVA when training BDTs. As a check, I took the training points stored in dataset/TrainTree and ran them through the TMVA::ROCCurve constructor. I plotted the result (red) alongside the ROC curve automatically produced by TMVA during training (blue, dataset/Method_BDT/BDT/MVA_BDT_trainingRejBvsS).
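
For reference, this is roughly what I did. A minimal sketch, assuming the default TMVA.root output file, a dataset directory named dataset, and a method named BDT (per-event weights are ignored for simplicity):

```cpp
#include "TFile.h"
#include "TTree.h"
#include "TGraph.h"
#include "TH1.h"
#include "TMVA/ROCCurve.h"
#include <vector>

void compareROC()
{
   TFile *f = TFile::Open("TMVA.root");
   TTree *train = (TTree *)f->Get("dataset/TrainTree");

   Float_t bdt;
   Int_t classID; // 0 = signal, 1 = background in the Train/Test trees
   train->SetBranchAddress("BDT", &bdt);
   train->SetBranchAddress("classID", &classID);

   // Collect the unbinned MVA responses and their true classes.
   std::vector<Float_t> mvaValues;
   std::vector<Bool_t> mvaTargets;
   for (Long64_t i = 0; i < train->GetEntries(); ++i) {
      train->GetEntry(i);
      mvaValues.push_back(bdt);
      mvaTargets.push_back(classID == 0); // kTRUE for signal
   }

   // Unbinned ROC curve (red) built directly from the raw points.
   TMVA::ROCCurve roc(mvaValues, mvaTargets);
   // Clone so the graph survives after roc goes out of scope.
   TGraph *g = (TGraph *)roc.GetROCCurve()->Clone();

   // Binned ROC curve (blue) written by TMVA during training.
   TH1 *h = (TH1 *)f->Get("dataset/Method_BDT/BDT/MVA_BDT_trainingRejBvsS");
   h->SetLineColor(kBlue);
   h->Draw("HIST");
   g->SetLineColor(kRed);
   g->Draw("L"); // no "A" option: draw on the existing frame
}
```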

Since, to my understanding, both use the exact same points to produce a ROC curve, they should overlap perfectly. However, as can be seen in the figure above, they diverge below signal efficiencies of about 0.5.

I suspect this is a binning artifact, resulting from the fact that TMVA stores the automatically generated ROC curve as a TH1, while ROCCurve outputs it as a TGraph.

My question is: which should I trust? Is there some good reason that the TH1 should be used in comparisons and area-under-the-curve calculations, or should I stick with the seemingly more precise TGraph?

Thank you.

Hi,

Yes, this looks like a binning effect, caused by the finite histogram binning TMVA uses internally when computing the ROC curve (in dataset/Method_BDT/BDT/MVA_BDT_trainingRejBvsS), especially when the number of data points is not very high.
The ROCCurve class instead computes the ROC curve from the full vector of data points, without any binning, so it is more accurate.
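
If you want to compare the two numerically, the sketch below (reusing the mvaValues/mvaTargets vectors and the file layout from the macro in the question above, which are assumptions about your setup) contrasts the unbinned integral from ROCCurve with the area of the stored histogram:

```cpp
#include "TFile.h"
#include "TH1.h"
#include "TMVA/ROCCurve.h"
#include <iostream>
#include <vector>

void compareAUC(const std::vector<Float_t> &mvaValues, const std::vector<Bool_t> &mvaTargets)
{
   // Unbinned area under the curve, computed directly from the points.
   TMVA::ROCCurve roc(mvaValues, mvaTargets);
   std::cout << "unbinned AUC: " << roc.GetROCIntegral() << std::endl;

   // Binned estimate: area under the rejection-vs-efficiency histogram
   // (sum of bin contents times bin widths).
   TFile *f = TFile::Open("TMVA.root");
   TH1 *h = (TH1 *)f->Get("dataset/Method_BDT/BDT/MVA_BDT_trainingRejBvsS");
   std::cout << "binned AUC:   " << h->Integral("width") << std::endl;
}
```

The two numbers should converge as the number of bins and the number of data points increase.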

Lorenzo