Questions about ROC curves (beginner with TMVA)

Daniel_Zheng · July 8, 2017, 1:31pm

If someone could clear this up it would be greatly appreciated! I am just having some confusion interpreting TMVA ROC curves and have not been able to find a clear answer.

Say we are plotting a ROC curve for a BDT, whose response usually ranges from -1 to +1.

So the signal efficiency is calculated at various thresholds along -1 to 1, by taking the number of signal events above each threshold and dividing by the total number of signal events. This is the true positive rate. Similarly, the false positive rate is the number of background events above each of these thresholds divided by total number of background events.
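To make the definition above concrete, here is a minimal pure-Python sketch of that threshold scan. The function name `efficiencies`, the toy scores, and the chosen thresholds are illustrative assumptions, not anything taken from TMVA.

```python
def efficiencies(scores, labels, thresholds):
    """Return (tpr, fpr) lists, one entry per threshold.

    scores: classifier responses; labels: 1 for signal, 0 for background.
    """
    n_sig = sum(1 for y in labels if y == 1)
    n_bkg = sum(1 for y in labels if y == 0)
    tpr, fpr = [], []
    for t in thresholds:
        # Count events of each class whose response lies above the threshold.
        sig_pass = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        bkg_pass = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)
        tpr.append(sig_pass / n_sig)  # signal efficiency
        fpr.append(bkg_pass / n_bkg)  # background efficiency
    return tpr, fpr


# Toy responses in [-1, 1] (made-up numbers, for illustration only).
scores = [0.8, 0.4, 0.1, -0.3, 0.5, -0.2, -0.6, -0.9]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
thresholds = [-1.0, -0.5, 0.0, 0.45, 1.0]
tpr, fpr = efficiencies(scores, labels, thresholds)
```

Both lists fall from 1 to 0 as the threshold moves from -1 to +1, so each threshold contributes one point of the ROC curve.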

Now, plotting TPR vs. FPR should result in a plot like this, which is what I expected from the TMVAGUI ROC curves.

However, TMVA’s plots look like this, leading me to believe that they are plotting 1-FPR vs. TPR. This would mean that background rejection is 1-FPR, i.e. the fraction of background events below each threshold mentioned before, and that background efficiency is FPR.

Under these assumptions I think I was able to reproduce TMVAGUI’s plots with pandas and scikit-learn.
I would just like to confirm that this is correct, as I need to produce similar plots with Python libraries outside of TMVA/ROOT.
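As a rough illustration of the two conventions, the sketch below converts standard ROC points (FPR, TPR) into the layout described above (signal efficiency on x, background rejection on y) and integrates both with the trapezoid rule. The helper `trapezoid_area` and the toy points are assumptions made for this example only.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y).

    Assumes the points are ordered along the curve with x non-decreasing.
    """
    area = 0.0
    for i in range(1, len(x)):
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
    return area


# ROC points in the usual convention (toy values, for illustration only).
fpr = [0.0, 0.25, 0.5, 1.0]
tpr = [0.0, 0.75, 1.0, 1.0]

# Layout described above: x = signal efficiency, y = background rejection.
sig_eff = tpr
bkg_rej = [1.0 - f for f in fpr]

auc_standard = trapezoid_area(fpr, tpr)
auc_rejection_style = trapezoid_area(sig_eff, bkg_rej)
```

The two areas come out equal, because the second curve is the first one with its axes swapped and one of them flipped. An AUC computed from (FPR, TPR) should therefore be comparable with one read off a rejection-vs-efficiency curve.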

One other thing: is there a simple way to get the AUC for a ROC curve produced by TMVAGUI? I know there is a line that says "ROC integ." or something similar in the TMVA output after testing an ML method, but for old output files I have had to use root_numpy to load the output ROOT file into pandas and calculate the AUC with scikit-learn.

Thanks.

kialbert · July 10, 2017, 4:32pm

Hi,

Welcome to root-forum and to TMVA! Yes, it is indeed as you say: the background rejection plotted in TMVAGUI is 1 - FPR, or equivalently 1 - background efficiency.

There is currently no simple way to extract the AUC from TMVAGUI. There is, as you say, the output from the training and testing run, and you could either record these values or re-run your training script without the TrainAllMethods call (replacing it with loading the pretrained models from disk).

Having easy access to the AUC from tmva gui should of course be possible, thanks for requesting the feature! We will look into adding it as soon as possible.

Alessio_Gianelle · July 20, 2017, 2:00pm

Hi, I have a similar problem. Can you explain how I can replace the TrainAllMethods call with “loading the pretrained models from file”, so that I can re-run factory.TestAllMethods() and factory.EvaluateAllMethods()?
(if possible using python)
Many thanks,
ale

kialbert · July 20, 2017, 2:41pm

Hi, sure! But please create a new post in the future for this kind of continued discussion.

After further consideration, it quickly becomes very hacky to take this approach. I think the simplest option would be to calculate the ROC integral manually with the ROCCurve class: run the application phase (see TMVAClassificationApplication, for example) on the test set and put the outputs, classes and weights into vectors.

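#include "TMVA/ROCCurve.h"
#include <iostream>
#include <vector>

std::vector<Float_t> outputs, weights; // classifier response and event weight
std::vector<Bool_t> classes;           // true for signal, false for background

// Create a TMVA::Reader and book the method from its weight file
// Read in the test data from the tree
// For each event in the tree:
//   evaluate the method
//   push the output, class (sig/bkg) and weight onto the vectors
// End for
TMVA::ROCCurve roc {outputs, classes, weights};
std::cout << roc.GetROCIntegral() << std::endl;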
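
For anyone who wants to cross-check the number in Python once the outputs, classes and weights have been exported, the following is a minimal pure-Python sketch of a weighted ROC integral. It is not TMVA's implementation: it uses the pairwise (Mann-Whitney) form of the AUC, and the function name `weighted_auc` and the toy values are assumptions for illustration. Small differences from `GetROCIntegral()` may appear, since that integrates the curve numerically.

```python
def weighted_auc(outputs, classes, weights):
    """Weighted probability that a signal event scores above a background event.

    Ties count as half. This pairwise form equals the area under the ROC curve.
    """
    sig = [(o, w) for o, c, w in zip(outputs, classes, weights) if c]
    bkg = [(o, w) for o, c, w in zip(outputs, classes, weights) if not c]
    total = sum(w for _, w in sig) * sum(w for _, w in bkg)
    favourable = 0.0
    for s_out, s_w in sig:
        for b_out, b_w in bkg:
            if s_out > b_out:
                favourable += s_w * b_w
            elif s_out == b_out:
                favourable += 0.5 * s_w * b_w
    return favourable / total


# Toy values (made up): True marks signal, False marks background.
outputs = [0.8, 0.4, -0.3, 0.5, -0.2, -0.9]
classes = [True, True, True, False, False, False]
weights = [1.0, 1.0, 2.0, 1.0, 1.0, 2.0]
auc = weighted_auc(outputs, classes, weights)
```

The double loop is quadratic in the number of events, so it only suits small samples. For a full test set, scikit-learn's `roc_auc_score(y_true, y_score, sample_weight=...)` is the more practical choice.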