TMVAGui mislabels multiclass ROC curves

ROC curves plotted with the TMVA multiclass GUI are labeled with the wrong classes. As a result, the user attributes an MVA’s behavior to the wrong class, and any analysis that relied on these curves may contain undiagnosed problems.

I found this in ROOT 6.22/06.9


Reproducer:

import ROOT as r

r.RDataFrame(1000).Define("v", "gRandom->Gaus(5, 5)").Define("u", "gRandom->Landau(5, 5)").Define("e", "rdfentry_").Snapshot("atree", "sig.root")
r.RDataFrame(1000).Define("v", "gRandom->Gaus(1, 3)").Define("u", "gRandom->Landau(1, 3)").Define("e", "rdfentry_").Snapshot("atree", "bkg.root")
r.RDataFrame(1000).Define("v", "gRandom->Gaus(-1, 10)").Define("u", "gRandom->Landau(-1, 10)").Define("e", "rdfentry_").Snapshot("atree", "oth.root")
fsig = r.TFile.Open("sig.root")
tsig = fsig.atree
fbkg = r.TFile.Open("bkg.root")
tbkg = fbkg.atree
foth = r.TFile.Open("oth.root")
toth = foth.atree
fout = r.TFile.Open("out.root", "recreate")

dl = r.TMVA.DataLoader("dataset")
dl.AddVariable("v", "Gaussian distribution", "", "F")
dl.AddVariable("u", "Landau distribution", "", "F")
dl.AddSpectator("e", "entry number", "")
dl.AddTree(tsig, "sig")
dl.AddTree(tbkg, "bkg")
dl.AddTree(toth, "oth")

dl.PrepareTrainingAndTestTree("", r"nTest_sig=0:nTest_bkg=0:nTest_oth=0:NormMode=NumEvents:!V:SplitSeed=100:SplitMode=Random")
fact = r.TMVA.Factory("TMVAClassification", fout, r"!V:!Silent:AnalysisType=Multiclass")
fact.BookMethod(
    dl,
    r.TMVA.Types.kBDT,
    "BDT",
    r"!H:!V:nTrees=500:BoostType=Grad"
    r":UseBaggedGrad"
)
fact.TrainAllMethods()
fact.TestAllMethods()
fact.EvaluateAllMethods()

fout.Close()

Then, launch the GUI:

root -l -e 'TMVA::TMVAMultiClassGui("out.root")'

and select (5) Classifier Backgr. Rej. vs Sig. Eff. (1-vs-rest ROC curves). Right-click any curve shown and notice that its name does not match the title of the canvas.

In this example, the ROC curve for class “sig” is shown (MVA_BDT_Test_rejBvsS_sig), but the canvas claims it’s the one for class “oth” (MVA_BDT_Test_rejBvsS_oth). Similarly for the other classes.
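To compare against what the GUI shows without clicking through each canvas, the curve names stored in out.root can be listed directly. The following is only a sketch: the helper name find_curve_names is made up for illustration, and it assumes the ROC histograms can be recognized by the substring "rejBvsS" seen in the names above. It walks the file recursively rather than assuming a particular directory layout.

```python
def find_curve_names(directory, tag="rejBvsS"):
    """Recursively collect the names of all keys whose name contains `tag`.

    `directory` is a TFile or TDirectory (anything with GetListOfKeys()).
    `tag` is an assumption based on names like MVA_BDT_Test_rejBvsS_sig.
    """
    found = []
    for key in directory.GetListOfKeys():
        name = key.GetName()
        # Only descend into real directories; trees are left untouched.
        if key.GetClassName() in ("TDirectoryFile", "TDirectory"):
            found.extend(find_curve_names(key.ReadObj(), tag))
        elif tag in name:
            found.append(name)
    return found

# Usage with PyROOT (not executed here):
#   import ROOT
#   f = ROOT.TFile.Open("out.root")
#   for name in find_curve_names(f):
#       print(name)
```

The printed names can then be checked one by one against the canvas titles in the GUI.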


Discussion:
This only occurs for the ROC curves, not the Input Variables, etc., and it happens for both “1 vs 1” and “1 vs rest”. The mislabeling is systematic: each class is labeled as the previous one in the file, wrapping around at the start. In this case, “sig” is labeled as “oth”, “bkg” is labeled as “sig”, and “oth” is labeled as “bkg”. This suggests that efficienciesMulticlass.cxx is mixing up the labels, but I could not find the problem on a quick look and don’t have a good way to debug it right now.
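The pattern described above is a cyclic shift by one position. As a minimal sketch, assuming the classes are ordered as they were added to the DataLoader (the function name displayed_label is hypothetical, and this is not taken from the TMVA sources):

```python
def displayed_label(classes):
    """Map each true class to the label observed on its canvas.

    Models the observed symptom only: every curve carries the name of
    the class that precedes it, with the first class wrapping to the last.
    """
    n = len(classes)
    return {classes[i]: classes[(i - 1) % n] for i in range(n)}

# Class order as added in the reproducer: sig, bkg, oth.
observed = displayed_label(["sig", "bkg", "oth"])
for true_class, label in observed.items():
    print(f"curve for {true_class!r} is labeled {label!r}")
```

A shift like this would be consistent with an off-by-one in how class indices are matched to names, though that is a guess rather than a diagnosis.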

Hi,

Thank you for reporting this. Some fixes were applied to the MultiClass GUI for 6.24, but after trying your example, it looks like this problem is still there. I will investigate it.

Cheers

Lorenzo
