If I do a multiclass analysis with bagging enabled, the BDT response histogram has all entries assigned to the overflow bin. Effectively, this means I cannot evaluate my BDT response in a multiclass analysis with bagging. I have no such problem with a standard classification analysis.
Reproducer:
import ROOT as r
r.RDataFrame(1000).Define("v", "gRandom->Gaus(5, 5)").Define("u", "gRandom->Landau(5, 5)").Define("e", "rdfentry_").Snapshot("atree", "sig.root")
r.RDataFrame(1000).Define("v", "gRandom->Gaus(1, 3)").Define("u", "gRandom->Landau(1, 3)").Define("e", "rdfentry_").Snapshot("atree", "bkg.root")
r.RDataFrame(1000).Define("v", "gRandom->Gaus(-1, 10)").Define("u", "gRandom->Landau(-1, 10)").Define("e", "rdfentry_").Snapshot("atree", "oth.root")
fsig = r.TFile.Open("sig.root")
tsig = fsig.atree
fbkg = r.TFile.Open("bkg.root")
tbkg = fbkg.atree
foth = r.TFile.Open("oth.root")
toth = foth.atree
fout = r.TFile.Open("out.root", "recreate")
dl = r.TMVA.DataLoader("dataset")
dl.AddVariable("v", "Gaussian distribution", "", "F")
dl.AddVariable("u", "Landau distribution", "", "F")
dl.AddSpectator("e", "entry number", "")
dl.AddTree(tsig, "sig")
dl.AddTree(tbkg, "bkg")
dl.AddTree(toth, "oth")
dl.PrepareTrainingAndTestTree("", r"nTest_sig=0:nTest_bkg=0:nTest_oth=0:NormMode=NumEvents:!V:SplitSeed=100:SplitMode=Random")
fact = r.TMVA.Factory("TMVAClassification", fout, r"!V:!Silent:AnalysisType=Multiclass")
fact.BookMethod(
    dl,
    r.TMVA.Types.kBDT,
    "BDT",
    r"!H:!V:nTrees=500:BoostType=Grad"
    r":UseBaggedGrad"  # problem
)
fact.TrainAllMethods()
fact.TestAllMethods()
fact.EvaluateAllMethods()
fout.Close()
f = r.TFile.Open("out.root")
h = f.dataset.Method_BDT.BDT.MVA_BDT_Test_sig_prob_for_sig
overflowbin = h.GetNbinsX() + 1
if h.GetEntries() == h.GetBinContent(overflowbin):
    print("BDT response assigned to overflow")
else:
    print("BDT response assigned properly")
The BDT response is assigned properly after commenting out or removing the line marked # problem.
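
To switch between the two configurations without editing the script by hand, the option string could be built from a flag. This is only a sketch: `bdt_options` and `use_bagging` are hypothetical names that are not part of the reproducer or of TMVA; the option values are the ones used in the reproducer.

```python
def bdt_options(use_bagging):
    """Build the BDT option string from the reproducer.

    `use_bagging` is a hypothetical flag: when True, the option
    marked "# problem" in the reproducer is appended.
    """
    opts = ["!H", "!V", "nTrees=500", "BoostType=Grad"]
    if use_bagging:
        opts.append("UseBaggedGrad")  # the option that triggers the issue
    # TMVA option strings are colon-separated
    return ":".join(opts)


if __name__ == "__main__":
    print(bdt_options(True))
    print(bdt_options(False))
```

Passing `bdt_options(False)` as the last argument of `fact.BookMethod` corresponds to the configuration with the `# problem` line removed.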