TMVA multiclass configuration

Dear experts,

I am trying to do a multiclass classification with TMVA, training with the “BDTG” method. I have one signal and two backgrounds, so three classes in total. The training runs, but a class named "Background" still shows up in the printout. Is that harmful?
                         : Dataset[dataset2021] : Class index : 3  name : Background
                         : Dataset[dataset2021] :     Background      -- number of events passed: 0      / sum of weights: 0
                         : Dataset[dataset2021] :     Background      -- efficiency             : -nan
                         : Dataset[dataset2021] :  you have opted for interpreting the requested number of training/testing events
                         :  to be the number of events AFTER your preselection cuts
                         :
                         : Dataset[dataset2021] :  you have opted for interpreting the requested number of training/testing events
                         :  to be the number of events AFTER your preselection cuts
                         :
                         : Dataset[dataset2021] :  you have opted for interpreting the requested number of training/testing events
                         :  to be the number of events AFTER your preselection cuts

I also noticed several error messages about Smooth, although I do not call it anywhere in my configuration. How can I avoid them?
Error in TH1F::Smooth: Smooth only supported for histograms with >= 3 bins. Nbins = 1

Thanks,
Wenyu

Hi @zhwenyu,

Welcome to the ROOT forum! Could you please provide a minimal excerpt of your code? Also, I am inviting @moneta to this topic, as he is the relevant expert in TMVA.

Cheers,
J.

Hi @jalopezg, @moneta, glad you can help!
I have pasted my code here since I am not able to upload files yet.

#!/usr/bin/env python
# @(#)root/tmva $Id$

# --------------------------------------------
# Standard python import
import os,sys  # exit
import time   # time accounting
import getopt # command line parser
import ROOT as r
import varsList

# --------------------------------------------
#weight and cut strings below are used for both background and signals!
weightStrC = "triggerXSF*pileupWeight*lepIdSF*EGammaGsfSF*isoSF*L1NonPrefiringProb_CommonCalc*MCWeight_MultiLepCalc/abs(MCWeight_MultiLepCalc)"
weightStrS = weightStrC+"*xsecEff"


#### NJetsCSVwithSF_MultiLepCalc
cutStrC = "(NJets_JetSubCalc >= 6 && NJetsCSV_MultiLepCalc >= 2) && ((leptonPt_MultiLepCalc > 20 && isElectron) || (leptonPt_MultiLepCalc > 20 && isMuon)) && (corr_met_MultiLepCalc > 60) && (MT_lepMet > 60) && (minDR_lepJet > 0.4) && (AK4HT > 500) && (DataPastTriggerX == 1) && (MCPastTriggerX == 1)"
cutStrS = cutStrC+" && ( isTraining == 1 || isTraining == 2 )"


# Default settings for command line arguments
DEFAULT_OUTFNAME = "weights/TMVA.root"
DEFAULT_INFNAME  = "180"
DEFAULT_TREESIG  = "TreeS"
DEFAULT_TREEBKG  = "TreeB"
DEFAULT_METHODS  = "BDT"
# "Cuts,CutsD,CutsPCA,CutsGA,CutsSA,Likelihood,LikelihoodD,LikelihoodPCA,LikelihoodKDE,LikelihoodMIX,PDERS,PDERSD,PDERSPCA,PDEFoam,PDEFoamBoost,KNN,LD,Fisher,FisherG,BoostedFisher,HMatrix,FDA_GA,FDA_SA,FDA_MC,FDA_MT,FDA_GAMT,FDA_MCMT,MLP,MLPBFGS,MLPBNN,CFMlpANN,TMlpANN,SVM,BDT,BDTD,BDTG,BDTB,BDTF,RuleFit"
DEFAULT_NTREES   = "100"
DEFAULT_MDEPTH   = "2"#str(len(varList))
DEFAULT_VARLISTKEY = "BigComb"
#print "Usage: python %s [options]" % sys.argv[2]
# Print usage help
def usage():
    print " "
    print "Usage: python %s [options]" % sys.argv[0]
    print "  -m | --methods    : gives methods to be run (default: all methods)"
    print "  -i | --inputfile  : name of input ROOT file (default: '%s')" % DEFAULT_INFNAME
    print "  -o | --outputfile : name of output ROOT file containing results (default: '%s')" % DEFAULT_OUTFNAME
    print "  -n | --nTrees : amount of trees for BDT study (default: '%s')" %DEFAULT_NTREES 
    print "  -d | --maxDepth : maximum depth for BDT study (default: '%s')" %DEFAULT_MDEPTH 
    print "  -l | --varListKey : BDT input variable list (default: '%s')" %DEFAULT_VARLISTKEY 
    print "  -t | --inputtrees : input ROOT Trees for signal and background (default: '%s %s')" \
          % (DEFAULT_TREESIG, DEFAULT_TREEBKG)
    print "  -v | --verbose"
    print "  -? | --usage      : print this help message"
    print "  -h | --help       : print this help message"
    print " "

# Main routine
def main():

    try:
        # retrieve command line options
        shortopts  = "m:i:n:d:k:l:t:o:vh?"
        longopts   = ["methods=", "inputfile=", "nTrees=", "maxDepth=", "mass=", "varListKey=", "inputtrees=", "outputfile=", "verbose", "help", "usage"]
        opts, args = getopt.getopt( sys.argv[1:], shortopts, longopts )

    except getopt.GetoptError:
        # print help information and exit:
        print "ERROR: unknown options in argument %s" % sys.argv[1:]
        usage()
        sys.exit(1)

    infname     = DEFAULT_INFNAME
    treeNameSig = DEFAULT_TREESIG
    treeNameBkg = DEFAULT_TREEBKG
    outfname    = DEFAULT_OUTFNAME
    methods     = DEFAULT_METHODS
    nTrees      = DEFAULT_NTREES
    mDepth      = DEFAULT_MDEPTH
    varListKey  = DEFAULT_VARLISTKEY
    verbose     = True
    for o, a in opts:
        if o in ("-?", "-h", "--help", "--usage"):
            usage()
            sys.exit(0)
        elif o in ("-m", "--methods"):
            methods = a
        elif o in ("-d", "--maxDepth"):
            mDepth = a
        elif o in ("-l", "--varListKey"):
            varListKey = a
        elif o in ("-i", "--inputfile"):
            infname = a
        elif o in ("-n", "--nTrees"):
            nTrees = a
        elif o in ("-o", "--outputfile"):
            outfname = a
        elif o in ("-t", "--inputtrees"):
            trees = a.strip().split( ' ' )
            trees.sort()
            trees.reverse()
            if len(trees)-trees.count('') != 2:
                print "ERROR: need to give two trees (each one for signal and background)"
                print trees
                sys.exit(1)
            treeNameSig = trees[0]
            treeNameBkg = trees[1]
        elif o in ("-v", "--verbose"):
            verbose = True

    varList = varsList.varList[varListKey]
    nVars = str(len(varList))+'vars'
    note = '_6j_year2016_NJetsCSV_multi'
    Note=methods+'_'+varListKey+'_'+nVars+'_mDepth'+mDepth+note
    outfname = "dataset2021/weights/TMVA_"+Note+".root"
    # Print methods
    mlist = methods.replace(' ',',').split(',')
    print "=== TMVAClassification: use method(s)..."
    for m in mlist:
        if m.strip() != '':
            print "=== - <%s>" % m.strip()
			
    # Import ROOT classes
    from ROOT import gSystem, gROOT, gApplication, TFile, TTree, TCut
    
    # check ROOT version, give alarm if 5.18 
    if gROOT.GetVersionCode() >= 332288 and gROOT.GetVersionCode() < 332544:
        print "*** You are running ROOT version 5.18, which has problems in PyROOT such that TMVA"
        print "*** does not run properly (function calls with enums in the argument are ignored)."
        print "*** Solution: either use CINT or a C++ compiled version (see TMVA/macros or TMVA/examples),"
        print "*** or use another ROOT version (e.g., ROOT 5.19)."
        sys.exit(1)
        
    # Import TMVA classes from ROOT
    from ROOT import TMVA

    # Output file
    outputFile = TFile( outfname, 'RECREATE' )
    

    factory = TMVA.Factory( "TMVAMulticlass", outputFile,
                             "!V:!Silent:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=multiclass" )
    loader = TMVA.DataLoader("dataset2021")
    
    (TMVA.gConfig().GetIONames()).fWeightFileDir = "weights/"+Note

    for iVar in varList:
        if iVar[0]=='NJets_JetSubCalc': loader.AddVariable(iVar[0],iVar[1],iVar[2],'I')
        else: loader.AddVariable(iVar[0],iVar[1],iVar[2],'F')

    inputDir = varsList.inputDir
    infname = "TTTT_TuneCP5_PSweights_13TeV-amcatnlo-pythia8_correctnPartonsInBorn_hadd.root" # 2016 2017
    iFileSig = TFile.Open(inputDir+infname)
    sigTree = iFileSig.Get("ljmet")

    loader.AddTree(sigTree, "Signal")

    ## multiple bkg classes
    bkg1File = TFile.Open(inputDir+ varsList.bkg1)
    bkg1Tree = bkg1File.Get("ljmet")
    bkg2File = TFile.Open(inputDir+ varsList.bkg2)
    bkg2Tree = bkg2File.Get("ljmet")

    loader.AddTree(bkg1Tree, "ttbar")
    loader.AddTree(bkg2Tree, "ttH")


    loader.SetWeightExpression( weightStrS )

    mycutSig = TCut( cutStrS )

    loader.PrepareTrainingAndTestTree( mycutSig,
                                        "SplitMode=Random:NormMode=NumEvents:!V" ) # nEvents 


# bdtSetting for "BDTG" 
    bdtGSetting = '!H:!V:NTrees=%s:MaxDepth=%s' %(nTrees,mDepth)
    bdtGSetting += ':MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20'
    bdtGSetting += ':Pray' #Pray takes into account the effect of negative bins in BDTG
    #bdtGSetting += ':IgnoreNegWeightsInTraining=True'
#Note also that explicitly setting *nEventsMin* so far OVERWRITES the option recommended


#BOOKING AN ALGORITHM
    if methods=="BDT": factory.BookMethod( TMVA.Types.kBDT, "BDT",bdtSetting)
    if methods=="BDT": factory.BookMethod( loader, TMVA.Types.kBDT, "BDT",bdtSetting)    
    if methods=="BDTG": factory.BookMethod( loader, TMVA.Types.kBDT, "BDTG",bdtGSetting)
    if methods=="BDTMitFisher": factory.BookMethod( TMVA.Types.kBDT, "BDTMitFisher",bdtFSetting)
    if methods=="BDTB": factory.BookMethod( TMVA.Types.kBDT, "BDTB",bdtBSetting)
    if methods=="BDTD": factory.BookMethod( TMVA.Types.kBDT, "BDTD",bdtDSetting)
    # --------------------------------------------------------------------------------------------------
           

    # Train MVAs
    print "train all method"
    factory.TrainAllMethods()

    print "test all method"
    # Test MVAs
    factory.TestAllMethods()
    
    # Evaluate MVAs
    factory.EvaluateAllMethods()    

    # Save the output.
    outputFile.Close()

    if not gROOT.IsBatch(): TMVA.TMVAGui( outfname )
    print "DONE"

if __name__ == "__main__":
    main()

Hi @zhwenyu,

The BookMethod call for "BDT" is almost duplicated (once without and once with loader as the first argument). Is that intended? For the rest, we will have to wait for @moneta to reply.

Cheers,
J.

Thanks, I removed it. I actually resolved the original problem by changing the SetWeightExpression call so that it names the class, and then I added the same call for each of the background classes.

loader.SetWeightExpression( weightStrS , "Signal")

After this modification the training runs fine.
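The per-class call above, repeated for every class as described, can be sketched as a small helper. This is only a minimal sketch: it assumes that `loader` accepts `SetWeightExpression(expression, className)` in the way shown in the post. The helper name `set_class_weights` and the `RecordingLoader` stand-in are hypothetical and exist only so that the snippet runs without ROOT. The weight strings are shortened stand-ins for the ones in the script.

```python
def set_class_weights(loader, weight_by_class):
    """Register one weight expression per class, always naming the class."""
    for class_name, expression in weight_by_class.items():
        loader.SetWeightExpression(expression, class_name)


class RecordingLoader(object):
    """Hypothetical stand-in for TMVA.DataLoader that only records calls."""

    def __init__(self):
        self.calls = []

    def SetWeightExpression(self, expression, class_name=""):
        self.calls.append((class_name, expression))


weight_common = "pileupWeight*lepIdSF"    # shortened stand-in for weightStrC
weight_full = weight_common + "*xsecEff"  # mirrors how weightStrS is built

loader = RecordingLoader()
set_class_weights(loader, {
    "Signal": weight_full,
    "ttbar": weight_full,
    "ttH": weight_full,
})

# Class names that received an explicit weight expression.
registered_classes = sorted(name for name, _ in loader.calls)
```

With the real loader from the script, the same helper would be called once after the AddTree calls, using the class names "Signal", "ttbar" and "ttH" that were passed to AddTree.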

I have another problem, this time with cuts. As I understand it, an event cut can be set in AddTree, to apply different cuts to different classes, and in PrepareTrainingAndTestTree as a preselection. When I put a cut in AddTree:

loader.AddTree(sigTree, "Signal", 1., TCut("NJets_JetSubCalc >= 6"))

I got this error:

Error in <TBranchElement::TBranch::WriteBasketImpl>: basket's WriteBuffer failed.

Error in <TBranchElement::TBranch::Fill>: Failed to write out basket.

Error in <TBranchElement::Fill>: Failed filling branch:deltaPhi_METjets, nbytes=-1
Error in <TTree::Fill>: Failed filling branch:ljmet.deltaPhi_METjets, nbytes=-1, entry=372
 This error is symptomatic of a Tree created as a memory-resident Tree
 Instead of doing:
    TTree *T = new TTree(...)
    TFile *f = new TFile(...)
 you should do:
    TFile *f = new TFile(...)
    TTree *T = new TTree(...)


Do you understand why this occurs? I should mention that before I added the weight and the cut in this way, no errors were shown.
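For the per-class cuts mentioned above, the cut strings themselves can be assembled in plain Python before they are handed to TMVA. The snippet below is a minimal sketch of that step only. The helper `combine_cuts` and the dictionary `extra_cut_by_class` are hypothetical, the cut strings are shortened versions of the ones in the script, and the thread does not confirm whether passing cuts this way avoids the basket error shown above.

```python
def combine_cuts(*cuts):
    """AND together the non-empty cut strings, parenthesising each one."""
    parts = ["(%s)" % c.strip() for c in cuts if c and c.strip()]
    return " && ".join(parts)


common_cut = "NJets_JetSubCalc >= 6 && NJetsCSV_MultiLepCalc >= 2"  # shortened cutStrC
training_cut = "isTraining == 1 || isTraining == 2"

# Hypothetical per-class additions; an empty string means "no extra cut".
extra_cut_by_class = {
    "Signal": training_cut,
    "ttbar": training_cut,
    "ttH": "",
}

cut_by_class = dict(
    (name, combine_cuts(common_cut, extra))
    for name, extra in extra_cut_by_class.items()
)

# With ROOT available, each string could then be given to the loader, e.g.
#   loader.AddCut(r.TCut(cut_by_class["Signal"]), "Signal")
# This assumes TMVA.DataLoader.AddCut(cut, className); it is not tested here.
```

Each piece is wrapped in parentheses because `||` binds more loosely than `&&`, so appending an unparenthesised `isTraining == 1 || isTraining == 2` would change the meaning of the combined selection.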

Ping @moneta :)