TMVA multiclass: specifying the training/testing split

Hello,

I would like to change the percentage of data used for training multiclass BDTs, but I cannot find documentation of the correct option string. This is what I usually see used:

PrepareTrainingAndTestTree( MyCut, "NormMode=NumEvents:!V" );

This defaults to a 50/50 split. I would like to use, for example, 95% of my data for training and 5% for testing.

To clarify, let's say I had 10000 signal and 10000 background events. For a typical binary-class BDT, I could specify:

PrepareTrainingAndTestTree( MyCut, "nTrain_Signal=9500:nTrain_Background=9500:…" );

However, for a multiclass BDT, I get the error:
“ : The following options were specified, but could not be interpreted: ‘nTrain_Background=9500:nTest_Background==500’, please check!”

And my next best guess, “nTrain”, was not accepted.

Thanks!

Did you try the overload that takes the numbers of events as arguments?

DataLoader::PrepareTrainingAndTestTree( const TCut & cut, Int_t NsigTrain, Int_t NbkgTrain, Int_t NsigTest, Int_t NbkgTest, const TString & otherOpt = "SplitMode=Random:!V" )

Did you try with nTrain_Background=9500:nTest_Background=500 instead of nTrain_Background=9500:nTest_Background==500 (i.e. =, not ==)?

Hello, thank you for the reply.

I have tried these. I get the following error when trying the first suggestion:

: The following options were specified, but could not be interpreted: ‘nTrain_Background=862117:nTest_Background=45375’, please check!

As a reminder, I would like to train a BDT with three output classes: one signal class and two background classes, Bkg0 and Bkg1. I see an immediate problem with the first suggested option, as it only takes a single "NbkgTrain" and has no "Nbkg1Train".

I have also tried this overload: PrepareTrainingAndTestTree(const TCut & cut, Int_t Ntrain, Int_t Ntest = -1) to no avail.

@moneta could you have a look, please?

Hi,
Can you try using the names of the classes you defined previously, since you have multiple backgrounds? For example:

dataloader->AddTree(signalTree, "Signal");
dataloader->AddTree(background0, "bg1");
dataloader->AddTree(background1, "bg2");
dataloader->PrepareTrainingAndTestTree(myCut, "nTrain_Signal=1000:nTrain_bg1=1000:nTrain_bg2=1000:.....");
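
If it helps, the option string for a given training fraction can also be assembled programmatically. The following is only a minimal sketch under some assumptions: makeSplitOptions is a hypothetical helper and not part of TMVA, it assumes that nTest_<ClassName> is accepted in the same way as nTrain_<ClassName>, and it assumes the totals you pass in are the numbers of events that survive the cut. The trailing SplitMode and NormMode settings are just the ones quoted earlier in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of TMVA): builds a split option string
// from (class name, total number of events) pairs and a training fraction.
std::string makeSplitOptions(const std::vector<std::pair<std::string, long>>& classes,
                             double trainFraction)
{
    // The fraction must lie in [0, 1] for the counts to make sense.
    assert(trainFraction >= 0.0 && trainFraction <= 1.0);

    std::string opts;
    for (const auto& c : classes) {
        // Round the training count; the remainder goes to testing.
        const long nTrain = std::lround(trainFraction * static_cast<double>(c.second));
        const long nTest = c.second - nTrain;
        opts += "nTrain_" + c.first + "=" + std::to_string(nTrain) + ":";
        opts += "nTest_" + c.first + "=" + std::to_string(nTest) + ":";
    }
    // Remaining options as quoted in this thread; adjust as needed.
    opts += "SplitMode=Random:NormMode=NumEvents:!V";
    return opts;
}
```

With the class names above, makeSplitOptions({{"Signal", 10000}, {"bg1", 10000}, {"bg2", 10000}}, 0.95) gives 9500 training and 500 testing events per class, and the result could then be passed as dataloader->PrepareTrainingAndTestTree(myCut, opts.c_str()).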

Best,

Lorenzo

Hello,

This seems to work! Thank you. Is there documentation of this that I missed?

Cheers,

Matt

Hello,

Unfortunately not, it is not mentioned in Table 22 of the TMVA Users Guide. We should add a note for the multiclass case. Thank you for raising this issue.

Cheers

Lorenzo
