TMVA: Separately passing Train and Test Tree for Signal and Background

kormoranos · February 8, 2021, 5:23pm

Dear rooters,

I have an apparently silly question that has been bothering me for two weeks.

I have some old code using TMVA in which I add the signal and background trees as follows:

 fTMVAdataloader[iTMVA]->AddSignalTree(signalTree, signalWeight);
 fTMVAdataloader[iTMVA]->AddBackgroundTree(backgroundTree, backgroundWeight);

Now I would like to pass separate training and test trees explicitly:

 fTMVAdataloader[iTMVA]->AddSignalTree(signalTestTree, signalWeight, TMVA::Types::kTesting);
 fTMVAdataloader[iTMVA]->AddSignalTree(signalTrainTree, signalWeight, TMVA::Types::kTraining);
 fTMVAdataloader[iTMVA]->AddBackgroundTree(backgroundTestTree, backgroundWeight, TMVA::Types::kTesting);
 fTMVAdataloader[iTMVA]->AddBackgroundTree(backgroundTrainTree, backgroundWeight, TMVA::Types::kTraining);

In both cases, I am running them with the following options: “nTrain_Signal=0:nTrain_Background=0:SplitMode=Alternate:NormMode=NumEvents:!V”

In the first case, TMVA automatically uses even events for training and odd events for testing.
In the second case, I manually split the two original files into four files using the same criterion.
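
For illustration, the even/odd criterion described above can be sketched in plain C++ without any ROOT dependency. The names `SplitIndices` and `splitAlternate` are hypothetical, and the sketch only mirrors the description in this post (even entries for training, odd entries for testing); it is not TMVA's internal implementation.

```cpp
#include <cstddef>
#include <vector>

// Entry indices assigned to each subsample (hypothetical helper type).
struct SplitIndices {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Assumption taken from the post: even entries go to training,
// odd entries go to testing.
SplitIndices splitAlternate(std::size_t nEntries) {
    SplitIndices out;
    for (std::size_t i = 0; i < nEntries; ++i) {
        if (i % 2 == 0) {
            out.train.push_back(i);
        } else {
            out.test.push_back(i);
        }
    }
    return out;
}
```

The resulting index lists could then be used to decide which entries of an original tree are copied into the training file and which into the test file.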

If I check the spectator variables and the variables used for the BDT, the output histograms are exactly the same in the two cases, which confirms that I split the files correctly in the second case.

However, the distribution of the BDT variable differs between the two cases, and I noticed that in the second case it changes when I switch “SplitMode” between “Alternate”, “Random” and “Block”.

Originally, I thought this flag had no meaning in the second case. I have now realized that it does have an effect, but I do not understand which configuration is correct.

My question is: if I want to set up TMVA as in the second case but obtain exactly the same training as in the first case, which options or commands are needed?

Thank you!

kormoranos · February 10, 2021, 2:06pm

Maybe my question was too long. To put it more simply:

In general, we pass the signal and background files separately and tell TMVA how to divide each sample into training and test subsamples.
How can I change this approach and instead specify four separate files, i.e. training-signal, test-signal, training-background and test-background?

eguiraud · February 10, 2021, 3:29pm

Hi @kormoranos,
I think we need our TMVA expert @moneta; let’s ping him.

moneta · February 10, 2021, 4:42pm

Hi,
I would need to check your second approach. Can you please post your files and your code?
Thanks

Lorenzo