TMVA training with multiple backgrounds and prepared training/test samples

mgoncerz · September 8, 2021, 8:46pm

Hello!

I’m trying to set up a training with signal and multiple weighted backgrounds, but I’m really confused about how the PrepareTrainingAndTestTree method should be called. I already have the samples split into testing and training ntuples.

Let’s assume for simplicity that they are signal_train, signal_test, background1_train, background1_test, background2_train and background2_test. I load them using:

dataloader->AddSignalTree(signal_train, 1, TMVA::Types::kTraining);
dataloader->AddSignalTree(signal_test, 1, TMVA::Types::kTesting);
dataloader->AddBackgroundTree(background1_train, 0.5, TMVA::Types::kTraining);
dataloader->AddBackgroundTree(background1_test, 0.5, TMVA::Types::kTesting);
dataloader->AddBackgroundTree(background2_train, 0.5, TMVA::Types::kTraining);
dataloader->AddBackgroundTree(background2_test, 0.5, TMVA::Types::kTesting);

How do I make sure that all of the samples are fully used and properly assigned to training/testing (without additional implicit splitting)? Would it simply be:

TCut mycuts = "";
TCut mycutb = "";
dataloader->PrepareTrainingAndTestTree(mycuts, mycutb, "NormMode=NumEvents:!V");

?

Thanks!

mwilkins · September 9, 2021, 3:05pm

Hi, @mgoncerz,

If you want to discriminate multiple sources of background, you might consider a multiclass approach, where instead of training “Signal” and “Background” samples, you train a number of arbitrarily named samples. See this tutorial for training and this one for application. You can also consider what is done in this other tutorial for training multiple backgrounds.

Hope that helps.

mgoncerz · September 9, 2021, 8:58pm

Dear @mwilkins, thank you for the suggestions. I think that the multiclass approach is a bit of overkill for my case. In the end, I am only interested in a simple BDT trained to discriminate signal against a combined background from multiple sources.

I can, of course, prepare such a combined background sample manually, but I’ve assumed that TMVA may have some way of handling that built in.
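For reference, the per-tree weight passed as the second argument of AddBackgroundTree only matters relative to the weights of the other background trees. The plain C++ sketch below (not TMVA code) spells out that arithmetic under the assumption that every event simply carries its tree weight. The struct, the function names and the entry counts are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One background source as it would be passed to AddBackgroundTree:
// the number of entries in the tree and the per-tree weight.
struct BackgroundSource {
    std::size_t entries;  // hypothetical entry count
    double treeWeight;    // e.g. 0.5 in the calls quoted earlier in the thread
};

// Sum of weights of the combined background, assuming each event
// contributes exactly its tree weight.
double sumOfWeights(const std::vector<BackgroundSource>& sources) {
    double sum = 0.0;
    for (const auto& s : sources) {
        sum += static_cast<double>(s.entries) * s.treeWeight;
    }
    return sum;
}

// Share of the combined background weight contributed by source i.
double weightShare(const std::vector<BackgroundSource>& sources, std::size_t i) {
    assert(i < sources.size());
    const double total = sumOfWeights(sources);
    assert(total > 0.0);
    return static_cast<double>(sources[i].entries) * sources[i].treeWeight / total;
}
```

With 1000 and 3000 entries, weightShare gives 0.25 for the first source whether both tree weights are 0.5 or both are 1, so equal weights leave the mix to the entry counts alone. If NormMode=NumEvents rescales each class to an average weight of 1 per event, as its option description suggests, a common factor such as 0.5 would drop out entirely.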

I would still like to prepare the training and testing sub-samples myself though, so that I can do the actual analysis outside of the TMVA GUI. Unfortunately, I can’t really find any clear example of it being done.

What I’m mostly confused by is whether I should call PrepareTrainingAndTestTree() after I have added appropriate trees using TMVA::Types::kTraining and TMVA::Types::kTesting, as it seems to be doing the splitting as well.

mwilkins · September 9, 2021, 9:55pm

Hi, @mgoncerz,

Okay, I think I understand your question now. I’m actually not sure you need to call PrepareTrainingAndTestTree at all… Glancing at the source code, it looks like calling AddSignalTree and AddBackgroundTree, as you’ve done, already accomplishes what you need, but I should say I am not an expert here. Perhaps try running your code without the PrepareTrainingAndTestTree line and see if it behaves as you expect.
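One way to judge whether it behaves as expected is to work out beforehand how many events per class should land in training and testing, then compare those numbers with the event counts TMVA prints when it builds the dataset. Below is a minimal bookkeeping sketch in plain C++ (not TMVA code). All type and function names are hypothetical, and "Unassigned" stands for a tree added without an explicit training/testing flag, which TMVA would have to split itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class EventClass { Signal, Background };
enum class TreeType { Training, Testing, Unassigned };

// One tree as registered via AddSignalTree/AddBackgroundTree.
struct RegisteredTree {
    std::string name;
    EventClass cls;
    TreeType type;
    std::size_t entries;  // e.g. taken from tree->GetEntries()
};

struct Counts {
    std::size_t training = 0;
    std::size_t testing = 0;
    std::size_t unassigned = 0;
};

// Expected number of events per sample type for one class.
Counts expectedCounts(const std::vector<RegisteredTree>& trees, EventClass cls) {
    Counts c;
    for (const auto& t : trees) {
        if (t.cls != cls) continue;
        switch (t.type) {
            case TreeType::Training:   c.training += t.entries; break;
            case TreeType::Testing:    c.testing += t.entries; break;
            case TreeType::Unassigned: c.unassigned += t.entries; break;
        }
    }
    return c;
}

// Aborts if any events of the class are left for an implicit split.
void checkFullyAssigned(const Counts& c) {
    assert(c.unassigned == 0);
}
```

If the training and testing totals reported by TMVA match expectedCounts for both classes, no implicit re-splitting has taken place.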

jalopezg · September 9, 2021, 11:29pm

Hi @mgoncerz,

What I suggest is to invite @moneta to this topic; he might provide some hints regarding the issue.

Cheers,
J.

moneta · September 28, 2021, 3:38pm

Hi,

I think that if you are doing the splitting yourself, by adding the trees with the right training/testing flags, you should not need to call PrepareTrainingAndTestTree. I have not tested this workflow, so in case it does not work, please let me know.

Cheers

Lorenzo