How to specify different input files for training and testing events?

Dear TMVA users,

I was wondering if there is a simple way to specify two different files in TMVAClassification.C, one for the training sample and one for the testing sample, rather than having the algorithm pick them randomly from a single file.

The thing is that I would like to have the file containing the testing sample so I can run an independent analysis that is not connected to TMVA. Maybe there is already another easier way to do that?

Cheers,
Kévin

Hi,

You can add any number of trees to the data loader. Either add them as generic data sources, so that TMVA will do the splitting internally:

TFile * file1 = TFile::Open(...);
TFile * file2 = TFile::Open(...);

dataloader->AddTree(file1->Get<TTree>("tree1"));
dataloader->AddTree(file2->Get<TTree>("tree2"));

Or add them specifically to the training or test set:

TFile * file1 = TFile::Open(...);
TFile * file2 = TFile::Open(...);

dataloader->AddTree(file1->Get<TTree>("tree1"), "Signal", 1.0, "signal cut", TMVA::Types::kTraining);
dataloader->AddTree(file1->Get<TTree>("tree1"), "Background", 1.0, "bkg cut", TMVA::Types::kTraining);
dataloader->AddTree(file2->Get<TTree>("tree2"), "Signal", 1.0, "signal cut", TMVA::Types::kTesting);
dataloader->AddTree(file2->Get<TTree>("tree2"), "Background", 1.0, "bkg cut", TMVA::Types::kTesting);

Note that all variables added to the data loader must be present in all trees.
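
One way to catch a missing variable before training is to compare the variable names against each tree's branch names. The following is only a minimal sketch: the helper missingVariables is hypothetical (it is not part of TMVA or ROOT), and it works on plain strings so that it runs without ROOT. In a macro, the list of available names would presumably be filled from the tree's branch list. It also assumes each variable is a plain branch name rather than an expression such as "var1+var2".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not part of TMVA or ROOT.
// Returns the names in `required` that do not appear in `available`,
// in the order they were requested.
std::vector<std::string> missingVariables(const std::vector<std::string>& required,
                                          const std::vector<std::string>& available)
{
    std::vector<std::string> missing;
    for (const auto& name : required) {
        // Linear search is fine here: variable lists are short.
        if (std::find(available.begin(), available.end(), name) == available.end())
            missing.push_back(name);
    }
    return missing;
}
```

Calling it once per tree, with the same list of variable names each time, should give an empty result for every tree if the training and test files are consistent.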

Cheers,
Kim

Thank you very much Kim, it worked like a charm!