Dear TMVA expert,
I'm trying to use MethodCategory with a BDT method. I get very weird, almost absurd results that make me curious about how categories actually work in TMVA.
My categorization is really simple: in about 5% of my events some variables are not available, so I made two categories:
- Category 1 for 95% of the events with all the 10 variables
- Category 2 for 5% of the events with 6 variables
To check what is going on, I also train in the same macro a control BDT with no categories (for that 5% of events the empty variables are filled with -1). In principle the performances should be very similar, since only a small fraction of the events fall outside category 1, and the events in category 2 should be quite difficult to separate anyway.
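For reference, the category cut looks roughly like this (a minimal sketch; testing the -1 sentinel on var7–var10 is just an illustration of the idea, assuming the four extra variables default to -1 when missing):

TCut cat1Cut( "var7 != -1 && var8 != -1 && var9 != -1 && var10 != -1" ); // Category 1: all 10 variables available
// Category 2 is selected with !cat1Cut when booking the sub-methods below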
This is my code (where I edited the variable names for simplicity):
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT", bdtOptions ); // control BDT, no categories
TString Cat1Vars("var1:var2:var3:var4:var5:var6:var7:var8:var9:var10");
TString Cat2Vars("var1:var2:var3:var4:var5:var6");
TMVA::MethodCategory* mcat = 0;
TMVA::MethodBase* BDT_Cat = factory->BookMethod( dataloader, TMVA::Types::kCategory, "BDT_Cat", "" );
mcat = dynamic_cast<TMVA::MethodCategory*>(BDT_Cat);
mcat->AddMethod( cat1Cut, Cat1Vars, TMVA::Types::kBDT, "BDT_Cat1", bdtOptions ); // ~95% of events, 10 variables
mcat->AddMethod( !cat1Cut, Cat2Vars, TMVA::Types::kBDT, "BDT_Cat2", bdtOptions ); // ~5% of events, 6 variables
Where (MaxDepth=X is the parameter I scan below)
TString bdtOptions("H:V:UseBaggedBoost:BaggedSampleFraction=0.8:NTrees=500:MaxDepth=X:nCuts=-1:MinNodeSize=0.1%:BoostType=RealAdaBoost:AdaBoostBeta=0.6");
Strange things happen when I start to tune MaxDepth. If I use a "standard" value like MaxDepth=4, everything seems normal (see figure), and the categorized BDT performs slightly better (ROC AUC 0.645 vs 0.639).
When I push MaxDepth to higher values, the non-categorized BDT starts to heavily overfit (as expected), while the categorized BDT keeps getting better and better (see figures below). This is absurd, since the two should have very similar performance.
[figures: overtraining check for MaxDepth=6 and MaxDepth=20]
The ROC response is even weirder, considering that TMVA gives me an outstanding 0.820 as ROC integral for the categorized BDT (black curve):
[figure: ROC curves for MaxDepth=20]
I really cannot understand what is going on with the categorized BDT. Basically, the more I overfit the BDT, the better that same BDT performs under a simple categorization. The dataloader is the same for both methods and SplitSeed is set to 0.
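For completeness, the train/test split is configured along these lines (a sketch; mycuts, mycutb and the event counts are placeholders, only SplitMode=Random and SplitSeed=0 reflect my actual setup):

dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
    "nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:SplitSeed=0:NormMode=NumEvents:!V" ); // same split for both methods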
Do you have any idea why this happens? What is the trick behind this unrealistic performance of MethodCategory?
Thank you,
Alberto
EDIT:
To push this behavior to the extreme, I changed the categorization into two random categories of the same size with no physical meaning (basically each event has a 50% chance to go into cat 1 and 50% chance to go into cat 2). In principle this should bring no increase in performance at all. But this is what I get:
The simple fact of being categorized, even in a physically meaningless way, seems to make the BDT immune to overtraining!
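The random split was implemented roughly like this (a sketch; rndCat is a hypothetical per-event uniform random number I store in the tree, not one of the training variables, and both categories use the full variable list):

dataloader->AddSpectator( "rndCat" ); // hypothetical spectator, uniform in [0,1], filled once per event
TCut rndCut( "rndCat < 0.5" ); // ~50% of events into each category
mcat->AddMethod( rndCut,  Cat1Vars, TMVA::Types::kBDT, "BDT_Rnd1", bdtOptions );
mcat->AddMethod( !rndCut, Cat1Vars, TMVA::Types::kBDT, "BDT_Rnd2", bdtOptions );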