Cross validation with BDT, MLP and DL

Dear experts,

I have a few questions regarding the cross validation.

  1. The suggestion for cross validation is to use SplitType=Deterministic if I want to work with TMVACrossValidationApplication (see the comments in this link). Does this mean I should always use SplitType=Deterministic? Or can I just treat cross validation as a cross-check of the training and apply the output from TMVAClassification?
  2. If I choose cross validation with SplitType=Deterministic, does this mean that the way signal and background are mixed is also deterministic? There are two options, SplitMode and MixMode, in the TMVA::Factory. I did not find MixMode in the TMVA user guide for TMVACrossValidation. From the TMVA user guide, I guess signal and background need to be mixed randomly for BDT and MLP (correct me if I am wrong). Is this possible in TMVACrossValidation using SplitType=Deterministic?
  3. For TMVA::kDL, the input tree type seems to be ignored. I tried to add two kinds of trees (see below), and it combines the two and chooses 20% for testing. Is this expected? How could I split into different fractions? And is it possible to use cross validation for DL as well?
dataloader->AddSignalTree("tree1",  1.0, "Training");
dataloader->AddSignalTree("tree2",  1.0, "Test");

Many thanks in advance!

Best regards

Hi @Y.S.Zhang ,
I think we need @moneta's help here 😄

Cheers,
Enrico

Hi,
My understanding is that SplitType=Deterministic should be used for cross-evaluation, which is a special case of cross-validation. For standard cross-validation you should use Random splitting.
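
To make the difference between the two split types concrete, here is a minimal sketch in plain C++ (no ROOT needed). It assumes each event carries an integer eventID spectator and that the deterministic fold is chosen with the expression int(fabs([eventID]))%int([NumFolds]), as I recall it from the TMVACrossValidation tutorial. The helper names foldOf and cvOptions are hypothetical, and the option names should be checked against your ROOT version.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Fold index of one event under a deterministic split, mirroring the
// expression int(fabs([eventID])) % int([NumFolds]).
// Assumption: eventID is a per-event integer stored as a spectator.
unsigned int foldOf(long long eventID, unsigned int numFolds) {
    assert(numFolds > 0);
    return static_cast<unsigned int>(std::llabs(eventID) % numFolds);
}

// Assemble an option string of the kind passed to TMVA::CrossValidation.
// SplitExpr is only added for the deterministic case.
std::string cvOptions(bool deterministic, unsigned int numFolds) {
    std::string opts = "!V:!Silent:ModelPersistence:AnalysisType=Classification";
    opts += ":NumFolds=" + std::to_string(numFolds);
    if (deterministic) {
        opts += ":SplitType=Deterministic";
        opts += ":SplitExpr=int(fabs([eventID]))%int([NumFolds])";
    } else {
        opts += ":SplitType=Random";
    }
    return opts;
}
```

With the deterministic split, the same event always lands in the same fold (for example, foldOf(7, 5) and foldOf(-7, 5) both give fold 2), which is what allows the application step to pick the model that did not see that event during training. With a random split this mapping cannot be reproduced from the event alone.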

For TMVA DL, the data is split into training and test samples as in the case of BDT, but the training sample is then split again into a new training sample (80% by default) and a validation sample (20%) that is used to check the convergence. It should be possible to use cross validation, but keep this additional split in mind.
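
As a rough illustration of what this additional split means for the event counts in one cross-validation fold, here is a small sketch in plain C++. It assumes equal-size folds and a simple integer truncation, neither of which is guaranteed to match what TMVA does internally. The names DLSplit and dlSplitPerFold are hypothetical. If I remember correctly, the validation fraction is controlled by the ValidationSize option of the DL method (default 20%), but please verify this for your ROOT version.

```cpp
#include <cassert>
#include <cstddef>

struct DLSplit {
    std::size_t train;       // events actually used to update the weights
    std::size_t validation;  // events used to monitor convergence
    std::size_t test;        // events in the held-out fold
};

// nEvents: total number of events given to the cross validation
// numFolds: number of folds k
// validationPercent: share of the training part set aside by the DL method
DLSplit dlSplitPerFold(std::size_t nEvents, std::size_t numFolds,
                       std::size_t validationPercent = 20) {
    assert(numFolds > 1);
    assert(validationPercent < 100);
    const std::size_t test = nEvents / numFolds;   // one fold is held out
    const std::size_t trainAll = nEvents - test;   // the remaining k-1 folds
    // Assumption: truncating integer division; TMVA may round differently.
    const std::size_t validation = trainAll * validationPercent / 100;
    return {trainAll - validation, validation, test};
}
```

For 10000 events and 5 folds this gives 2000 test events, 1600 validation events and 6400 events that are really used for training in each fold, so the network sees noticeably fewer training events than a BDT would in the same setup.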

I hope I have answered your questions

Best regards

Lorenzo

Thanks for your clarification! I still have a couple of questions.

  1. Is it possible to work with standard cross-validation and apply the output using TMVA?
  2. For cross-evaluation with SplitType=Deterministic, does it mean that SplitType divides the signal and background into k folds, and that signal and background are mixed according to SplitMode and MixMode within each fold?

  1. I am not sure what you mean by applying the output using TMVA.
  2. I think this is the case, but I would need to test it to be 100% sure. The SplitMode option in DataLoader::PrepareTrainingAndTestTree splits the data sample into training and test samples, mixing signal and background events. After that, the data are split into k folds according to the CV criteria.

Lorenzo

By “apply the output using TMVA” I meant the following: if I want to use standard cross-validation (I suppose I need to set SplitType=Random), is there a way to apply the output of the training to the data sample? Right now, as far as I understand, I can apply cross-evaluation to data samples.

Hi,
The output of the cross-validation training will be an average model obtained from the different folds (see the average results from the tutorial), and it can then be used on a real data sample.

Lorenzo