I have a few questions regarding the cross validation.
- The suggestion on cross validation is to use SplitType=Deterministic if I want to work with TMVACrossValidationApplication. See the comments in this link. Does this mean I should always use SplitType=Deterministic? Or can I just treat cross validation as a cross-check training and apply the output from
- If I choose cross validation with SplitType=Deterministic, does this mean the way signal and background are mixed is deterministic? There are two options MixMode in the TMVA::Factory. I did not find MixMode in the TMVA user guide for TMVACrossValidation. From the TMVA user guide, I guess signal and background need to be mixed randomly for BDT and MLP (correct me if I am wrong), is it possible in
TMVA::kDL, it seems it will ignore the input tree type. I tried adding two kinds of trees (see below), and it combines the two and takes 20% for testing. Is this expected? How could I split into different fractions? And is it possible to use cross validation for DL as well?
dataloader->AddSignalTree("tree1", 1.0, "Training");
dataloader->AddSignalTree("tree2", 1.0, "Test");
Many thanks in advance!
Hi @Y.S.Zhang ,
I think we need @moneta 's help here
My understanding is that SplitType=Deterministic should be used for cross-evaluation, a special case of cross-validation. For the standard one you should use random splitting.
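To illustrate why the deterministic split matters at application time, here is a minimal sketch in plain C++ (not TMVA code). The function names `deterministicFold` and `randomFolds` are hypothetical, and the modulo rule on an event ID is an assumption modelled on the kind of split expression used in the cross-validation tutorial. The point is only that a deterministic rule lets you recompute, for any event, which fold it belonged to, while a random assignment depends on the generator state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <random>
#include <vector>

// Deterministic split: the fold is a pure function of a per-event ID,
// so the same event always lands in the same fold, at training time
// and again at application time.
unsigned deterministicFold(std::int64_t eventID, unsigned numFolds) {
    assert(numFolds > 0);
    return static_cast<unsigned>(std::llabs(eventID) % numFolds);
}

// Random split: the fold of an event depends on the generator seed and on
// the position of the event in the sample, not on the event itself.
std::vector<unsigned> randomFolds(std::size_t nEvents, unsigned numFolds,
                                  unsigned seed) {
    assert(numFolds > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<unsigned> pick(0, numFolds - 1);
    std::vector<unsigned> folds(nEvents);
    for (auto &fold : folds) {
        fold = pick(rng);
    }
    return folds;
}
```

With `deterministicFold(eventID, 3)` an application job can look up which fold an event was in and evaluate it with a model that never trained on it. With `randomFolds` that information is lost unless the seed and the event order are both preserved.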
For TMVA DL, the data is split into training and test samples as in the case of BDT, but the training data is then split again into a new training sample (80% by default) and a validation sample (20%) used to monitor convergence. It should be possible to use cross validation, keeping this additional split in mind.
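As a rough illustration of the sample sizes this implies, the sketch below (plain C++, not TMVA code) computes how many events end up in each part when one fold is held out for testing and the rest is split again for validation. The struct `SplitSizes` and the function `dlSplitSizes` are hypothetical names, and it assumes folds of equal size and a 20% validation share by default. As far as I know the DL method has a `ValidationSize` option to change that share, but please check the options reference.

```cpp
#include <cassert>
#include <cstddef>

struct SplitSizes {
    std::size_t train;      // events actually used to update the weights
    std::size_t validation; // taken out of the training pool to monitor convergence
    std::size_t test;       // held-out fold, never seen during training
};

// One fold out of numFolds is the test sample; the remaining events form the
// training pool, of which validationPercent is set aside for validation.
SplitSizes dlSplitSizes(std::size_t nEvents, unsigned numFolds,
                        unsigned validationPercent = 20) {
    assert(numFolds >= 2);
    assert(validationPercent < 100);
    const std::size_t test = nEvents / numFolds;
    const std::size_t trainingPool = nEvents - test;
    const std::size_t validation = trainingPool * validationPercent / 100;
    return {trainingPool - validation, validation, test};
}
```

For example, `dlSplitSizes(1000, 5)` gives 200 test events, 160 validation events and 640 events used for the actual training, so with many folds and a small sample the effective training set shrinks noticeably.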
I hope I have answered your questions
Thanks for your clarification! I still have a couple of questions.
- Is it possible to work with standard cross-validation and apply the output using TMVA?
- For cross-evaluation with SplitType=Deterministic, does it mean SplitType divides the signal and background into k folds, and signal and background are mixed following MixMode in each fold?
By saying “apply the output using MVA”, I was trying to ask: if I want to use standard cross-validation (I suppose I need to set SplitType=Random), can I find a way to apply the output of the training to the data sample? Right now, as far as I understand, I can apply cross-evaluation to data samples.
The output of the cross-validation training will be an average model obtained from the different folds (see the average results from the tutorial), and it can then be used on a real data sample.
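One way to picture such an average model is sketched below in plain C++ (not TMVA code). It assumes the averaging is a simple unweighted mean of the per-fold classifier responses, which may differ from what TMVA does internally. The aliases `Event` and `FoldModel` and the function `averagedResponse` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// An event is represented here as a plain list of input variable values.
using Event = std::vector<double>;
// Each fold yields one trained model, represented as a callable that
// returns the classifier response for an event.
using FoldModel = std::function<double(const Event &)>;

// Unweighted mean of the responses of all fold models for one event.
double averagedResponse(const std::vector<FoldModel> &foldModels,
                        const Event &event) {
    assert(!foldModels.empty());
    double sum = 0.0;
    for (const auto &model : foldModels) {
        sum += model(event);
    }
    return sum / static_cast<double>(foldModels.size());
}
```

For an event from real data, which was in none of the training folds, every fold model is an unbiased choice, so averaging over all of them is a natural way to use the full training result.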