I am using 10-fold cross-validation to train and test a signal vs. background analysis, and I am looking at the over-training plot in the TMVAGui, which overlays the training and testing events on the same plot. The plot is attached.
I don’t think the plot the TMVAGui provides actually shows whether there is over-training, because the BDT score distributions for testing and training are exactly the same. It seems to me that, because this is a 10-fold cross-validation, every event is counted as both a training and a testing event. The K-S test value on the plot is 1 for both signal and background, which suggests that the same dataset is being plotted for training and testing.
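To illustrate why a K-S value of 1 points to identical samples, here is a minimal sketch in plain Python (not TMVA code). It computes the two-sample Kolmogorov-Smirnov distance, the largest gap between the two empirical cumulative distributions. The function name `ks_distance` and the toy scores are my own, and I am assuming the number on the plot is a K-S probability, which is 1 when this distance is 0.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample K-S distance: max |F_a(x) - F_b(x)| over all data points."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    distance = 0.0
    # The empirical CDFs only change at data points, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Toy BDT scores (hypothetical values), compared with themselves.
scores = [0.1, 0.4, 0.35, 0.8]
print(ks_distance(scores, scores))  # 0.0: identical samples
```

Any two samples that really come from different events would almost never give a distance of exactly 0, so a value this perfect is a hint that the same events are on both sides of the comparison.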
Is there another way to evaluate over-training for cross-validation in the TMVA package?
I think I have found a solution. For now, I split the dataset in the dataloader into half training and half test, so that the cross-validation is trained, tested and evaluated only on the training half. I then use the other half, which the cross-validation never sees, to plot the test result. I am currently doing this with my own plotting script. Please let me know if you have any comments.
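The bookkeeping behind this split can be sketched as follows. This is plain Python rather than TMVA code, and the function name `holdout_then_kfold`, its parameters, and the event count are only illustrative assumptions. It sets aside half of the events as an independent test sample and divides the remaining half into 10 folds for the cross-validation.

```python
import random


def holdout_then_kfold(n_events, n_folds=10, holdout_fraction=0.5, seed=42):
    """Return (holdout_indices, folds): a held-out test set plus CV folds."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)  # reproducible random split

    n_holdout = int(n_events * holdout_fraction)
    holdout = indices[:n_holdout]   # never seen by the cross-validation
    cv_pool = indices[n_holdout:]   # used for the k-fold training and testing

    # Deal the CV events round-robin into n_folds nearly equal folds.
    folds = [cv_pool[i::n_folds] for i in range(n_folds)]
    return holdout, folds


holdout, folds = holdout_then_kfold(n_events=1000)
cv_events = {i for fold in folds for i in fold}
print(len(holdout), len(folds), len(cv_events))  # 500 10 500
print(set(holdout).isdisjoint(cv_events))        # True
```

Because the held-out events share nothing with the folds, comparing their BDT score distribution with the one from the cross-validation half should give a meaningful over-training check, unlike the comparison in the default plot.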