TMVA cross-validation over-training check and K-S test

Yao_Yao · October 25, 2022, 8:34pm

Dear experts,

I am using 10-fold cross-validation to train and test a signal-vs-background analysis, and I am looking at the over-training plot in the TMVAGui, which shows the training and testing events on the same plot. The plot is attached.

I don’t think the plot that the TMVAGui provides shows whether over-training is present, because the BDT score distributions for testing and training are exactly the same. It seems to me that, because this is a 10-fold cross-validation, all events are counted as both training and testing events. The K-S test value on the plot is 1 for both signal and background, which suggests that the same dataset is being plotted for training and testing.
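To illustrate why a K-S value of 1 points to identical inputs, here is a minimal pure-Python sketch (not TMVA code). The score lists are synthetic stand-ins for BDT outputs, and `ks_distance` and `ks_probability` are hypothetical helper names. The probability uses the standard asymptotic Kolmogorov series, which may differ in detail from what the TMVAGui computes on binned histograms.

```python
import bisect
import math
import random


def ks_distance(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    # The largest CDF difference is always reached at one of the data points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


def ks_probability(sample_a, sample_b):
    """Asymptotic K-S probability: 1 means compatible, near 0 means different."""
    n_a, n_b = len(sample_a), len(sample_b)
    z = ks_distance(sample_a, sample_b) * math.sqrt(n_a * n_b / (n_a + n_b))
    if z < 0.2:
        # The series below converges poorly for tiny z; its limit is 1.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


# Synthetic stand-ins for classifier scores (assumption, not real data).
rng = random.Random(1)
train_scores = [rng.gauss(0.3, 0.2) for _ in range(2000)]
same_scores = list(train_scores)  # the very same events
shifted_scores = [rng.gauss(0.2, 0.2) for _ in range(2000)]

print(ks_probability(train_scores, same_scores))     # identical samples give 1.0
print(ks_probability(train_scores, shifted_scores))  # clearly different samples
```

Two statistically independent samples from the same distribution would normally give a probability spread between 0 and 1, so a value of exactly 1 for both classes is a strong hint that the two curves are built from the same events.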

Is there another way to evaluate over-training for cross-validation in the TMVA package?

Thank you,
Yao

eguiraud · November 1, 2022, 4:36pm

Hi @Yao_Yao ,

Sorry for the high latency. Let’s ask TMVA expert @moneta.

Cheers,
Enrico

Yao_Yao · November 1, 2022, 5:11pm

Hi,

I think I have found a solution. I now split the dataset into a training half and a test half in the dataloader, so that the CV is trained, tested and evaluated only on the training half. I use the other half to plot the test result, currently with my own plotting script. Please let me know if you have any comments.
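A minimal sketch of the bookkeeping behind such a split, in plain Python rather than TMVA. The function name `split_holdout_and_folds`, the event count, and the seed are hypothetical choices for illustration. Half of the events are held out, and only the other half is divided into 10 folds.

```python
import random


def split_holdout_and_folds(n_events, n_folds=10, holdout_fraction=0.5, seed=42):
    """Return (fold_of_event, holdout_indices).

    fold_of_event maps each cross-validation event index to its fold number;
    holdout_indices are never used in any fold.
    """
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_events * holdout_fraction)
    holdout = indices[:n_holdout]
    cv_part = indices[n_holdout:]
    # Assign CV events to folds round-robin after shuffling.
    fold_of_event = {idx: pos % n_folds for pos, idx in enumerate(cv_part)}
    return fold_of_event, holdout


fold_of_event, holdout = split_holdout_and_folds(n_events=1000)

# For fold k: train on CV events with fold != k, validate on fold == k.
for k in range(10):
    train_idx = {i for i, f in fold_of_event.items() if f != k}
    valid_idx = {i for i, f in fold_of_event.items() if f == k}
    assert not train_idx & valid_idx

# The held-out half shares no events with any fold.
assert not set(holdout) & set(fold_of_event)
print(len(fold_of_event), len(holdout))  # prints "500 500"
```

Because the held-out events never enter any fold, their score distribution can be compared with the training one without the two being identical by construction.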

Thank you,
Yao