Correlations between MVA input variables

Dear experts,
I have always heard that we should care about correlations between input variables in TMVA. But if I use a lot of variables, some of which are correlated, and in the end I still get good separation and a nice ROC curve, is there a good reason (other than saving time) not to use correlated variables? In other words, will the result still make sense despite these correlations?

This question is vague. What kind of variables are you talking about? Could you please give a more specific example? I don’t think that it’s possible to give you a clear answer otherwise. Are you using a neural network with more neurons/layers than necessary to separate signal/background? It might be a problem due to overfitting, for example.

Dear Amadio,
if I use pt1, pt2 and pt(1+2), we expect pt(1+2) to be correlated with pt1 and pt2. Now I wonder whether it is a problem to use the third variable in the DNN despite its correlation with the other two.
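To make the correlation in this example concrete, here is a minimal sketch on toy data. It assumes, purely for illustration, that pt1 and pt2 are independent and exponentially distributed, that both objects have uniformly distributed azimuthal angles, and that pt(1+2) means the transverse momentum of the vector sum. The helpers `toy_sample` and `pearson` are hypothetical names, not part of TMVA.

```python
import math
import random


def toy_sample(n=10000, mean_pt=30.0, seed=42):
    """Toy events with two independent objects (illustrative assumption,
    not a physics model): exponential pT, uniform azimuthal angle."""
    rng = random.Random(seed)
    pt1, pt2, pt12 = [], [], []
    for _ in range(n):
        a = rng.expovariate(1.0 / mean_pt)
        b = rng.expovariate(1.0 / mean_pt)
        phi_a = rng.uniform(-math.pi, math.pi)
        phi_b = rng.uniform(-math.pi, math.pi)
        # pt(1+2): magnitude of the vector sum in the transverse plane
        px = a * math.cos(phi_a) + b * math.cos(phi_b)
        py = a * math.sin(phi_a) + b * math.sin(phi_b)
        pt1.append(a)
        pt2.append(b)
        pt12.append(math.hypot(px, py))
    return pt1, pt2, pt12


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    pt1, pt2, pt12 = toy_sample()
    print(f"corr(pt1, pt2)     = {pearson(pt1, pt2):+.3f}")
    print(f"corr(pt1, pt(1+2)) = {pearson(pt1, pt12):+.3f}")
    print(f"corr(pt2, pt(1+2)) = {pearson(pt2, pt12):+.3f}")
```

In this toy, pt(1+2) is clearly correlated with pt1 and pt2, yet it is not a function of them alone: it also depends on the azimuthal opening angle between the two objects, so it can carry information the other two inputs do not. Note that the Pearson coefficient only captures linear correlation.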


As discussed in this answer, in principle no. However, if you introduce many highly correlated variables, you can run into problems with the training-set size: many input variables imply a large input space, and your problem can become undersampled.
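A crude way to see the undersampling argument is to bin each input variable and count how many cells of the resulting grid a fixed number of training events actually fills. This sketch assumes events spread uniformly over a unit hypercube and uses 10 bins per variable; it ignores correlations entirely and a DNN does not bin its inputs, so treat it only as an illustration of how fast the input space grows with the number of variables. `occupied_fraction` is a hypothetical helper.

```python
import random


def occupied_fraction(n_events, n_vars, bins_per_var=10, seed=1):
    """Fraction of grid cells that contain at least one event.

    Assumption: events are uniform in the unit hypercube, and each
    variable is split into `bins_per_var` equal-width bins.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_events):
        # One bin index per input variable identifies the cell
        cell = tuple(int(rng.random() * bins_per_var) for _ in range(n_vars))
        occupied.add(cell)
    return len(occupied) / bins_per_var ** n_vars


if __name__ == "__main__":
    n_events = 10000
    for d in range(1, 7):
        frac = occupied_fraction(n_events, d)
        per_cell = n_events / 10 ** d
        print(f"{d} variables: {frac:7.2%} of cells filled, "
              f"{per_cell:g} events per cell on average")
```

With the same 10000 events, every cell is filled for one or two variables, while for six variables at most 1% of the cells can contain any event at all.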