TMVA correlated variables

Dear experts,
I wonder whether it is a problem to use correlated variables in the MVA. For example, if I use pT(1), pT(2), and pT(1+2), we expect a non-negligible correlation. But if the separation between signal and background is better in the end with these 3 variables (instead of only 2), is there any reason why I should not use all 3?
Regards

Hi,

It depends on the method. If you use a neural network (NN), correlated variables should be no problem, because the network can exploit differences in the correlations between the two classes.
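
A toy illustration of this point (plain Python, not TMVA): the two hypothetical classes below have identical one-dimensional distributions in x and in y and differ only in the sign of their correlation. A cut on either variable alone cannot separate them, while a quantity built from both (the product x*y, standing in here for what an NN could learn) can. The Gaussian toy model, noise level, and all names are assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def make_toy(n, sign, noise=0.3, seed=1):
    """Toy sample (assumption): x ~ N(0, 1), y = sign * x + N(0, noise^2).
    sign=+1 ("signal") and sign=-1 ("background") give the same 1D
    distributions of x and of y; only the correlation differs."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(sign * x + rng.gauss(0.0, noise))
    return xs, ys

def frac_positive(values):
    """Fraction of entries passing the cut value > 0."""
    return sum(v > 0 for v in values) / len(values)

sig_x, sig_y = make_toy(10000, +1, seed=1)
bkg_x, bkg_y = make_toy(10000, -1, seed=2)

corr_sig = pearson(sig_x, sig_y)
corr_bkg = pearson(bkg_x, bkg_y)

# A cut on x alone keeps about half of each class: no separation.
cut_x_sig = frac_positive(sig_x)
cut_x_bkg = frac_positive(bkg_x)

# Using both variables jointly (here via x*y) separates the classes.
cut_xy_sig = frac_positive([x * y for x, y in zip(sig_x, sig_y)])
cut_xy_bkg = frac_positive([x * y for x, y in zip(bkg_x, bkg_y)])

print(f"corr:    signal {corr_sig:+.2f}, background {corr_bkg:+.2f}")
print(f"x > 0:   signal {cut_x_sig:.2f}, background {cut_x_bkg:.2f}")
print(f"x*y > 0: signal {cut_xy_sig:.2f}, background {cut_xy_bkg:.2f}")
```

The information here sits entirely in the correlation, which is why a method that looks at the variables jointly can use it while one-variable-at-a-time cuts cannot.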

Lorenzo

Dear Moneta,
Could it be a problem for other methods? I mean, if variables are strongly correlated, the extra variable should just not bring more information, but I do not see why it should be a problem.
Regards

Hi,

As you say, adding a highly correlated variable should add very little new information, so one would expect similar performance from the two trainings. However, as this Stack Overflow answer discusses:

When adding more features, the dimensionality of the fitting problem increases. If the input space is large in comparison to the number of data points in your training data, it can be difficult for the method to find a good solution (the curse of dimensionality).
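
A back-of-the-envelope sketch of this sparsity effect, assuming a simple picture in which each input variable is split into the same number of bins (as in a histogram-based density estimate). Methods such as an NN do not bin their inputs, but the same idea of training events spreading thinner in a higher-dimensional space applies more loosely. The sample size and binning below are hypothetical.

```python
def events_per_cell(n_events, bins_per_variable, n_variables):
    """Average number of training events per cell when each of the
    n_variables inputs is divided into bins_per_variable bins."""
    n_cells = bins_per_variable ** n_variables
    return n_events / n_cells

n_train = 100_000  # hypothetical training-sample size
bins = 10          # hypothetical number of bins per variable

for d in range(1, 6):
    print(f"{d} variables: {bins ** d:>7} cells, "
          f"{events_per_cell(n_train, bins, d):>8.1f} events per cell")
```

In this picture, going from 2 to 3 variables with 10 bins each divides the average occupancy by 10; whether that matters in practice depends on how large the training sample is compared with the size of the input space.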

Cheers,
Kim

Dear Kialbert,

OK, but if I look at my outputs and results and everything looks OK, nothing forbids me from using correlated variables, right?
Regards

Hi,

No, nothing forbids you from doing that :slight_smile:

For example, many Kaggle competition winners use “feature engineering” extensively (at least up to 2016). Feature engineering entails adding combinations of lower-level features to the input data.
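
As a concrete instance of such an engineered feature, here is a sketch of how pT(1+2) from the original question can be built from lower-level inputs, assuming each object is described by its transverse momentum pT and azimuthal angle phi. The function name and example values are hypothetical, and this is plain Python rather than TMVA code.

```python
import math

def pt_of_pair(pt1, phi1, pt2, phi2):
    """Transverse momentum of the two-object system (1+2), obtained by
    adding the two transverse-momentum vectors component by component."""
    px = pt1 * math.cos(phi1) + pt2 * math.cos(phi2)
    py = pt1 * math.sin(phi1) + pt2 * math.sin(phi2)
    return math.hypot(px, py)

# Hypothetical example values (pT in GeV, phi in radians)
print(pt_of_pair(30.0, 0.0, 40.0, math.pi / 2))  # perpendicular objects
print(pt_of_pair(50.0, 0.0, 50.0, math.pi))      # back-to-back objects
```

Equivalently, pT(1+2)² = pT(1)² + pT(2)² + 2 pT(1) pT(2) cos(phi1 - phi2). Because pT(1+2) also depends on the opening angle, it is correlated with pT(1) and pT(2) without being a function of them alone, which is one way such a combined variable can still add information.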

Cheers,
Kim

Dear Kialbert,
OK, thank you for your answer.
Regards

Hi,

If a particular post answered your question, please mark it as “Solution”. This helps other users with similar questions :slight_smile:

Cheers,
Kim

Dear kialbert,
OK. I also wonder: is multiclass classification available for the DNN?
Regards

Hi,

This should then be a new topic since it is a separate question from the initial post :slight_smile:

And yes, you can use multiclass classification with the DNN! Check the example TMVAMulticlass.C in $ROOTSYS/tutorials/tmva.

Cheers,
Kim

Dear Kialbert,
as for marking a post as “Solution”, I do not see any button for that.
Regards