TMVA: More than 200 variables, will not calculate PCA!

Hello ROOT experts,

I am currently using TMVA for a regression analysis with a large number of input variables. I'd like to apply a Principal Component Analysis (PCA) to account for possible correlations among these variables, using the option VarTransform=P.
Unfortunately, I get this message:

Factory                  : Train method: LD for Regression
                         : 
                         : Preparing the Principle Component (PCA) transformation...
                         : ----------------------------------------------------------------------------
                         : : More than 200 variables, will not calculate PCA!
                         : ----------------------------------------------------------------------------

The process then crashes with a segmentation violation.

I read that the transformation is implemented with the TPrincipal class, which as far as I know handles a large number of variables without problems. Is there a global configuration option in TMVA that I can change to lift this limit and have the PCA computed despite the number of variables?

Many thanks.

Hi @zenith378; maybe @moneta can help you with this.

Cheers,
J.

hi @zenith378,

Looking at the TMVA code, I see several places that limit the number of variables to 200.

For instance, in tmva/tmva/src the limit is hard-coded:

grep "More than " *

VariableDecorrTransform.cxx:            << ": More than 200 variables, will not calculate decorrelation matrix "
VariableGaussTransform.cxx:            << ": More than 200 variables, I hope you have enough memory!!!!" << Endl;
VariablePCATransform.cxx:            << ": More than 200 variables, will not calculate PCA!" << Endl;

In Config.cxx, however, there is an attempt to parametrize it:

fVariablePlotting.fMaxNumOfAllowedVariables = 200;

So, although TPrincipal itself does not have this limitation, lifting the limit in TMVA does not seem trivial.

-Eddy

Hi,
As Eddy correctly noticed, 200 is a hard-coded limit inside VariablePCATransform.cxx. We could probably change this and make it configurable in TMVA, but for the time being the only way is to edit the file and recompile ROOT.

Lorenzo