Dear TMVA experts,
We are using a BDT to separate signal from background, and I saw that the parameter setup includes the option
VarTransform=Decorrelation, PCA-transformation, Gaussianisation, Normalisation
Does this mean that TMVA preprocesses the input variables? For example, would the PCA transformation replace them with the 1st principal component, 2nd principal component, etc., which are then used as the input variables for the cuts in the BDT?
If so, how do we know which transformation to use for which method? (For example, since a BDT is just a choice of cuts in the trees, it seems that a transformation or normalisation would not change its performance.)
And a naive question about decorrelation: since the BDT minimizes a global loss function, I wonder whether correlation between the variables affects its performance at all. (Correlation certainly causes large variance in linear regression and makes the prediction worse, but I am not sure whether it is still an issue for non-linear methods like a BDT.)
You can read up on the exact procedure in the user's guide here (section 4).
As far as I know, decorrelation and the PCA transformation are quite similar. Both rotate the input space of your variables into a space in which the variables are minimally correlated, so the number of inputs stays the same: it is neither expanded nor reduced.
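To make the "same number of inputs, no linear correlation" point concrete, here is a minimal NumPy sketch. It is not TMVA code. The helper names `decorrelate_sqrt` and `pca_rotate` are made up for illustration, and I am assuming the decorrelation uses the inverse square root of the covariance matrix, so please check the user's guide for what TMVA actually does. Note that this kind of decorrelation also rescales the variables, so it is not a pure rotation.

```python
import numpy as np

def decorrelate_sqrt(x):
    """Decorrelate x (n_samples, n_vars) with the inverse square root
    of its covariance matrix (assumed form of the transformation)."""
    xc = x - x.mean(axis=0)
    cov = np.cov(xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return xc @ inv_sqrt

def pca_rotate(x):
    """Rotate x onto its principal axes, largest variance first."""
    xc = x - x.mean(axis=0)
    cov = np.cov(xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return xc @ evecs[:, order]

# Toy data: three linearly correlated variables (illustrative only).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.8, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 1.0]])
x = rng.normal(size=(5000, 3)) @ mix

x_dec = decorrelate_sqrt(x)
x_pca = pca_rotate(x)

# Both outputs keep all three variables; only the linear correlations vanish.
print(x.shape, x_dec.shape, x_pca.shape)
print(np.round(np.cov(x_dec, rowvar=False), 3))
print(np.round(np.cov(x_pca, rowvar=False), 3))
```

Both covariance matrices come out diagonal, and the shapes are unchanged, so nothing is dropped unless you decide to discard components yourself.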
As you mentioned, normalization should not affect the BDT performance, because the cuts of the BDT are not sensitive to the scale of the variables. Decorrelation, however, could be beneficial, because it "simplifies" the inputs for your BDT. The BDT then does not have to learn higher-order relations between the inputs and can perform better with simpler assumptions. Still, no information is lost by decorrelating: you are only rotating your input space into a (probably) more suitable coordinate system.
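A small toy check of both statements, again in plain NumPy rather than TMVA. It uses a single axis-aligned cut as a stand-in for one tree split, a linear rescaling to [-1, 1] as the assumed form of normalisation, and a PCA-style rotation as the decorrelating step. All function names and the toy data are hypothetical.

```python
import numpy as np

def best_single_cut_accuracy(x, y):
    """Best accuracy of one cut 'x_j > t', scanning all variables and thresholds."""
    best = 0.0
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j]):
            acc = np.mean((x[:, j] > t) == y)
            best = max(best, acc, 1.0 - acc)  # allow the cut in either direction
    return float(best)

def normalise(x):
    """Linear map of every variable to [-1, 1] (assumed normalisation)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def pca_rotate(x):
    """Rotate x onto the eigenvectors of its covariance matrix."""
    xc = x - x.mean(axis=0)
    _, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    return xc @ evecs

# Toy sample: two strongly correlated variables; signal and background
# differ only in the direction across the correlation.
rng = np.random.default_rng(1)
n = 400
y = rng.random(n) < 0.5          # True = signal, False = background
s = np.where(y, 1.0, -1.0)
common = rng.normal(size=n)
x = np.column_stack([common + 0.2 * rng.normal(size=n) - 0.5 * s,
                     common + 0.2 * rng.normal(size=n) + 0.5 * s])

acc_raw = best_single_cut_accuracy(x, y)
acc_norm = best_single_cut_accuracy(normalise(x), y)
acc_rot = best_single_cut_accuracy(pca_rotate(x), y)
print("raw:", acc_raw, "normalised:", acc_norm, "rotated:", acc_rot)
```

The normalised inputs give the same best cut as the raw ones, while the rotated inputs allow a much better single cut. A full BDT can of course approximate the diagonal boundary with many cuts, so in practice the gain is smaller than in this one-split toy.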
Many thanks for the detailed explanation!
Yes, it totally makes sense to me now. Normalisation should be fine for BDT methods, since they only apply cuts, which do not depend on the scale of the variables.
I have also checked the user's guide. It says that decorrelation only works for linearly correlated variables, and that applying it to variables with genuinely non-linear correlations may make the prediction less accurate. (I guess this is for the reason you gave, but in the opposite direction: changing the shape of the original non-linear correlation in high dimensions may make the learning harder, because the transformed distribution is more complicated.)