Input variables decorrelation and building time

mcalvo · June 21, 2019, 7:39am

Hello,
I’ve found a ‘feature’ or issue when applying a decorrelation to the input variables of a TMVA MLP. If I set the “VarTransform” to “G,P,N” or “G,D,N” the size of the generated TMVA_MLP.C file increases to around 1.3 MB (compared to 20 kB when only a normalization is applied).
The problem appears when I include this generated file in a C++ algorithm of the LHCb experiment software (following the TMVA User’s Guide, with the IClassifierReader, etc.). The build time of the algorithm increases from seconds to half an hour, which is rather annoying. Is there a workaround for this? For the time being, we simply avoid applying any decorrelation.

kialbert · June 21, 2019, 2:14pm

Hi,

The increased file size is due to the Gaussian transformation. Essentially, it bins the output space to model the distribution and stores the resulting histogram(s) in the .class.C file.
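To illustrate why stored histograms inflate the generated source, here is a minimal sketch of a Gaussianisation step driven by a hard-coded lookup table. This is not the code TMVA generates: the names (kCdfAtEdges, BinnedCdf, NormalQuantile, Gaussianise), the 4-bin table and the bisection-based quantile are all assumptions made for the example. A real generated file would carry far larger tables, per variable and per class, which is where the size would come from.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical lookup table: cumulative distribution of one input variable,
// sampled at the edges of 4 equal-width bins on [kXMin, kXMax].
// Illustrative numbers only, not taken from any TMVA output.
constexpr double kXMin = 0.0;
constexpr double kXMax = 1.0;
constexpr std::array<double, 5> kCdfAtEdges = {0.0, 0.25, 0.5, 0.75, 1.0};

// Linearly interpolate the binned CDF at x, clamping outside the table range.
double BinnedCdf(double x) {
    const std::size_t nBins = kCdfAtEdges.size() - 1;
    if (x <= kXMin) return kCdfAtEdges.front();
    if (x >= kXMax) return kCdfAtEdges.back();
    const double pos = (x - kXMin) / (kXMax - kXMin) * nBins;
    const std::size_t bin = static_cast<std::size_t>(pos);
    const double frac = pos - bin;
    return kCdfAtEdges[bin] + frac * (kCdfAtEdges[bin + 1] - kCdfAtEdges[bin]);
}

// Invert the standard normal CDF by bisection on [-8, 8].
// Chosen for brevity; a production implementation would use a faster method.
double NormalQuantile(double u) {
    double lo = -8.0, hi = 8.0;
    for (int i = 0; i < 60; ++i) {
        const double mid = 0.5 * (lo + hi);
        const double cdf = 0.5 * (1.0 + std::erf(mid / std::sqrt(2.0)));
        if (cdf < u) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Map x to a variable that is approximately standard-normal distributed,
// given that kCdfAtEdges describes the distribution of x.
double Gaussianise(double x) {
    return NormalQuantile(BinnedCdf(x));
}
```

Calling Gaussianise(0.5) with this flat table gives a value close to 0, since the median of the input maps to the median of the Gaussian. The point of the sketch is only that every bin edge becomes a literal in the source file.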

However, this does not explain the increased compilation time. I generated a quick test from the TMVAClassification tutorial and compiled the resulting .class.C file stand-alone with a main similar to this:

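#include <iostream>
#include <string>
#include <vector>

// Assumed path: adjust to wherever your generated class file lives.
#include "dataset/weights/TMVAClassification_MLP.class.C"

int main() {
	// Variable names must match the ones used in training, in the same order.
	std::vector<std::string> vars = {"var1+var2", "var1-var2", "var3", "var4"};
	ReadMLP mlp{vars};

	// Evaluate the classifier on one arbitrary input point.
	std::cout << mlp.GetMvaValue({1, 2, 3, 4}) << std::endl;

	return 0;
}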

and got quick compilation times. Could you try this on your end? I think this would shed some light on whether the problem is related to TMVA itself or to its integration in the build system.

Cheers,
Kim