TMVA example with lots of variables - freezing

Hi,

I’m trying to learn how to use TMVA and, I admit, am using it somewhat as a black box at the moment. I took the standard TMVAClassification example code, modified it to load my signal and background trees, and specified all the variables I wanted it to work with. I may have been a bit ambitious, as I gave it 150 variables. To start with I was interested only in the naive Bayes classifier (the ‘Likelihood’ method). When I run the macro, things start OK, but then it reaches:

--- TFHandler_Factory : Some more output
--- Gauss : Preparing the Gaussian transformation…

At this point it made no further progress for over 12 hours. Is there a way I should configure the factory to work with a very large number of variables?

Thanks,
Will

Hello Will,

in the options given to the factory (in its constructor) you can specify a set of ‘transformations’ that will be calculated and applied, and whose resulting variable distributions are plotted. You find them in the template code as something like new TMVA::Factory("…:Transformations=I,D,G,DG:…") …
These transformations are never needed by any particular classifier: if you want a classifier to use a particular transformation, you have to specify it in the corresponding “BookMethod” command.

Now, I guess you are only using Likelihood without any transformation (at least that is what I would suggest), and I suspect that computing the numerical transformation for the Gaussianisation somehow gets ‘overfull’ with your 150 variables. Please try removing all but "Transformations=I" (the identity transformation, without which you don’t get the variable distributions plotted) and let me know if the likelihood then works fine.

(If not, maybe first try with fewer variables and keep an eye on the available memory … )

Cheers,

Helge

Thanks Helge,

I did eventually figure out on my own that it was the Transformations option it was struggling with, but thanks for the confirmation. Good to know what the “I” option stands for: until now I had just removed all the transformations, so I didn’t get the input-variable distributions (though I did get the likelihood reference distributions, which basically amount to the same thing).

While I’m here, can you tell me whether there is a good way to control precisely the binning of certain input variables? I know there is the NAvEvtPerBin option, which helps regulate the number of bins, but in some cases I would still like to specify the binning myself, because I know the input data will be discrete in a specific way. I also want to neglect outliers in the distributions: I know I can specify the min and max values, but the reference distributions seemed to ignore these when there were outlier values in my training samples. I just want the outliers to be pushed into the overflow bins.

Thanks,
Will