TMVA PyKeras comparison with standalone Keras API

11133 · April 5, 2022, 11:34am

Hi all

I used TMVA PyKeras and trained NN with my data. Then I got the scores from my trained model for the signal and background data. If we look at the distribution
PyKeras_distribution
the signal samples massively tend to around 1 and background samples around 0, as it’s anticipated. When I try to build exactly the same NN model and train with exactly the same data, but using standalone Keras API and then try to get the scores from the trained model for the signal and background samples, we can see the following distribution

It’s seen that the model poorly distinguishes the signal and background samples from each other.

It’s interesting for me that although the ROC AUC I get using TMVA PyKeras is 0.79 and ROC AUC with standalone Keras API is 0.81, the model which trained using TMVA distinguishes signal/background better. Such distribution is important for my research since I obtain the values such as significance of the signal using this distribution.

I wonder if there is some additional hidden steps that TMVA automatically performs when I try to get the scores from already trained model?

Cheers,
Aizat

eguiraud · April 19, 2022, 11:41am

Hi @11133 ,
sorry for the high latency, we need @moneta 's help, let’s ping him.

Cheers,
Enrico

moneta · April 19, 2022, 1:34pm

Hi,
I think in the first case you have multiple background, in the second only one. Is all the background combined ? You should compare similar thing.
The ROC results seem to me quite compatible, so in general I would say the discrimination is similar. Anyway, if you still have an issue, please post all the code and data so we can reproduce it

Lorenzo