Dear TMVA experts,
We observed a curious feature when we updated TMVA from version 4.20 to the default version in ROOT 6.14.04. In both cases we kept everything constant: the input variables, the signal and the background. The only exception was the replacement of the “factory” method by “dataloader”, because “factory” was discontinued for this use in later versions of TMVA. For the same input signal and background, and for exactly the same input variables, each of the tested classifiers in 4.20 gives a different (best) significance compared to the (best) significance obtained from the same classifier in the version shipped with ROOT 6.14.04. For 4.20 we use ROOT version 5.34.38, and for the other we use ROOT version 6.14.04. Has this been seen or resolved by anyone else?
We found a similar issue reported by someone else, but there seems to be no resolution, so we are posting here: https://sourceforge.net/p/tmva/mailman/message/35364243/
Thanks in advance for your attention and assistance!
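For anyone wanting to compare the two versions on equal footing, the (best) significance can be recomputed outside TMVA from the classifier outputs on the same test events. The following is only a minimal sketch: it assumes the significance is defined as S/sqrt(S+B) with weighted event counts, and the function name `best_significance` and the (output, weight) input layout are hypothetical, not part of TMVA.

```python
import math


def best_significance(sig, bkg):
    """Scan cuts on the classifier output; return (best significance, cut).

    sig and bkg are lists of (classifier_output, event_weight) pairs.
    An event passes a cut if its output is >= the cut value.
    Assumption: significance is S / sqrt(S + B) with weighted yields.
    """
    # Use every observed output value as a candidate cut.
    cuts = sorted({score for score, _ in sig + bkg})
    best = (0.0, None)
    for cut in cuts:
        s = sum(w for score, w in sig if score >= cut)
        b = sum(w for score, w in bkg if score >= cut)
        if s + b > 0:
            z = s / math.sqrt(s + b)
            if z > best[0]:
                best = (z, cut)
    return best


if __name__ == "__main__":
    # Toy numbers for illustration only, not taken from the analysis.
    signal = [(0.9, 1.0), (0.8, 1.0), (0.3, 1.0)]
    background = [(0.7, 1.0), (0.2, 1.0), (0.1, 1.0)]
    print(best_significance(signal, background))
```

Running the same scan on the outputs of both versions separates a real change in the classifier response from a change in how the significance itself is reported.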
To my knowledge there could be 2 changes affecting you (there could be more of which I am unaware).
The BDT with gradient boosting had a bug in how event weights were handled, skewing the output to disproportionately favor large event weights. This only manifested for event weights larger than 1, which is uncommon if the standard normalisation is used.
The MLP output evaluation function was changed to model a probability instead of arbitrary “units”.
If neither of these explanations fits your case, I’d love to hear details, e.g. which methods are affected? Can the effects be replicated using a simplified setup (TMVAClassification)?
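To judge whether the event-weight point could apply to a given sample, it helps to know how many events carry weights above 1 and how much of the total weight they hold. A minimal sketch, assuming the per-event weights have already been read into a plain Python list; the function name `weight_summary` and the returned keys are hypothetical:

```python
def weight_summary(weights):
    """Summarise how much of a sample carries event weights above 1."""
    n_events = len(weights)
    total = sum(weights)
    above_one = [w for w in weights if w > 1.0]
    return {
        "n_events": n_events,
        # Fraction of events with weight > 1.
        "fraction_above_one": len(above_one) / n_events,
        # Share of the summed weight carried by those events.
        "weight_share_above_one": sum(above_one) / total,
        "max_weight": max(weights),
    }


if __name__ == "__main__":
    # Toy weights for illustration only.
    print(weight_summary([0.5, 0.5, 2.0, 1.0]))
```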
Thanks a lot for the prompt response. These reasons may hold for the BDT: our input Monte Carlo simulated background (ttbar) did have a significant fraction of events with weights larger than 1. But we also saw a similar effect for the Fisher discriminant. A simple way to replicate the effects would be to run the two different versions with the same input signal and background and the same set of input variables. I should be able to share with you some truth Monte Carlo (not the ATLAS simulated Monte Carlo we actually used) if you would like to test it.
The smaller we can make the example that provokes the behaviour, the better. Would it be possible for you to check whether this can be replicated using e.g. the TMVAClassification tutorial?
The fact that you also see different behaviour with the Fisher discriminant is interesting.
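Since a Fisher discriminant is a deterministic function of the weighted class means and covariances, an independent calculation on a small sample can serve as a reference for both versions. The sketch below is restricted to two input variables and only computes the direction W^-1 (mean_S - mean_B), where W is the sum of the weighted signal and background covariance matrices. It does not reproduce TMVA's normalisation or offset, so the comparison should be made on the ratio of the coefficients; all function names are hypothetical.

```python
def weighted_mean(events, weights):
    """Weighted mean of 2-variable events given as (x, y) tuples."""
    total = sum(weights)
    return [sum(w * e[k] for e, w in zip(events, weights)) / total
            for k in range(2)]


def weighted_covariance(events, weights, mean):
    """2x2 weighted covariance matrix of the events around mean."""
    total = sum(weights)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for e, w in zip(events, weights):
        d = [e[0] - mean[0], e[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                cov[i][j] += w * d[i] * d[j] / total
    return cov


def fisher_direction(sig, sig_w, bkg, bkg_w):
    """Return W^-1 (mean_S - mean_B), defined up to an overall scale."""
    mean_s = weighted_mean(sig, sig_w)
    mean_b = weighted_mean(bkg, bkg_w)
    cov_s = weighted_covariance(sig, sig_w, mean_s)
    cov_b = weighted_covariance(bkg, bkg_w, mean_b)
    # Within-class matrix: sum of the two class covariances.
    w = [[cov_s[i][j] + cov_b[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    if det == 0:
        raise ValueError("within-class matrix is singular")
    inv = [[w[1][1] / det, -w[0][1] / det],
           [-w[1][0] / det, w[0][0] / det]]
    diff = [mean_s[0] - mean_b[0], mean_s[1] - mean_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


if __name__ == "__main__":
    # Toy events for illustration only.
    sig = [(1, 0), (3, 0), (2, 1), (2, -1)]
    bkg = [(-1, 0), (-3, 0), (-2, 1), (-2, -1)]
    print(fisher_direction(sig, [1.0] * 4, bkg, [1.0] * 4))
```

If both versions give the same coefficient ratio as such a reference on unit weights but diverge once the real event weights are applied, that would point at the weight handling rather than at the method itself.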