Background response does not go to zero

Dear experts,

Please may I ask for your kind help or explanation?
I’m using TMVA to select signal decays and suppress the combinatorial background. The TMVA response plot (see attached) shows a very long tail for the background distribution; moreover, the background distribution appears to have a peaking structure at “1”. Of course, the performance in general is still good, but the shape looks a bit surprising.

[Attached plot: mlp_output]

A simulated data sample is used as the signal input for training, and real experimental data (the right sideband of the mass distribution) as background. I checked the background data for possible contributions that could mimic the signal, but so far haven’t found any. I’m still investigating what could create such an effect from the “physics” point of view, but may I ask whether there are any technical features that should also be checked?

If it helps, I’m running ROOT v6.20.06 installed on lxplus with x86_64-centos7-gcc9-opt, and using the MLP method with the following parameters:
(ROOT.TMVA.Types.kMLP, 'MLP', 'H:!V:EstimatorType=CE:VarTransform=N:NCycles=200:HiddenLayers=N+3:TestRate=5:!UseRegulator')
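
For readers who want to reproduce the booking, here is a minimal PyROOT sketch. The option tokens are copied from the post above; the helper names `mlp_options` and `book_mlp`, and the `factory` and `dataloader` objects passed in, are assumptions for illustration, since the post does not show how they were set up.

```python
# Option tokens copied from the post; the list and helper names are hypothetical.
MLP_OPTION_TOKENS = [
    "H",
    "!V",
    "EstimatorType=CE",
    "VarTransform=N",
    "NCycles=200",
    "HiddenLayers=N+3",
    "TestRate=5",
    "!UseRegulator",
]


def mlp_options():
    """Join the tokens into the colon-separated option string used by TMVA."""
    return ":".join(MLP_OPTION_TOKENS)


def book_mlp(factory, dataloader):
    """Book the MLP on an existing TMVA.Factory / TMVA.DataLoader pair.

    Both arguments are assumed to have been created and configured elsewhere.
    """
    import ROOT  # imported here so the option string can be checked without ROOT

    return factory.BookMethod(
        dataloader, ROOT.TMVA.Types.kMLP, "MLP", mlp_options()
    )
```

Building the string from a list makes it easier to spot a mistyped option, and it keeps plain ASCII quotes in the call, which matters if the line is copied from a web page.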

Thank you!

If the task is hard, the MLP might not be able to suppress background even at very high response values. Independently, what signal fraction would you expect in your background sample? Note that this is a log scale, so the right-most bin has about 100x more signal than background!

Dear Axel,

thanks for the reply!
Yes, this is a log scale, and indeed the peaking background structure comprises ~1.5% of the whole background sample, but I’m still confused about why it is peaking. My current estimate of the fraction of possible contributions (from signal or signal-like background) is ~10^-5.
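
As a quick arithmetic check of the two numbers quoted above, the sketch below compares the observed peak fraction with the estimated contamination. The function name `excess_over_expectation` is made up for this illustration; the inputs are the approximate values from the post.

```python
def excess_over_expectation(observed_fraction, expected_fraction):
    """Return how many times larger the observed fraction is than the expected one."""
    if expected_fraction <= 0:
        raise ValueError("expected_fraction must be positive")
    return observed_fraction / expected_fraction


# Approximate values quoted in the post.
observed_peak_fraction = 0.015  # ~1.5% of the background sample sits in the peak
expected_contamination = 1e-5   # estimated signal or signal-like fraction

ratio = excess_over_expectation(observed_peak_fraction, expected_contamination)
print(f"Peak is about {ratio:.0f} times larger than the expected contamination")
```

With these inputs the peak is roughly 1500 times larger than the estimated contamination, so under that estimate the peak cannot be explained by genuine signal leaking into the sideband alone.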

Please, may I ask what you mean by “hard task”? Maybe I can make it simpler. Also, asking colleagues around, I learnt that some see a similar picture with the BDTG method as well, so it is probably not a method-specific thing.

Hi,

It’s fairly common to have both signal and background peak at the “signal side” of the response, for cases where background mimics signal very well. A possible solution might be to find better discriminating parameters, for instance by inspecting the background events that seem to mimic signal so well.
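
One way to start inspecting the signal-like background is sketched below: split the background events at a response cut and compare the mean of each input variable in the two groups. This is only an illustration in plain Python; the variable names (`pt`, `ip_chi2`), the key `mlp`, the cut value of 0.9 and the toy events are all made up and are not taken from the analysis discussed here.

```python
from statistics import mean


def compare_signal_like_background(events, response_key="mlp", cut=0.9):
    """Compare input-variable means for background events above and below a response cut.

    Returns {variable: (mean above cut, mean at or below cut)}.
    """
    high = [e for e in events if e[response_key] > cut]
    low = [e for e in events if e[response_key] <= cut]
    if not high or not low:
        raise ValueError("the cut leaves one of the two groups empty")
    variables = [name for name in events[0] if name != response_key]
    return {
        name: (mean(e[name] for e in high), mean(e[name] for e in low))
        for name in variables
    }


# Toy background events with invented values, for illustration only.
toy_background = [
    {"mlp": 0.05, "pt": 1.0, "ip_chi2": 2.0},
    {"mlp": 0.10, "pt": 2.0, "ip_chi2": 4.0},
    {"mlp": 0.95, "pt": 6.0, "ip_chi2": 20.0},
    {"mlp": 0.99, "pt": 8.0, "ip_chi2": 30.0},
]

summary = compare_signal_like_background(toy_background)
for name, (high_mean, low_mean) in summary.items():
    print(f"{name}: above cut {high_mean}, at or below cut {low_mean}")
```

Variables whose means differ strongly between the two groups are a natural place to look for what makes these background events resemble signal. In a real analysis one would compare full distributions rather than means.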

Cheers, Axel.

Hello,

ok, many thanks for the explanation! Then I’ll try to investigate the “physics” part of the problem further.

Wish you a Merry Christmas and Happy New Year
Cheers