Problem with TFractionFitter

Hello all,

I am using the TFractionFitter class of ROOT. I have a histogram from data, from which I am
trying to estimate the fraction of real photons and the fraction of fake ones.
The two templates I have are a signal (real) photon template and a fake photon template.
The background (fake) template has ~289 entries, the signal template has ~4*10^6 entries and the data histogram
has ~3300 entries.
I am not able to fit it using TFractionFitter; it gives the following fit error:

 MINUIT WARNING IN MIGRAD
 ============== MATRIX FORCED POS-DEF BY ADDING 0.005887 TO DIAGONAL.
 MIGRAD FAILS TO FIND IMPROVEMENT
 MACHINE ACCURACY LIMITS FURTHER IMPROVEMENT.
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 MIGRAD TERMINATED WITHOUT CONVERGENCE.
 FCN=-5.68298e+07 FROM MIGRAD    STATUS=FAILED        118 CALLS         266 TOTAL
                     EDM=0.00203654    STRATEGY= 1      ERR MATRIX APPROXIMATE
  EXT PARAMETER                APPROXIMATE        STEP         FIRST
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
   1  frac0        9.60852e-01   3.76508e-02   5.00000e-01  -2.30640e-01
   2  frac1        3.91587e-02   2.73670e-02   5.48456e-02   1.45610e-02
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  1.436e-03 -1.726e-04
 -1.726e-04  7.490e-04
 ERR MATRIX APPROXIMATE
 PARAMETER  CORRELATION COEFFICIENTS
       NO.  GLOBAL      1      2
        1  0.16645   1.000 -0.166
        2  0.16645  -0.166  1.000
 ERR MATRIX APPROXIMATE
Error in <TFractionFitter::ComputeChisquareLambda>: Fit not yet (successfully) performed
fit status: 4

How would you suggest dealing with such situations?

Please let me know if you need more information from my side.

Thanks,
Best Regards,
Shilpi

Hello,

Some more information:
I tried using the Smooth() function of TH1. I checked that if I smooth the histogram more than 5 times, the fit converges.
If I smooth 6 or 7 times, I get a negative fraction for real photons;
beyond that, it is always positive.
How do I check that the smoothing is not biasing the fractions in any way?

Thanks,
Shilpi

Hi,
Can you please post your histograms and the code reproducing your problem, so I can have a look at it?

Lorenzo

Hi Lorenzo,

Thanks for your reply. I have put the whole setup in forDebug.tar;
please find it attached.
Everything is set up to produce the result: running ./plot.exe will produce it.

The main script is plot_EB.C. It takes two input ROOT files:
(1) file_dR0.300000.root - contains the data and background sigma-ieta-ieta templates for various
pt bins and HLTs
(2) nopixelphojet.root - contains the signal histograms

It reads xvariables.list and can be compiled by running the ‘compile’ script.

Please let me know if you could reproduce the results or if more information is required from my side.

Many Thanks,
Shilpi
forDebug.tar (260 KB)

Hello all,

I also see that the fraction is sometimes reported as negative.
My understanding is that, since this is a minimisation of the negative log-likelihood,
it can return whatever value minimises it, even a negative one.

Do you think this is fine?
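The point about the likelihood can be seen in a minimal model. The standalone C++ sketch below is not ROOT code, and its names and numbers are hypothetical. It fits one fraction f to a two-bin data histogram with a Poisson likelihood, where the prediction is N*(f*s_i + (1-f)*b_i). If the data are less signal-like than any physical mixture of the templates, the unconstrained minimum lies at f < 0. Restricting f to [0, 1] moves the result to the boundary instead; TFractionFitter offers Constrain() for this.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Hist = std::vector<double>;

// Poisson negative log-likelihood (constant terms dropped) of data d for the
// prediction mu_i = N * (f * s_i + (1 - f) * b_i), where N is the data total
// and s, b are normalised template shapes. Unphysical predictions (mu_i <= 0)
// get +infinity.
double nll(double f, const Hist& d, const Hist& s, const Hist& b) {
    assert(d.size() == s.size() && d.size() == b.size());
    double total = 0.0, out = 0.0;
    for (double v : d) total += v;
    for (std::size_t i = 0; i < d.size(); ++i) {
        const double mu = total * (f * s[i] + (1.0 - f) * b[i]);
        if (mu <= 0.0) return std::numeric_limits<double>::infinity();
        out += mu - d[i] * std::log(mu);
    }
    return out;
}

// Golden-section search for the f in [lo, hi] that minimises the NLL.
// Provided all predictions stay positive on [lo, hi], the NLL is convex in f
// there, so the search converges to the minimum within that range.
double fitFraction(const Hist& d, const Hist& s, const Hist& b, double lo, double hi) {
    const double g = (std::sqrt(5.0) - 1.0) / 2.0;
    double x1 = hi - g * (hi - lo), x2 = lo + g * (hi - lo);
    double f1 = nll(x1, d, s, b), f2 = nll(x2, d, s, b);
    for (int it = 0; it < 100; ++it) {
        if (f1 < f2) { hi = x2; x2 = x1; f2 = f1; x1 = hi - g * (hi - lo); f1 = nll(x1, d, s, b); }
        else         { lo = x1; x1 = x2; f1 = f2; x2 = lo + g * (hi - lo); f2 = nll(x2, d, s, b); }
    }
    return 0.5 * (lo + hi);
}
```

Consider data d = {10, 90} and shapes s = {0.8, 0.2}, b = {0.2, 0.8}. Only 10% of the data are in the first bin, while any physical mixture puts at least 20% there. As a result, fitFraction(d, s, b, -0.3, 1.3) returns about -1/6, whereas fitFraction(d, s, b, 0.0, 1.0) ends at the lower boundary, 0.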

Thanks,
Shilpi

Hi,

I think your problem is due to the low statistics in some bins of your template histograms: their large bin uncertainties make the fit unstable. If you run your macro a second time, you see that the fit converges. The fact that you sometimes get a negative fraction is another indication that the result is unstable. I would suggest trying with more statistics or using a larger bin size.
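To see why low statistics hurt, recall that a bin with n entries has a relative Poisson uncertainty of 1/sqrt(n). As a hypothetical example, suppose the ~289-entry background template were spread evenly over 100 bins. Each bin would then hold ~2.9 entries, which is about a 59% uncertainty. Merging groups of 5 bins gives ~14.5 entries per bin and about 26%. The C++ sketch below merges adjacent bins and computes that relative uncertainty. It uses hypothetical standalone helpers, not ROOT's TH1::Rebin.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Sum groups of `ngroup` adjacent bins. Any leftover bins at the end form a
// final, smaller group (a simplification for this sketch).
std::vector<double> rebin(const std::vector<double>& h, std::size_t ngroup) {
    assert(ngroup > 0);
    std::vector<double> out;
    for (std::size_t i = 0; i < h.size(); i += ngroup) {
        double sum = 0.0;
        for (std::size_t j = i; j < i + ngroup && j < h.size(); ++j) sum += h[j];
        out.push_back(sum);
    }
    return out;
}

// Relative Poisson uncertainty sqrt(n)/n = 1/sqrt(n) of a bin with n entries;
// an empty bin has no meaningful relative uncertainty (returned as infinity).
double relError(double n) {
    return n > 0.0 ? 1.0 / std::sqrt(n) : std::numeric_limits<double>::infinity();
}
```

For instance, relError(289.0 / 100.0) is about 0.59, while relError(289.0 / 20.0) is about 0.26. Fewer, wider bins trade shape resolution for much smaller per-bin fluctuations in the templates.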

Regards

Lorenzo

Thanks for your reply, Lorenzo. I will try that and see if the problem goes away.

Best Regards,
Shilpi

Hi again, Lorenzo,

I tried what you suggested, and the negative fraction is now gone.
I now see a plot like this (please find it attached).
It shows the shower-shape variable for the two templates (signal in green, background in blue) stacked on top of each other.
The fit is shown in red and the data as black points.
In some bins, (signal + background) is not equal to the fit. Do you think this is fine? I understand that
the minimisation is over the sum of all the bins, so in some bins the fit may exceed (signal + background)
and in others it may be less.

Do you think I am missing something?

Thanks,
Shilpi


Dear Experts,

If I rebin the histogram, I get 0.94687 as the background fraction. If I use TH1F::Smooth() without rebinning, I get 0.998411 as the background fraction. Which fraction should I rely upon?

Thanks for your help,
Shilpi

Hi,
The fit minimizes the overall difference between data and MC, so in some bins the prediction can be higher than the data.
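A small model illustrates this. The standalone C++ sketch below is not ROOT code, and the three-bin data and shapes are made up. It fits a single fraction f with a Poisson likelihood and returns the per-bin residuals d_i - mu_i at the best fit. With more bins than free parameters, one fraction cannot match every bin. Because the prediction here is normalised to the data total, the residuals sum to zero, so some bins are necessarily above the data and some below.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Hist = std::vector<double>;

// Predicted contents mu_i = N * (f * s_i + (1 - f) * b_i), with N the data total
// and s, b normalised template shapes.
Hist predict(double f, const Hist& d, const Hist& s, const Hist& b) {
    double total = 0.0;
    for (double v : d) total += v;
    Hist mu(d.size());
    for (std::size_t i = 0; i < d.size(); ++i) mu[i] = total * (f * s[i] + (1.0 - f) * b[i]);
    return mu;
}

// Poisson negative log-likelihood (constant terms dropped) of d for that prediction.
double nll(double f, const Hist& d, const Hist& s, const Hist& b) {
    const Hist mu = predict(f, d, s, b);
    double out = 0.0;
    for (std::size_t i = 0; i < d.size(); ++i) out += mu[i] - d[i] * std::log(mu[i]);
    return out;
}

// Golden-section search for the best fraction f in [0, 1].
double fitFraction(const Hist& d, const Hist& s, const Hist& b) {
    const double g = (std::sqrt(5.0) - 1.0) / 2.0;
    double lo = 0.0, hi = 1.0;
    double x1 = hi - g * (hi - lo), x2 = lo + g * (hi - lo);
    double f1 = nll(x1, d, s, b), f2 = nll(x2, d, s, b);
    for (int it = 0; it < 100; ++it) {
        if (f1 < f2) { hi = x2; x2 = x1; f2 = f1; x1 = hi - g * (hi - lo); f1 = nll(x1, d, s, b); }
        else         { lo = x1; x1 = x2; f1 = f2; x2 = lo + g * (hi - lo); f2 = nll(x2, d, s, b); }
    }
    return 0.5 * (lo + hi);
}

// Per-bin residuals d_i - mu_i at the best-fit fraction.
Hist residuals(const Hist& d, const Hist& s, const Hist& b) {
    assert(d.size() == s.size() && d.size() == b.size());
    const Hist mu = predict(fitFraction(d, s, b), d, s, b);
    Hist r(d.size());
    for (std::size_t i = 0; i < d.size(); ++i) r[i] = d[i] - mu[i];
    return r;
}
```

Take d = {30, 50, 20}, s = {0.5, 0.4, 0.1}, b = {0.1, 0.3, 0.6}. The best fit is f ≈ 0.74, and the residuals are roughly -9.7, +12.6 and -2.9: the prediction overshoots two bins and undershoots one.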
Concerning your second question, I think you should not smooth the histograms beforehand. The fit assumes a Poisson distribution for each bin; by smoothing the histograms you introduce correlations between the bins, and then you cannot really use TFractionFitter anymore.
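For a linear smoother, the effect of smoothing on the bin-to-bin correlations can be computed directly. The C++ sketch below uses a hypothetical 3-point (1/4, 1/2, 1/4) smoother as a stand-in for TH1::Smooth, which uses the non-linear 353QH algorithm. For independent Poisson bins with means lambda_k, the smoothed contents y = W x have covariance W diag(lambda) W^T.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Weight matrix W of a 3-point (1/4, 1/2, 1/4) smoother on n bins; the first
// and last bins are left unchanged. Smoothed contents are y = W x.
Matrix smootherWeights(std::size_t n) {
    assert(n >= 3);
    Matrix w(n, std::vector<double>(n, 0.0));
    w[0][0] = w[n - 1][n - 1] = 1.0;
    for (std::size_t i = 1; i + 1 < n; ++i) {
        w[i][i - 1] = 0.25; w[i][i] = 0.5; w[i][i + 1] = 0.25;
    }
    return w;
}

// Covariance of the smoothed bins for independent Poisson inputs with means
// lambda: Cov(y) = W diag(lambda) W^T.
Matrix smoothedCovariance(const std::vector<double>& lambda) {
    const std::size_t n = lambda.size();
    const Matrix w = smootherWeights(n);
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k) c[i][j] += w[i][k] * lambda[k] * w[j][k];
    return c;
}

// Correlation coefficient between smoothed bins i and j.
double correlation(const Matrix& c, std::size_t i, std::size_t j) {
    return c[i][j] / std::sqrt(c[i][i] * c[j][j]);
}
```

For interior bins with equal means, the results are:

- Neighbouring smoothed bins have a correlation of 2/3.
- Bins two apart have a correlation of 1/6.
- Each smoothed bin has a variance of only 3/8 of its mean.

The smoothed contents are therefore no longer independent Poisson variables, which is what the TFractionFitter likelihood assumes.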

Best Regards
Lorenzo

Hi Lorenzo,

Thanks for explaining that to me. So I should rebin by increasing the
bin width; I will do that instead of smoothing.

Thanks for your help,

Best Regards,
Shilpi