I am using the TFractionFitter class of ROOT. I have a data histogram from which I am
trying to estimate the fraction of real photons and the fraction of fake ones.
The two templates I have are a signal (real) photon template and a background (fake) photon template.
The background template has ~289 entries, the signal template has ~4*10^6 entries, and the data histogram
has ~3300 entries.
I am not able to fit it with TFractionFitter; the fit fails with the following output:
MINUIT WARNING IN MIGRAD
============== MATRIX FORCED POS-DEF BY ADDING 0.005887 TO DIAGONAL.
MIGRAD FAILS TO FIND IMPROVEMENT
MACHINE ACCURACY LIMITS FURTHER IMPROVEMENT.
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
MIGRAD TERMINATED WITHOUT CONVERGENCE.
FCN=-5.68298e+07 FROM MIGRAD STATUS=FAILED 118 CALLS 266 TOTAL
EDM=0.00203654 STRATEGY= 1 ERR MATRIX APPROXIMATE
EXT PARAMETER APPROXIMATE STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 frac0 9.60852e-01 3.76508e-02 5.00000e-01 -2.30640e-01
2 frac1 3.91587e-02 2.73670e-02 5.48456e-02 1.45610e-02
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5
ERR MATRIX APPROXIMATE
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.16645 1.000 -0.166
2 0.16645 -0.166 1.000
ERR MATRIX APPROXIMATE
Error in <TFractionFitter::ComputeChisquareLambda>: Fit not yet (successfully) performed
fit status: 4
How would you suggest dealing with such situations?
Please let me know if more information is required from my side.
Some more information:
I tried using the Smooth() function of TH1. I found that the fit only works if I smooth the histogram more than 5 times.
If I smooth 6 or 7 times, I get a negative fraction for real photons.
Beyond that, the fraction is always positive.
How do I check that the smoothing is not biasing the fractions in any way?
Can you please post your histograms and the code reproducing your problem, so that I can have a look at it?
Thanks for your reply. I have put the whole setup in forDebug.tar.
Please find it attached.
It is all set up to reproduce the result: running ./plot.exe will produce it.
The main script is plot_EB.C. It takes two input ROOT files:
(1) file_dR0.300000.root - contains the data and background sigma-ieta-ieta templates for various
pt bins and HLTs
(2) nopixelphojet.root - contains signal histograms
It reads xvariables.list. It can be compiled by running ‘compile’ script.
Please let me know if you could reproduce the results or if more information is required from my side.
forDebug.tar (260 KB)
I also see that the fraction is sometimes reported as negative.
My understanding is that, since this is a minimisation of the log likelihood,
it can return even negative values (whatever minimises the LL).
Do you think this is fine?
I think your problem is due to the low statistics in the bins of some of your template histograms. Their large bin uncertainties make the fit unstable. If you run your macro a second time, you will see that the fit converges. The fact that you sometimes get a negative fraction is another indication that the result is unstable. I would suggest trying with more statistics or using a larger bin size.
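To illustrate why a larger bin size helps, here is a minimal standalone sketch (plain Python, not ROOT code). The bin contents are hypothetical and were chosen only so that they sum to the ~289 background entries quoted above. The helper `rebin` mimics merging adjacent bins, which in ROOT would normally be done with TH1::Rebin. This helper simply drops any incomplete group at the end.

```python
import math

def rebin(counts, ngroup):
    """Merge every `ngroup` adjacent bins; an incomplete last group is dropped."""
    nfull = len(counts) // ngroup
    return [sum(counts[i * ngroup:(i + 1) * ngroup]) for i in range(nfull)]

def relative_errors(counts):
    """Relative Poisson uncertainty sqrt(N)/N = 1/sqrt(N) per bin (inf if empty)."""
    return [1.0 / math.sqrt(n) if n > 0 else math.inf for n in counts]

# Hypothetical sparse background template (12 bins, 289 entries in total).
template = [1, 4, 9, 16, 25, 36, 49, 64, 36, 25, 16, 8]

# Merging pairs of bins keeps the total but raises the content of each bin.
rebinned = rebin(template, 2)

for label, hist in (("original", template), ("rebinned", rebinned)):
    worst = max(relative_errors(hist))
    print(f"{label}: {len(hist)} bins, worst relative error = {worst:.2f}")
```

The total number of entries is unchanged, but the worst-case relative uncertainty per bin shrinks, which is the effect the suggestion above relies on. The price is a coarser description of the template shape.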
Thanks Lorenzo for your reply. I will try the same and see if the problem goes away.
Hi again, Lorenzo,
I tried what you suggested, and the negative fraction is gone now.
I now see a plot like this (please find attached)
This is a plot of shower shape variable of two templates (signal->Green and bkg->Blue) stacked over each other.
Fit is shown in red. Data in black points.
In some bins, I see that (signal+bkg) is not equal to the fit. Do you think this is fine? I understand that
this is a minimization over the sum of all the bins, so in some bins the fit may exceed the sum (sig+bkg)
and in other bins it may be below it.
Do you think I am missing something?
If I rebin the histogram, I get 0.94687 as the background fraction. If I use Smooth() of TH1F and do not rebin, I get 0.998411 as the background fraction. Which fraction should I rely upon?
Thanks for your help,
The fit minimizes the overall difference between data and MC, so in some bins the prediction can be higher than the data and in others lower.
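A minimal standalone sketch of this behaviour (plain Python, not ROOT code) is given here. It scans a single signal fraction in a simplified binned Poisson likelihood. As an assumption made for brevity, the templates are treated as exact shapes, whereas TFractionFitter also accounts for the statistical uncertainty of the templates. All numbers are hypothetical.

```python
import math

def neg_log_likelihood(frac_sig, data, sig_shape, bkg_shape):
    """Binned Poisson -log L (constant terms dropped) for a two-template mixture."""
    n_total = sum(data)
    nll = 0.0
    for d, s, b in zip(data, sig_shape, bkg_shape):
        mu = n_total * (frac_sig * s + (1.0 - frac_sig) * b)  # expected count
        nll += mu - d * math.log(mu)
    return nll

def scan_fraction(data, sig_shape, bkg_shape, steps=1000):
    """Return the grid point in (0, 1) with the smallest -log L."""
    candidates = (k / steps for k in range(1, steps))
    return min(candidates,
               key=lambda f: neg_log_likelihood(f, data, sig_shape, bkg_shape))

# Hypothetical unit-normalised template shapes and data counts.
sig_shape = [0.50, 0.30, 0.15, 0.05]
bkg_shape = [0.10, 0.20, 0.30, 0.40]
data = [14, 22, 28, 36]

best = scan_fraction(data, sig_shape, bkg_shape)
prediction = [sum(data) * (best * s + (1.0 - best) * b)
              for s, b in zip(sig_shape, bkg_shape)]
residuals = [d - mu for d, mu in zip(data, prediction)]
print("best signal fraction:", best)
print("data - prediction per bin:", [round(r, 2) for r in residuals])
```

Only the sum over all bins is minimized, so the per-bin residuals at the best-fit fraction come out with mixed signs rather than vanishing bin by bin.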
Concerning your second question, I think you should not smooth the histograms before fitting. The fit assumes an independent Poisson distribution for each bin. If you smooth the histograms, you introduce correlations between the bins, and then you can no longer really use TFractionFitter.
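The correlation argument can be checked with a small toy study. The sketch below is plain Python, not ROOT code. It uses a simple 1-2-1 running average as a stand-in for smoothing, since TH1::Smooth uses a different algorithm, and the bin mean, number of toys and seed are arbitrary assumptions.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def smooth_121(counts):
    """1-2-1 running average on interior bins; edge bins are left unchanged."""
    out = list(counts)
    for i in range(1, len(counts) - 1):
        out[i] = (counts[i - 1] + 2 * counts[i] + counts[i + 1]) / 4.0
    return out

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def adjacent_bin_correlation(n_toys=2000, n_bins=6, mean=20.0, seed=1):
    """Correlation between two neighbouring bins, before and after smoothing."""
    rng = random.Random(seed)
    raw_a, raw_b, sm_a, sm_b = [], [], [], []
    for _ in range(n_toys):
        counts = [poisson(rng, mean) for _ in range(n_bins)]
        smoothed = smooth_121(counts)
        raw_a.append(counts[2]); raw_b.append(counts[3])
        sm_a.append(smoothed[2]); sm_b.append(smoothed[3])
    return pearson(raw_a, raw_b), pearson(sm_a, sm_b)

raw_corr, smoothed_corr = adjacent_bin_correlation()
print(f"raw: {raw_corr:+.3f}  smoothed: {smoothed_corr:+.3f}")
```

For independent bins of equal variance, this particular kernel gives an expected correlation of 4/6 between neighbouring smoothed bins, while the raw bins stay uncorrelated within statistical fluctuations. The same toy approach (generate pseudo-data with a known fraction, fit, compare) is one way to check whether a given procedure biases the fitted fractions.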
Thanks for explaining that to me. So now I should try to rebin by increasing the
bin width. I will do that now instead of smoothing.
Thanks for your help,