I am doing a study that fits two MC histograms (signal and background) to a toy histogram using TFractionFitter. By repeating the fit on different toys generated from the same original distribution, I can obtain the pull distribution.
The way I compute the pull is described in the following steps:
Use TFractionFitter to fit the signal and background templates to the toy.
Obtain the fitted parameters and their errors.
As the fitted parameters are fractions, I converted them to scale factors, where "component" can be either signal or background.
The errors of the scales are computed using propagation of uncertainties.
I finally compute the pull by using:
pull = (scale_component - 1) / error_component
The pulls are expected to follow a normal distribution. In my result, the pulls do form a Gaussian with mean 0, but its standard deviation (~0.6) is smaller than the expected value of 1.0.
As I’m not good at statistics, I can only tell that I might have overestimated the error, but I don’t know where things went wrong. Please tell me how to correct it.
I just want to add that the component templates I used are weighted, meaning that when I did TH1::Fill, the weights were different from 1. Could that be the reason the fit gave me the unexpected errors?
Hi,
It is possible that with weighted fits the errors are not fully correct, due to the weights.
In addition, it is known that the TFractionFitter method under-estimates the errors, because it does not take into account the fluctuations of the normalisation. In your case, however, it seems to me that you are estimating too large an error. When you generate the toys, do you keep the total number of events fixed, or do you let it vary?
Also, if you are fitting for the number of signal and background events, it may be better to perform a direct fit of the two components using the TF1NormSum class (see ROOT: tutorials/fit/fitNormSum.C File Reference) or using RooFit.
@Axel Out of curiosity, I checked the TFractionFitter documentation, and there is not a single place that warns the user about this.
This is not the first time that some “insider knowledge” about ROOT giving incorrect results is “hidden” from the “wide public”. Could you please make sure that all “known problems” are explicitly mentioned in the relevant places in the documentation.
Well, I think here we simply need to ask @moneta to share his “inside knowledge” in the documentation of TFractionFitter, possibly pointing out superior alternatives. @moneta could you update the doc, please?
When you generate the toys, do you keep the total number of events fixed, or do you let it vary?
The number of events is fixed
Also, if you are fitting for the number of signal and background events, it may be better to perform a direct fit of the two components using the TF1NormSum class (see ROOT: tutorials/fit/fitNormSum.C File Reference) or using RooFit.
I will try TF1NormSum, as RooFit seems to give me worse fits (judging by eye) than TFractionFitter.
In addition, the result in my first post came from the official data and MC samples (thus they have weights). I then tried the template fit with non-weighted toy, signal, and background templates, and the pulls from the fit resembled the normal distribution quite well (with std ~0.9).
Thank you again, I will try your suggestion and update the result,
Sorry for the late reply. Unfortunately I cannot share my inputs as they stay on the local machine, and the files are too large to be uploaded to lxplus (my code is also messy).
It seems that I have over-estimated the error because my toy template is a sum of 2 histograms (I generated the signal part and the background part of the toy separately and then added them together).
Using pseudo code, the one that doesn’t work (in either TFractionFitter or RooFit) is:
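The pseudo code itself did not survive in the thread; based on the description and the replies below, a ROOT-style reconstruction of the two cases being compared might look like the following (histogram names such as hSignalTemplate are hypothetical):

```cpp
// Case 1 (the one that does NOT work): generate the signal part and the
// background part of the toy separately, each with a FIXED number of
// events n1 and n2, then add them together.
TH1F *toySig = new TH1F("toySig", "toy signal",     nbins, xlo, xhi);
TH1F *toyBkg = new TH1F("toyBkg", "toy background", nbins, xlo, xhi);
toySig->FillRandom(hSignalTemplate,     n1);  // exactly n1 signal events
toyBkg->FillRandom(hBackgroundTemplate, n2);  // exactly n2 background events
TH1F *toy1 = (TH1F*)toySig->Clone("toy1");
toy1->Add(toyBkg);

// Case 2 (the correct one): add the templates first, then generate the
// toy from the summed shape with n1+n2 events in total, so the
// signal/background split fluctuates binomially from toy to toy.
// (This assumes the template integrals are normalised in the ratio n1:n2.)
TH1F *hSum = (TH1F*)hSignalTemplate->Clone("hSum");
hSum->Add(hBackgroundTemplate);
TH1F *toy2 = new TH1F("toy2", "toy", nbins, xlo, xhi);
toy2->FillRandom(hSum, n1 + n2);
```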
I don’t think one needs to use SumW2(true) here, because the histograms are not weighted, and adding histograms is fine. @LongHoa, if I have understood you correctly, you are saying that you get different pulls from TFractionFitter depending on whether you use the first or the second case to generate the input data histogram for each pseudo-experiment.
I think the first case is not correct: you are fixing the number of background and signal events, whereas they should fluctuate according to a binomial distribution.
The second case is instead correct and should be used.
Yes, that’s what I meant. The second case gives me the expected sigma of the pull distribution.
Please correct me if I’m wrong: the sum of 2 randomized histograms (1st case) doesn’t have the correct uncertainty, which results in a miscalculated error, and thus should not be used.
If this is the case, will:
The “Sumw2” call must appear before “Add” (as pointed out by @moneta, it is only required if any histogram is filled with “weights” not equal to 1).
The sum of 2 histograms has the correct uncertainty computed, provided you call Sumw2 if your histograms are weighted. If they are not, you don’t need to.
The problem in your case is not the uncertainty of the histogram; it is in the procedure.
If you randomise h1 and h2 with n1 and n2 as the numbers of events, you get a different result than randomising the sum h1+h2 with (n1+n2) events. The result will be the same only if n1=n2 and the histograms have the same integrals.
Thank you. I understand that by generating the signal part and the background part of the toy separately, the number of events in each part is fixed, but I still don’t understand how that would affect the fit uncertainty.
By the way, as my problem is solved now, I mark the topic as “Solved”. Thank you all again for the help.
Generating the signal and background parts separately will not affect the fit uncertainty, but it will affect the fluctuations of your obtained fit values (they will be smaller), and this will change your pull results.