I am doing a study that fits two MC histograms (signal and background) to a toy histogram using TFractionFitter. By repeating the fit on different toys generated from the same original distribution, I can obtain the pull distribution.
The way I compute the pull is described in the following steps:
1. Use TFractionFitter to fit the signal and background to the toy.
2. Obtain the fitted parameters and their errors.
3. As the fitted parameters are fractions, convert them to scales by doing:
It is possible that, because you are doing a weighted fit, the errors are not fully correct.
In addition, it is known that the TFractionFitter method underestimates the errors, because it does not take into account the fluctuations of the normalisation. In your case, however, it seems to me that you are estimating too large an error. When you generate the toys, do you keep the total number of events fixed, or do you let it vary?
@Axel Out of curiosity, I checked the TFractionFitter description, and there is not a single place that warns the user about this.
This is not the first time that some “insider knowledge” about ROOT giving incorrect results has been “hidden” from the “wider public”. Could you please make sure that all “known problems” are explicitly mentioned in the relevant places in the documentation?
Well, I think here we simply need to ask @moneta to share his “inside knowledge” in the documentation of TFractionFitter, possibly pointing out superior alternatives. @moneta, could you please update the documentation?
I will try TF1Norm, as RooFit seems to give me worse fits (judged by eye) than TFractionFitter.
In addition, the results in my first post came from the official data and MC samples (which are weighted). I then tried the template fit with unweighted toy signal and background templates, and the pulls from the fit resembled a normal distribution quite well (with std ~0.9).
Thank you again; I will try your suggestion and update the results.
I don’t think one needs to call Sumw2(true), because the histograms are not weighted, and adding the histograms is fine. @LongHoa, if I have understood you correctly, you are saying that you get different pulls from TFractionFitter depending on whether you use the first or the second case to generate the input data histogram for each pseudo-experiment.
I think the first case is not correct: you are fixing the numbers of signal and background events, whereas they should fluctuate according to a binomial distribution.
The second case is instead correct and should be used.
Yes, that’s what I mean. The second case gives me the expected sigma of the pull distribution.
Please correct me if I’m wrong: the sum of two randomised histograms (first case) does not have the correct uncertainty, which results in the error being miscalculated, and thus it should not be used.
If this is the case, will:
The sum of two histograms has the correct uncertainty, provided you call Sumw2 when your histograms are weighted. If they are not weighted, you don’t need to.
The problem in your case is not the uncertainty of the histogram; it is the procedure.
If you randomise h1 and h2 separately with n1 and n2 events, you get a different result than randomising the sum h1+h2 with n1+n2 events. Even the expected composition of the toy is the same only if n1/n2 equals the ratio of the integrals of h1 and h2 (for example, n1=n2 and equal integrals).
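To make the difference concrete, a short derivation under stated assumptions: h1 and h2 are normalised to unit area with bin probabilities $p_b$ and $q_b$, the first procedure draws $n_1$ and $n_2$ events from them, and the second draws $N = n_1 + n_2$ events from the mixture $r_b$ (i.e. h1+h2 scaled to $n_1$ and $n_2$). For the toy content $d_b$ in bin $b$ and the number of signal events $N_s$:

```latex
\begin{aligned}
\text{separately:}\quad & \operatorname{Var}(d_b) = n_1 p_b (1 - p_b) + n_2 q_b (1 - q_b), \\
\text{from the sum:}\quad & \operatorname{Var}(d_b) = N r_b (1 - r_b), \qquad r_b = \frac{n_1 p_b + n_2 q_b}{N}, \\
\text{difference:}\quad & N r_b (1 - r_b) - \bigl[ n_1 p_b (1 - p_b) + n_2 q_b (1 - q_b) \bigr] = \frac{n_1 n_2}{N} \, (p_b - q_b)^2 \ \ge 0, \\
\text{signal count:}\quad & \operatorname{Var}(N_s) = 0 \ \text{(separately)}, \qquad \operatorname{Var}(N_s) = \frac{n_1 n_2}{N} \ \text{(from the sum)}.
\end{aligned}
```

So generating the components separately gives toys that fluctuate less than the summed-shape toys precisely in the bins where the signal and background shapes differ, which are the bins that drive the fitted fractions.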
Thank you. I understand that generating the signal and background parts of the toy separately fixes the number of events in each part, but I still don’t understand how this would affect the fit uncertainty.
By the way, as my problem is now solved, I have marked the topic as “Solved”. Thank you all again for the help.
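Generating the signal and background parts separately will not affect the fit uncertainty, but it will affect the fluctuations of the fitted values (they will be smaller), and this affects your pull results: since the pull divides the deviation from the true value by the reported error, the same error combined with smaller fluctuations gives a pull distribution narrower than 1.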