TFractionFitter Template Distortion

psaouter · January 4, 2014, 3:08pm

Dear Experts,

I am working with ROOT 5.34 on Mac OS X Mavericks.

I am fitting a data distribution with TFractionFitter, which lets me use template histograms as the fit components. In the example discussed here, I am using three templates to fit the data distribution in a given range. The signal contributions are quite obvious, which should make the fit simple enough. For the moment I am not interested in a high-quality fit, as I know my templates are biased. I have attached all the related material.

When I plot the result of the fit, the sum of the three histogram pdfs shows a reasonable fit (red curve). However, when I plot the individual template contributions (appropriately scaled by the results of the fit, at least I think…), I see a very unpleasant distortion of the first template (in black). It looks as if some artificial content has been added to the template to obtain a better fit. The issue doesn’t seem to come from the scaling, and I have also observed that the distortion is absent for a slightly different data distribution. The level of distortion is certainly affected by the fitting range used for the integration, but I don’t understand why there should be any distortion of the template in the first place. Shouldn’t the fit only scale the input templates? The problem does not appear when using RooFit, but the coding with TFractionFitter is somewhat easier for this simple exercise.
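For reference, the scaling I have in mind can be sketched as follows. This is only a minimal illustration in plain C++ without ROOT: the function name `scaleTemplate` and the use of `std::vector<double>` for the bin contents are hypothetical, and it assumes that the fitted fraction is defined relative to the data integral over the fit range.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: scale a raw template so that its integral over the
// fit range equals fraction * (data integral over the same range).
// The range [first, last] is inclusive and given as vector indices.
// Assumption: the fitted fraction is relative to the data in the fit range.
std::vector<double> scaleTemplate(const std::vector<double>& templ,
                                  const std::vector<double>& data,
                                  double fraction,
                                  std::size_t first, std::size_t last) {
    assert(templ.size() == data.size() && first <= last && last < data.size());
    const double nData =
        std::accumulate(data.begin() + first, data.begin() + last + 1, 0.0);
    const double nTempl =
        std::accumulate(templ.begin() + first, templ.begin() + last + 1, 0.0);
    assert(nTempl > 0.0);

    // One global factor: every bin is multiplied by the same number,
    // so the template shape is left untouched.
    const double scale = fraction * nData / nTempl;
    std::vector<double> out(templ);
    for (double& x : out) x *= scale;
    return out;
}
```

A pure rescaling like this cannot change the shape of a template, which is why the distortion puzzles me.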

I have been looking for the bug in my code for a while now but have failed to find an explanation. Surely the problem is on my side. If anyone has a hint for me, it would be very much appreciated! By the way, is the TFractionFitter class the appropriate tool for what I am doing?

Best regards,

Pierre
TemplatesAndFitResults.pdf (74.1 KB)
ProblemTFracFit.root (8.68 KB)
ProblemTFractionFitter.cxx (3.8 KB)

moneta · January 6, 2014, 12:13pm

Hi,

I think what you observe is expected. TFractionFitter performs the fit bin by bin, assuming Poisson statistics in each MC template. In the regions where you see the distortion, you have an excess of data points and some of the MC templates have zero content. The fit then tries to accommodate the excess by adding content to some of the templates, in particular the one with the largest fraction. See the discussion in section 5 of the Barlow and Beeston paper.
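To make the mechanism concrete, here is a minimal sketch of the per-bin adjustment in plain C++ (no ROOT). It is not the TFractionFitter source: the function name `adjustBin` is hypothetical, and the formulas are what one obtains by maximising the combined Poisson likelihood of the data and of the template counts in one bin. With `d` the data count, `a[j]` the raw template counts and `p[j]` the template strengths, the adjusted contents are `A[j] = a[j] / (1 + p[j] * t)`, where `t` solves `d / (1 - t) = sum_j p[j] * a[j] / (1 + p[j] * t)`. The sketch assumes all strengths are positive and that the largest one is unique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adjusted template contents A[j] for ONE bin.
// d = observed data count, a[j] = raw template counts, p[j] = strengths (> 0).
// Assumption: the largest strength p[k] is unique.
std::vector<double> adjustBin(double d, const std::vector<double>& a,
                              const std::vector<double>& p) {
    assert(a.size() == p.size() && !a.empty() && d >= 0.0);
    const std::size_t n = a.size();
    const std::size_t k = std::max_element(p.begin(), p.end()) - p.begin();
    std::vector<double> A(n, 0.0);

    if (d == 0.0) {  // no data in this bin: the solution is t = 1
        for (std::size_t j = 0; j < n; ++j) A[j] = a[j] / (1.0 + p[j]);
        return A;
    }

    // g(t) = d/(1-t) - sum_j p_j a_j/(1 + p_j t) increases with t on (-1/p_k, 1).
    // Templates with a_j == 0 contribute nothing to the sum and are skipped.
    auto g = [&](double t) {
        double s = d / (1.0 - t);
        for (std::size_t j = 0; j < n; ++j)
            if (a[j] > 0.0) s -= p[j] * a[j] / (1.0 + p[j] * t);
        return s;
    };

    const double tMin = -1.0 / p[k];
    if (a[k] == 0.0 && g(tMin) > 0.0) {
        // Special case: the strongest template is empty and the others cannot
        // absorb the data. Then t = -1/p_k and the leftover prediction is
        // assigned to template k, although its raw content is zero.
        for (std::size_t j = 0; j < n; ++j)
            if (j != k && a[j] > 0.0) A[j] = a[j] / (1.0 + p[j] * tMin);
        A[k] = g(tMin) / p[k];
        return A;
    }

    // Regular case: find the unique root of g by bisection.
    double lo = tMin, hi = 1.0;
    for (int it = 0; it < 200; ++it) {
        const double mid = 0.5 * (lo + hi);
        if (g(mid) < 0.0) lo = mid; else hi = mid;
    }
    const double t = 0.5 * (lo + hi);
    for (std::size_t j = 0; j < n; ++j)
        A[j] = (a[j] > 0.0) ? a[j] / (1.0 + p[j] * t) : 0.0;
    return A;
}
```

For example, a bin with 10 data events, three empty templates and strengths 0.5, 1.0 and 0.25 yields an adjusted content of 5 events for the template with strength 1.0 and zero for the other two. That extra content in a template that was empty in the input is the kind of distortion visible in the plot.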
If you want to avoid these distortions, you might want to fit without considering the statistical errors on the MC templates.
In this case it is easy to write the likelihood yourself, or otherwise you can use RooFit. In RooFit the statistical error on the templates is not considered if you write your model pdf as a sum of RooHistPdf objects, one for each MC template.
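As an illustration of writing the likelihood by hand, here is a minimal sketch in plain C++ (no ROOT, no RooFit). The names `Hist`, `normalised`, `nll` and `fitFractions` are hypothetical. The templates are normalised and treated as exact shapes, so only the fractions are free. Instead of a general-purpose minimiser, the sketch uses the expectation-maximisation update for mixture fractions, which maximises the same binned likelihood when the total yield is fixed to the data integral.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Hist = std::vector<double>;  // bin contents only, no errors

// Normalise a histogram to unit sum so that it acts as a fixed shape.
Hist normalised(const Hist& h) {
    const double sum = std::accumulate(h.begin(), h.end(), 0.0);
    assert(sum > 0.0);
    Hist out(h);
    for (double& x : out) x /= sum;
    return out;
}

// Binned Poisson negative log-likelihood (constant terms dropped).
// The template shapes are taken as exact: no statistical error on the MC.
double nll(const Hist& data, const std::vector<Hist>& shapes,
           const std::vector<double>& frac) {
    const double nTot = std::accumulate(data.begin(), data.end(), 0.0);
    double result = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        double mu = 0.0;  // expected count in bin i
        for (std::size_t j = 0; j < shapes.size(); ++j)
            mu += nTot * frac[j] * shapes[j][i];
        result += mu;
        if (data[i] > 0.0) result -= data[i] * std::log(mu);
    }
    return result;
}

// Fit the fractions with EM updates. Data bins that no template can
// describe (zero predicted content) are skipped, so the fitted fractions
// then sum to less than one.
std::vector<double> fitFractions(const Hist& data,
                                 const std::vector<Hist>& shapes,
                                 int iterations = 2000) {
    const std::size_t m = shapes.size();
    assert(m > 0);
    const double nTot = std::accumulate(data.begin(), data.end(), 0.0);
    assert(nTot > 0.0);
    std::vector<double> frac(m, 1.0 / m);  // start from equal fractions
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> next(m, 0.0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            double mix = 0.0;
            for (std::size_t j = 0; j < m; ++j) mix += frac[j] * shapes[j][i];
            if (data[i] <= 0.0 || mix <= 0.0) continue;
            // Share the data in bin i among the templates in proportion
            // to their current predicted contributions.
            for (std::size_t j = 0; j < m; ++j)
                next[j] += data[i] * frac[j] * shapes[j][i] / mix;
        }
        for (std::size_t j = 0; j < m; ++j) frac[j] = next[j] / nTot;
    }
    return frac;
}
```

Because the shapes never change during the fit, the plotted contributions are the input templates multiplied by a single scale factor each. The price is that the limited MC statistics are ignored, so the uncertainties on the fractions will tend to be underestimated.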

Best Regards

Lorenzo

psaouter · January 14, 2014, 3:54pm

Dear Lorenzo,

Sorry for the late reply, and thank you very much for your answer. Although I understand your explanation, I do find it quite strange that low-statistics bins can be modified so significantly to obtain a better fit. For example, colleagues reported that when a constant histogram is given as the input template, TFractionFitter still produces a good fit by modifying the bin contents. With such behavior, what is the purpose of even giving input templates to the fitting procedure? Surely I am missing some point here.

Never mind, I have switched back to RooFit since I need to consider potential offsets in the template definition. I don’t think this can be done with TFractionFitter (?).

Best Regards,

Pierre