Result of fitting RooHistPdf depends on stat. precision?

Hi,

I have a question concerning the treatment of model uncertainties in RooFit:
in my application, I fit template histograms (represented by RooHistPdf) for signal and background processes to a distribution observed in data (represented by RooDataHist), in order to determine normalization factors for the signal and the background.

The model that I use in the fit is a RooAddPdf (which has two components: the RooHistPdf for the signal and the RooHistPdf for the background) and the parameter that I fit is the relative contribution of the two RooHistPdfs to the distribution observed in data.

I am using the RooFit version included in ROOT 5.18/00a .

I observe that the uncertainties on the signal and background model (represented by bin errors of the RooHistPdfs) are neglected by the fit (as a test, I set all bin errors to zero and observed that the fit returned identical results).

At first I thought that neglecting the uncertainties would affect only the estimated uncertainties of the fit, not its central values.
I then wrote a small test program to check this, and it seems that this is not the case: the results of the fit do depend on the statistical precision of the signal and background Pdfs.

In the attached test program, I get very different results for the relative fraction of signal and background. The fit results seem to overestimate (underestimate) the Pdf which has larger statistical uncertainties.
(models which fluctuate significantly are “penalized” by the fit)

In my application, I do not have control over the statistical precision of the signal and background templates (because they are both determined from data and have very different cross-sections).

Can you please advise me whether it is possible to avoid the dependency of the fit results on the statistical precision of the signal and background templates and, if so, what I should do?

Thank you very much,

Christian

(the test program and plot of the fit results for 1000 “toy experiments” are available also at desy.de/~veelken/bgEstTemplateShapeTest.ps in case there is a problem with the attachment)
bgEstTemplateShapeTest.ps (12.5 KB)
bgEstTemplateShapeTest.C (5.42 KB)

Hi Christian,

The behavior you see is, I think, somewhat expected: the RooHistPdf class does not propagate the statistical uncertainty of the underlying histogram to the likelihood; it merely represents the distribution as a fixed shape. As shapes based on low-statistics samples are more prone to be ‘wild’, they are less likely to fit a data sample well and are thus effectively penalized.
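
To make the point about fixed shapes concrete, here is a minimal standalone sketch (plain C++, not RooFit code) of a shape-only binned negative log-likelihood for a two-template mixture. The function name `mixtureNll` and the vector-based interface are assumptions made for illustration; the only inputs are bin contents, so the bin errors of the templates cannot influence the result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Shape-only binned negative log-likelihood for a two-template mixture.
// Both templates are normalised to unit area and treated as exact shapes:
// their own bin errors do not appear anywhere in this expression.
// (Hypothetical helper for illustration, not part of RooFit.)
double mixtureNll(const std::vector<double>& data,
                  const std::vector<double>& sig,
                  const std::vector<double>& bkg,
                  double fSig)
{
    assert(data.size() == sig.size() && data.size() == bkg.size());
    const double sigSum = std::accumulate(sig.begin(), sig.end(), 0.0);
    const double bkgSum = std::accumulate(bkg.begin(), bkg.end(), 0.0);
    assert(sigSum > 0.0 && bkgSum > 0.0);

    double nll = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Predicted probability of bin i for signal fraction fSig.
        const double p = fSig * sig[i] / sigSum + (1.0 - fSig) * bkg[i] / bkgSum;
        // A template bin that fluctuated close to zero makes p tiny, so any
        // data in that bin costs a large -log(p).
        if (data[i] > 0.0) nll -= data[i] * std::log(p);
    }
    return nll;
}
```

Scanning `fSig` with a smooth template and then with a sparsely populated one should show how downward fluctuations in the template pull the minimum away from the true fraction.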

The solution for what I think you want to do is correctly implemented in the class TFractionFitter, which does take these uncertainties into account.
I would like to propagate this construction into RooFit at some point, but
I have not yet had the time to work out the math and the code…
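
For reference, the likelihood that TFractionFitter is documented to maximize is the one of Barlow and Beeston. In the notation assumed here, $d_i$ is the data count in bin $i$, $a_{ji}$ the observed template count of source $j$ in bin $i$, $A_{ji}$ the corresponding unknown expectation, and $p_j$ the strength of source $j$:

```latex
\ln L = \sum_{i=1}^{n} \left( d_i \ln f_i - f_i \right)
      + \sum_{i=1}^{n} \sum_{j=1}^{m} \left( a_{ji} \ln A_{ji} - A_{ji} \right),
\qquad
f_i = \sum_{j=1}^{m} p_j A_{ji}
```

The second sum is what a fixed-shape fit leaves out: it lets each template bin move within its own Poisson uncertainty instead of being taken as exact.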

Wouter

Dear Wouter,

thank you very much for the clarification.

I have started “smoothing” the template histograms by fitting them with a TF1 function object and using RooTFnPdfBinding instead of RooHistPdf as input for RooFit.

This seems to work quite well :)

I am thinking that replacing the template histograms by analytic functions may also have the advantage that I could allow the template shapes to vary (to some extent) during the fit, in order to better fit the templates to the distribution observed in data.

What I have in mind is to include Gaussian constraints for the parameters
of the TF1 objects into the PDF used for the fit and then let the fit vary the parameters of the TF1 functions.
Would that be possible ?
(When looking at RooTFnPdfBinding, I don’t see a way to “interface” the parameters of the TF1 function object to RooRealVars that I could include into the fit, but maybe there is either a “trick” that I don’t know of yet or RooTFnPdfBinding is not the right class for me to use ?)
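
As a sketch of what such Gaussian constraints add to the fit, the standalone C++ function below computes the penalty term that would be added to the negative log-likelihood for a set of shape parameters. The name `gaussianPenalty` and its interface are hypothetical; in RooFit the same effect is usually obtained by multiplying the model with RooGaussian terms, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of Gaussian constraint terms, 0.5 * ((value - nominal) / sigma)^2,
// to be added to the negative log-likelihood of the data.
// (Hypothetical helper for illustration, not a RooFit function.)
double gaussianPenalty(const std::vector<double>& values,
                       const std::vector<double>& nominals,
                       const std::vector<double>& sigmas)
{
    assert(values.size() == nominals.size() && values.size() == sigmas.size());
    double penalty = 0.0;
    for (std::size_t k = 0; k < values.size(); ++k) {
        assert(sigmas[k] > 0.0);
        const double pull = (values[k] - nominals[k]) / sigmas[k];
        penalty += 0.5 * pull * pull;
    }
    return penalty;
}
```

The nominal values and widths would come from the fit of the TF1 to the template histogram, so the shape parameters can move in the data fit but pay a price for leaving the region the template supports.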

Thank you very much again,
cheers,

Christian

Hi Christian,

If you want to smooth your function, you can also use the interpolation
option of RooHistPdf: simply set the interpolation order as the fifth argument
of the constructor. If your source data is unbinned (e.g. a TTree), the class
RooKeysPdf offers even better modeling using kernel estimation.
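
To illustrate what interpolating a histogram means, the standalone sketch below evaluates a histogram by linear interpolation between neighbouring bin centres. It is a conceptual example only: the function `interpolateLinear` is hypothetical, it assumes equal-width bins, and it is not a reproduction of the algorithm RooHistPdf uses internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate a histogram with equal-width bins on [xMin, xMax] at position x,
// interpolating linearly between the two nearest bin centres.
// Outside the first/last bin centre the edge bin content is returned.
// (Hypothetical helper for illustration, not RooFit's implementation.)
double interpolateLinear(const std::vector<double>& contents,
                         double xMin, double xMax, double x)
{
    assert(!contents.empty() && xMax > xMin);
    const std::size_t n = contents.size();
    const double width = (xMax - xMin) / static_cast<double>(n);
    // Position of x in units of bins, measured from the first bin centre.
    const double t = (x - xMin) / width - 0.5;
    if (t <= 0.0) return contents.front();
    if (t >= static_cast<double>(n) - 1.0) return contents.back();
    const std::size_t lo = static_cast<std::size_t>(t);
    const double frac = t - static_cast<double>(lo);
    return (1.0 - frac) * contents[lo] + frac * contents[lo + 1];
}
```

Interpolation removes the steps at bin edges, but it does not remove statistical fluctuations of the bin contents themselves; those still enter the fit as if they were exact.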

Concerning RooTFnBinding parameters: this was an oversight on my part,
I’ll try to fix it for the next release.

Wouter

Hi Wouter,

just to make sure that I’m interpreting my results correctly, and since this thread is now already two years old:

Are the errors of the templates still not taken into account?

Thanks and cheers,
Dennis

Hi,

I have the same question as Dennis.
Has this been changed?

delo

Hi all,

I have been doing some work on this in the past months. I hope to release code for the next ROOT release that starts to support template uncertainties.

Please note that the subject is not completely trivial so features will be added in batches, but a first set of workable options will be available for 5.33 (due in a couple of weeks from now)

Wouter

Dear Wouter,

Could you please confirm that this nice feature is now in the 5.33 version (which I can see on SVN)? If so, would you have an example code snippet showing how to make the fit take histogram uncertainties into account?

Thank you in advance, Olivier.

Hi all,

I confirm that the feature is now in ROOT 5.33.

I attached a simple example that demonstrates
a template fit with rigid templates and with parameterized
templates.

At present only parameterized histogram functions
are implemented (but the sum of these is transformed
into a pdf with RooRealSumPdf). A p.d.f. version will
follow soon.
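
For readers without access to the attachment, the idea behind a parameterized template can be sketched in standalone C++. The example assumes the simplified "Barlow-Beeston lite" form with a single scale factor gamma per bin acting on the total prediction, constrained by a Poisson term built from the effective number of template events. This is an assumption made for illustration and may differ from what demo.C does; the function `profiledGamma` is hypothetical.

```cpp
#include <cassert>

// Value of the per-bin scale factor gamma that minimises, for one bin,
//   gamma*mu - d*ln(gamma*mu) + gamma*tau - tau*ln(gamma*tau),
// where d is the observed data count, mu the nominal template prediction
// and tau = 1 / relErr^2 the effective number of template events.
// Setting the derivative to zero gives gamma = (d + tau) / (mu + tau).
// (Hypothetical helper for illustration, not a RooFit function.)
double profiledGamma(double observed, double predicted, double relErr)
{
    assert(predicted > 0.0 && relErr > 0.0);
    const double tau = 1.0 / (relErr * relErr);
    return (observed + tau) / (predicted + tau);
}
```

With a precise template (small `relErr`) gamma stays close to 1 and the template is effectively rigid; with a poorly populated template the bin is allowed to follow the data, which is how the template uncertainty enters the fit.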

Wouter
demo.C (2.75 KB)

Hi,

Just to confirm, is this fix also included in the RooFit 3.52 bundled with the latest patch release (5.32.03)?

Thanks,
–Advait

Hi,

I don’t think so. The fix is in the latest ROOT release, 5.34. I think the version of RooFit in 5.34 is 3.54; 3.52 is the version released with ROOT 5.32.

Lorenzo

Dear all,

since I think my question is along the lines of this thread, I preferred not to start a new one.

I would like to use RooFit for the following case: fit data with a two-component template fit. One component is data-driven (for the QCD background) and the other is taken from MC as the sum of the EWK and TTbar components. I would like the statistical uncertainty of each MC sample to be taken into account properly in the fit.
If I use one RooHistPdf for the sum of the MC components, normalizing each TH1 to the same luminosity so that they can be summed properly, will that be OK for my purposes? I mean, would RooHistPdf properly propagate the uncertainties of each MC component to the sum so that they enter the fit correctly? Is it better to sum the TH1s or the RooHistPdfs?
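
On the histogram side of this question, the per-bin uncertainty of a luminosity-weighted sum can be sketched as follows. The example assumes unweighted raw MC counts with Poisson variance in each sample; the `Bin` struct and the function `addScaled` are hypothetical names. Note that, as explained earlier in the thread, a plain RooHistPdf uses only the bin contents, so these variances would not enter the fit by themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Content and variance of one bin of the summed template.
struct Bin {
    double content;
    double variance;
};

// Sum several MC samples bin by bin, each scaled by its luminosity weight.
// rawCounts[k][i] is the unweighted count of sample k in bin i.
// For Poisson counts the variance of w*n is w^2 * n.
// (Hypothetical helper for illustration.)
std::vector<Bin> addScaled(const std::vector<std::vector<double>>& rawCounts,
                           const std::vector<double>& weights)
{
    assert(!rawCounts.empty() && rawCounts.size() == weights.size());
    const std::size_t nBins = rawCounts.front().size();
    std::vector<Bin> sum(nBins, Bin{0.0, 0.0});
    for (std::size_t k = 0; k < rawCounts.size(); ++k) {
        assert(rawCounts[k].size() == nBins);
        for (std::size_t i = 0; i < nBins; ++i) {
            sum[i].content  += weights[k] * rawCounts[k][i];
            sum[i].variance += weights[k] * weights[k] * rawCounts[k][i];
        }
    }
    return sum;
}
```

A sample with a large weight and few raw events dominates the variance of the sum even if it contributes little to the content, which is why the per-sample statistics matter.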

Many thanks in advance for your help.

Best,
Max

Dear All,

I have implemented a template fit with histograms following the first example given by Wouter (the rigid templates).

I use TH1 histograms to create the RooDataHist object.
The problem, it seems to me, is that the errors of the template histograms are still not taken into account, as the fit results do not change no matter how large or small the errors are set.
I use ROOT 5.34.
Is this behaviour to be expected ?

Regards

Till

Hi Wouter,

Is there any documentation on what your demo is actually doing? Specifically, the documentation pages for RooParamHistFunc and RooHistConstraint are not very helpful.

Thanks,
Nic

Hi Wouter,

How is the status on this topic?
Has a p.d.f. version already been implemented in the latest RooFit?

Thanks,
Shingo

Hi!

I would also be interested in knowing the status :)

Thanks!

Brais

Hello,

I’d second the question: Does it work with ordinary RooHistPdfs? :)

Thank you!
Jan

Hello,

We have a similar question regarding the error in the RooHistPdfs. We are making an angular correlation measurement using templates from MCs with full or zero correlation. Scale factors are applied in the usual way to these distributions, which are then used to generate RooDataHists and from those RooHistPdfs. These templates are then used to fit to data with the following model:

model = (num. Bkg) * (Bkg pdf) + (num. Sig) * [f * (Corr. pdf) + (1 - f) * (Non-Corr. pdf)]

where num. Bkg is determined from MC and num. Sig is determined by the fit.

Based on this chain, we’ve been concerned that the error isn’t being handled in the correct manner in our procedure – while the bin-by-bin errors are included in the RooDataHists, will they be handled correctly in the application of the pdf templates used to fit? Or are these pdfs of fixed shape?

Thanks,
Aiken

----------- Fitting code

// Observable and fit parameters
RooRealVar x("x","x",-1,1);
RooRealVar fcorr("fcorr", "f_{corr}", 0.001, 0., 2.);
// Value, minimum and maximum are all bgEstimate: the background yield is taken from MC
RooRealVar nbkg("nbkg","number of background events",bgEstimate,bgEstimate,bgEstimate);
RooRealVar nsig("nsig","number of signal events",8000,0,150000);

// Scale both signal templates to the data integral
Corr_TemplateHist->Scale(Data_Hist->Integral() / Corr_TemplateHist->Integral());
NoCorr_TemplateHist->Scale(Corr_TemplateHist->Integral() / NoCorr_TemplateHist->Integral());

// Templates as fixed-shape RooHistPdfs (interpolation order 0)
RooDataHist fullCorrTemp("fullCorrTemp","fullCorrTemp",x,Corr_TemplateHist);
RooHistPdf fullCorrPdf("fullCorrPdf","fullCorrPdf",x,fullCorrTemp,0);

RooDataHist noCorrTemp("noCorrTemp","noCorrTemp",x,NoCorr_TemplateHist);
RooHistPdf noCorrPdf("noCorrPdf","noCorrPdf",x,noCorrTemp,0);

RooDataHist bgTemp("bgTemp", "bgTemp", x, BG_TemplateHist);
RooHistPdf bgPdf("bgPdf", "bgPdf", x, bgTemp, 0);

RooDataHist hData("hData","hData",x,Data_Hist);

// Signal = f * correlated + (1 - f) * uncorrelated; model = nbkg * bkg + nsig * signal
RooAddPdf sigPdf("sigPdf", "sigPdf", RooArgList(fullCorrPdf, noCorrPdf), fcorr);
RooAddPdf model("model","model",RooArgList(bgPdf,sigPdf),RooArgList(nbkg,nsig));

// Extended maximum-likelihood fit; keep the result for inspection
RooFitResult* rFit = model.fitTo(hData,RooFit::Extended(kTRUE), RooFit::Save());
rFit->Print("v");

Dear Wouter and all,

I have implemented a template fit with the Barlow-Beeston method using your demo.C as inspiration.

I am able to fit my data distribution with a signal template (derived from MC) and a background template (taken from data), but I am struggling to extract useful information from the result.
After the fit I would like to be able to extract the number of signal and background events in a sub range of the fitting interval.

I tried to first calculate the total number of signal and background events (that would be the yield value after the fit times the histFunc normalization), and then integrate the parametric hist functions in the interval of interest to calculate the fraction of events contained therein.

However, since I am dealing with non-normalized functions, I obtain funny values for the integrals.

Hence my questions are: is there a pdf object I can use to implement the Barlow-Beeston fit? If not, does my approach seem to be the correct one for obtaining my result? (If the answer to that is yes, I will post some working examples to show what I mean by “funny values”.)
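
The sub-range calculation described above can be written down independently of RooFit. The sketch below assumes that the sub-range is aligned with bin edges and that `contents` holds the post-fit bin values of one component (including any per-bin parameters); the function `yieldInBinRange` is a hypothetical helper. Dividing by the sum over all bins is what takes care of the missing normalization.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Number of events of one component expected in bins [firstBin, lastBin]
// (inclusive), given its total fitted yield and its non-normalised bin
// contents over the full fit range.
// (Hypothetical helper for illustration.)
double yieldInBinRange(const std::vector<double>& contents,
                       double totalYield,
                       std::size_t firstBin, std::size_t lastBin)
{
    assert(firstBin <= lastBin && lastBin < contents.size());
    const double full = std::accumulate(contents.begin(), contents.end(), 0.0);
    assert(full > 0.0);
    const double part = std::accumulate(contents.begin() + firstBin,
                                        contents.begin() + lastBin + 1, 0.0);
    // Fraction of the shape inside the sub-range, times the total yield.
    return totalYield * part / full;
}
```

If the integrals obtained from the fitted functions do not reproduce this ratio, the discrepancy is most likely a normalization issue (for example bin-width factors) rather than a problem with the fit itself.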

Thank you very much,

Giacomo

Hi all,

I’m still wondering whether a p.d.f. version has already been implemented in the latest RooFit version.

Many thanks!
Yanhui
