Error calculations while fitting normalised data distributions

Dear Experts,

We are working on a binned composite fit (two components, signal and background) of a data distribution. The data distribution is normalized to unit integral and also has associated experimental uncertainties (combined, not the individual sources). After fitting, the central value of the signal fraction is in line with expectations, but I have doubts about the associated uncertainties. Can you please suggest the correct approach to get uncertainties on the extracted signal fraction, and how can we propagate the effect of the experimental uncertainties on the data to the signal fraction?

Thanks!

Regards
Sunil

roofit_sigfrac.py (3.7 KB)
mixSignal_templates.root (9.1 KB)
dataHist.root (9.3 KB)
histfile_ZJ_AMC_FxFx_py_pyroot.root (21.9 KB)

Hi!

The way you are getting the uncertainties is correct. I just noticed that you have the SumW2Error() option switched on; I hope this is intentional. If you switch it on, the uncertainties are transformed to be what you would get if your dataset were unweighted (see the documentation of RooAbsPdf). Still, it confused me a bit that your dataset is weighted, because data from an experiment is usually never weighted. Usually only Monte Carlo simulations are weighted.
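To see why weights matter for the statistical uncertainty in the first place, here is a stdlib-only toy sketch (the weights are made up, not taken from your files): for a weighted sample the "effective" number of entries is n_eff = (Σw)² / Σw², and the relative statistical uncertainty of a bin behaves like 1/sqrt(n_eff) rather than 1/sqrt(N).

```python
# Toy illustration of effective entries for a weighted sample.
# The weights below are hypothetical, purely for demonstration.
weights = [0.5, 0.5, 2.0, 1.0, 1.0]

sumw = sum(weights)                 # sum of weights
sumw2 = sum(w * w for w in weights) # sum of squared weights

# Effective number of entries: (sum w)^2 / (sum w^2).
# For unit weights this equals len(weights); unequal weights reduce it.
n_eff = sumw ** 2 / sumw2

print(len(weights), n_eff)  # 5 raw entries, but n_eff = 25/6.5 ≈ 3.85
```

This is the quantity that distinguishes the "weighted" from the "unweighted" interpretation of the fit uncertainties.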

Can you please suggest the correct approach to get uncertainties on the extracted signal fraction

The signal fraction depends linearly on the background fraction, so you already have its uncertainty: it is the same as the background-fraction uncertainty in your fit result.
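Concretely, with f_sig = 1 − f_bkg the derivative has magnitude 1, so linear error propagation carries the uncertainty over unchanged. A minimal sketch with hypothetical numbers (not the values from your fit):

```python
# Linear error propagation for f_sig = 1 - f_bkg.
# Both numbers below are hypothetical placeholders.
f_bkg = 0.42   # fitted background fraction
df_bkg = 0.03  # its fit uncertainty

f_sig = 1.0 - f_bkg
# |d f_sig / d f_bkg| = 1, so the uncertainty is identical
df_sig = abs(-1.0) * df_bkg

print(f_sig, df_sig)  # f_sig ≈ 0.58 with the same 0.03 uncertainty
```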

how can we propagate the effect of the experimental uncertainties on the data to the signal fraction?

If you have experimental uncertainties, what is generally done in HEP is to extend your model with additional free parameters ("nuisance parameters") that reflect your experimental uncertainties (see also "Practical statistics for the LHC"). Then you do the fitting as usual, and your final parameter of interest (the background fraction) will have a larger uncertainty because of the additional correlated nuisance parameters.
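As a stdlib-only toy illustration of this mechanism (all numbers made up, and deliberately much simpler than a RooFit model): one measured value x with statistical uncertainty s_stat, plus a systematic shift parametrized by a nuisance parameter theta with a unit-Gaussian constraint term in the negative log-likelihood. Profiling theta and reading off the ΔNLL = 0.5 interval shows the total uncertainty inflate from s_stat to roughly sqrt(s_stat² + s_sys²):

```python
import math

# Hypothetical inputs: measurement x, statistical and systematic sigmas.
x, s_stat, s_sys = 1.0, 0.10, 0.05

def nll(mu, theta):
    # chi2/2 for the measurement, plus the unit-Gaussian constraint on theta
    return (x - mu - theta * s_sys) ** 2 / (2 * s_stat ** 2) + theta ** 2 / 2

def profiled_nll(mu):
    # Minimize over theta on a fine grid (a real fitter does this for you).
    return min(nll(mu, t / 1000.0) for t in range(-3000, 3001))

# 1-sigma upper edge: where the profiled NLL rises by 0.5 above its minimum.
best = profiled_nll(x)
mu_up = next(m / 1000.0 for m in range(1000, 2000)
             if profiled_nll(m / 1000.0) - best >= 0.5)

# Uncertainty grows from s_stat = 0.10 to about sqrt(0.10^2 + 0.05^2) ≈ 0.112.
print(mu_up - x, math.sqrt(s_stat ** 2 + s_sys ** 2))
```

The same structure, with one constraint term per systematic source, is what the nuisance-parameter approach adds to the fit model.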

Would that strategy work for your use case? If you know what you want to achieve mathematically and then have questions about implementing it in RooFit, please feel free to follow up!

Cheers,
Jonas

Dear Jonas,

Thanks for your prompt response! Your response clarified most of the doubts.
We are dealing with unfolded experimental data, which is further normalized to obtain the differential cross-section and a shape-only distribution; this is what leads to the weighted distribution.

Is there any example one can refer to in order to understand the implementation of nuisance parameters in RooFit?

Thanks!

Regards
Sunil

Hi!

One tutorial that I really like to understand how constraints are implemented is rf604_constraints.py (there is also a C++ version if you prefer).

Specifically, for parametrizing statistical uncertainties you can use the RooParamHistFunc.

An example of the RooParamHistFunc can be found in rf709_BarlowBeeston.py.

If you are looking for a higher-level interface to build statistical models for binned fits based on histograms, you can take a look at HistFactory. It is widely adopted by the ATLAS collaboration, for example, and it offers convenient interfaces to implement both normalization uncertainties and shape uncertainties for your histograms (as explained in this note on CDS).

Hope this helps to get started with nuisance parameters!

Cheers,
Jonas

Dear Jonas,

Thanks for these pointers! These will be really useful.

Regards
Sunil