Yield uncertainties for fitted pdf components in sub-ranges

FLuan · May 13, 2021, 10:40pm

Dear experts,

I perform a fit to an invariant mass distribution with a model composed of a signal and a background pdf, using RooFit in ROOT 6.22/06. Now I’m trying to get the uncertainties for the signal and the background components in different sub-ranges of the fitted spectrum.

I’ve tried to do this by two different methods:

1. with RooFit createIntegral (as in this post)

In this case, I compute the integral for a given pdf component in a given range as

    RooAbsReal* i_sig = epdf_Sig->createIntegral(*mass, NormSet(*mass), RooFit::Range("signalRange"));

and compute the yield and uncertainty as

    Double_t NsigSigWin= Nsig*i_sig->getVal();
    Double_t dNsigSigWin= Nsig*i_sig->getPropagatedError(*fit,*mass);

for the signal region, I get

Method 1: Fractions in sig region => sig= 0.980527 +- 0.000345
Method 1: Fractions in sig region => bkg= 0.330025 +- 0.001270
Method 1: Fractions in sig region => tot= 0.495956 +- 0.001029

Method 1: Nsig in signal region = 79827.512086 +- 28.076938
Method 1: Nbkg in signal region = 78467.443825 +- 301.952905
Method 1: Ntot in signal region = 158299.297326 +- 328.509493

Here the uncertainty for the signal yield (Nsig) doesn’t seem to make much sense, while for the background (Nbkg) it seems reasonable.

2. converting to a TF1 and using IntegralError (as in here and here)

For this I get the list of parameters of the pdf component, make a TF1 from that component, and compute the IntegralError using the reduced covariance matrix of the floating parameters for that component, as in

    RooArgList pars_sig(*N_Sig, *m, *Sigma_G1);
    RooArgSet prodSet_sig(*epdf_Sig);
    RooProduct unNormPdf_sig("fitted Function Sig", "fitted Function", prodSet_sig);
    TF1 * f2_sig = unNormPdf_sig.asTF(RooArgList(*mass), pars_sig);
    double nsig = ((RooRealVar*) pars_sig.find("N_{sig}"))->getVal();
    Double_t fSig_full = f2_sig->Integral(massLo, massHi);
    Double_t dnsig_full = nsig*f2_sig->IntegralError(massLo, massHi, 0, fit->reducedCovarianceMatrix(pars_sig).GetMatrixArray())/fSig_full;
    Double_t fSig_sigreg = f2_sig->Integral(massSigLo, massSigHi, 0)/fSig_full;
    Double_t nsig_sigreg = nsig*f2_sig->Integral(massSigLo, massSigHi, 0)/fSig_full;
    Double_t dnsig_sigreg = nsig*f2_sig->IntegralError(massSigLo, massSigHi, 0, fit->reducedCovarianceMatrix(pars_sig).GetMatrixArray())/fSig_full;

In this case, I get

Method 2: Fractions in sig region => sig= 0.973391
Method 2: Fractions in sig region => bkg= 0.330023

Method 2: Nsig in signal region = 79250.588504 +- 497.202333
Method 2: Nbkg in signal region = 78467.443825 +- 26045.335944

Now the uncertainty for the signal yield makes much more sense, but the one for the background doesn’t.

As a “cross-check”, I’ve tried to compute the yields w/ uncertainties in the whole mass spectrum to compare with the fitted yields.

For method 1, I get

Method 1: Fractions in whole spectrum => sig= 1.000000 +- 0.000000
Method 1: Fractions in whole spectrum => bkg= 1.000000 +- 0.000000
Method 1: Fractions in whole spectrum => tot= 1.000000 +- 0.000000

Method 1: Nsig in whole spectrum = 81416.989289 +- 0.000000
Method 1: Nbkg in whole spectrum = 237763.211200 +- 0.000000
Method 1: Ntot in whole spectrum = 319180.200489 +- 0.000000

And for the method 2:

Method 2: Nsig in whole spectrum = 81416.989289 +- 548.049714
Method 2: Nbkg in whole spectrum = 237763.211200 +- 78893.342729

while the fitted yields are

Fitted yield: Nsig = 81416.989289 +- 547.600470 (getPropagatedError); NsigError = 547.599057 (getError)
Fitted yield: Nbkg = 237763.211200 +- 675.409048 (getPropagatedError); NbkgError = 675.406395 (getError)

The method 2 uncertainty for the signal yield is very close to the fitted one, but the background yield uncertainty is much larger.

So, why do both methods give different results?
Why does Method 2 seem to almost reproduce the uncertainty of the fitted signal yield for the whole spectrum, but not that of the background? And why is it still slightly different?
And, more generally, how does one correctly compute the uncertainties for pdf components in a given sub-range?

A short running example with the whole code and a rootfile is attached in case you want to reproduce this problem.

Thanks in advance!
Cheers,

FitG2CB_simple.C (22.6 KB)
histofile.root (3.9 KB)

couet · May 17, 2021, 6:16am

I think @moneta can help you.

Your macro gives me this plot:

jonas · May 17, 2021, 2:17pm

Hi @FLuan!

I don’t know why you get these huge background uncertainties for Method 2, but I can tell you what was missing in your Method 1 with RooFit only.

You propagated the uncertainty from the pdf parameters, but missed to propagate the uncertainty from the normalizations, and add them in quadrature. That’s why you had too small uncertainties for the signal: the uncertainty is dominated by the normalization here, while the uncertainty on the parameters is really low. In your cross check over the full range, the uncertainty was zero because the full integral of a pdf doesn’t depend on the shape by definition (it’s always 1 +/- 0).

I attach a version of your script where I implemented this:
FitG2CB_simple_jonasedit.C (23.1 KB)
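
The quadrature sum described above can be sketched in plain C++. This is not the code from the attached script: the function name is hypothetical, no ROOT is needed, and it assumes that the yield and the sub-range fraction are uncorrelated.

```cpp
// Plain C++ sketch: combine the yield uncertainty with the uncertainty on the
// sub-range fraction, ASSUMING the two are uncorrelated (illustration only).
#include <cassert>
#include <cmath>

// N  : fitted yield of the component over the full range
// dN : its uncertainty (e.g. the fitted error on the yield parameter)
// f  : fraction of the pdf inside the sub-range (value of the integral)
// df : uncertainty on f propagated from the shape parameters
// Returns the uncertainty on N * f under the no-correlation assumption.
double subRangeYieldErrorQuadrature(double N, double dN, double f, double df)
{
    assert(dN >= 0.0 && df >= 0.0);
    const double fromYield = f * dN;  // contribution of the normalization
    const double fromShape = N * df;  // contribution of the shape parameters
    return std::sqrt(fromYield * fromYield + fromShape * fromShape);
}
```

With the signal numbers quoted in this thread (N = 81416.989289, dN = 547.599057, f = 0.980527, df = 0.000345), the shape term N * df is roughly 28, which is about what Method 1 reported originally, while the yield term f * dN is roughly 537 and dominates the sum. For the full range (f = 1, df = 0) the function simply returns dN.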

Now the results of Method 1 all make sense to me. Do you still need me to investigate what’s wrong with Method 2, or is this no longer relevant now that Method 1 appears to work?

Let me know if you have any more questions!

Cheers,
Jonas

FLuan · May 17, 2021, 5:49pm

Hi @jonas

I see! It would be interesting to know why Method 2 doesn’t work properly or how to make it work (maybe it is related to some correlation between pdf components that is being ignored), but at least for now I’m satisfied with Method 1, as it seems to work fine with your solution.

Thank you again!
Cheers,
Felipe

jonas · September 26, 2022, 1:59pm

A small follow-up: I have now realized that what I said here is not completely correct:

“You propagated the uncertainty from the pdf parameters, but missed to propagate the uncertainty from the normalizations, and add them in quadrature.”

I had to think about the problem again because of this new forum post, and actually you can’t just add the uncertainties in quadrature in general, as there might be fit parameters that are correlated with the yield parameters. The correct solution, which also takes this correlation into account, is to create a new RooProduct that multiplies the sub-range integral by the yield, and then use RooAbsReal::getPropagatedError() to propagate the uncertainties. For an example, take a look at the new forum post that is linked at the beginning of the paragraph.
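
To illustrate why the correlation matters, here is a plain C++ sketch of linear error propagation for the product g = N * f(theta), using the full covariance matrix of the parameters ordered as (N, theta_1, ...). The function names and the parameter ordering are assumptions made for this illustration, and the derivatives of f are taken as given. It only shows the arithmetic and does not replace the RooProduct plus getPropagatedError() approach.

```cpp
// Plain C++ sketch (no ROOT): sigma_g^2 = J^T C J for g = N * f(theta).
// All names are hypothetical; derivatives of f are assumed to be known.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic linear propagation: grad is the gradient J, cov the covariance C.
double propagatedError(const std::vector<double>& grad,
                       const std::vector<std::vector<double>>& cov)
{
    const std::size_t n = grad.size();
    assert(cov.size() == n);
    double var = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(cov[i].size() == n);
        for (std::size_t j = 0; j < n; ++j)
            var += grad[i] * cov[i][j] * grad[j];  // includes off-diagonal terms
    }
    return std::sqrt(var);
}

// Uncertainty on the sub-range yield N * f.
// dfdtheta[i] is the derivative of the sub-range fraction f w.r.t. theta_i;
// cov is the covariance matrix in the order (N, theta_1, ..., theta_k).
double subRangeYieldError(double N, double f,
                          const std::vector<double>& dfdtheta,
                          const std::vector<std::vector<double>>& cov)
{
    std::vector<double> grad;
    grad.push_back(f);             // dg/dN = f
    for (double d : dfdtheta)
        grad.push_back(N * d);     // dg/dtheta_i = N * df/dtheta_i
    return propagatedError(grad, cov);
}
```

When the off-diagonal entries between N and the shape parameters are zero, this reduces to the simple quadrature sum. A non-zero covariance adds a cross term that can increase or decrease the result.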