Fit to weighted sample and SumW2Error

FLuan · June 8, 2021, 8:26am

Dear experts,

I’m trying to fit a weighted MC sample with RooFit, ROOT version 6.24. If no option is specified for the error calculation in fitTo(), I receive the following warning:

[#0] WARNING:InputArguments -- RooAbsPdf::fitTo(pdf_fit) WARNING: a likelihood fit is requested of what appears to be weighted data.
       While the estimated values of the parameters will always be calculated taking the weights into account,
       there are multiple ways to estimate the errors of the parameters. You are advised to make an
       explicit choice for the error calculation:
           - Either provide SumW2Error(true), to calculate a sum-of-weights-corrected HESSE error matrix
             (error will be proportional to the number of events in MC).
           - Or provide SumW2Error(false), to return errors from original HESSE error matrix
             (which will be proportional to the sum of the weights, i.e., a dataset with <sum of weights> events).
           - Or provide AsymptoticError(true), to use the asymptotically correct expression
             (for details see https://arxiv.org/abs/1911.01303).

and the fit converges [covQual() = 3]. The same happens if I use SumW2Error(false). However, if I use the option SumW2Error(true), the fit doesn’t converge [covQual() = 2]. Also, in any of these configurations, the yield’s uncertainty returned is too small [much smaller than sqrt(N)]; for example, with SumW2Error(false) I get

Nsig = 731393 +- 34.7383, covQual() = 3

and with SumW2Error(true) I get

Nsig = 731393 +- 38.0404, covQual() = 2.

I also tested with another minimizer (Minuit2), but the results are very similar.
My questions are:

1. Why is the fit failing to converge with SumW2Error(true)?
2. And why is the fitted yield’s uncertainty so small in both cases?

A short running example with the whole code and a rootfile is attached in case you want to reproduce this problem.

Thanks in advance!
Cheers,

FitG2CB_simple_MC.C (11.4 KB)
histofile_MC.root (5.5 KB)

jonas · June 8, 2021, 11:55am

Hi @FLuan,

thanks for providing the code to reproduce your problem. I’ll address your two questions.

With the SumW2Error(true) option, the fit is done twice: first with the original weights, then with the squared weights. You are just unlucky that the additional fit with the squared weights is more problematic in your case. You can see this in your covariance matrices: in the SumW2Error(true) case the correlations are much higher, which is usually the cause of unstable fits (see this presentation for more info on common fitting problems).
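
To make the sum-of-weights correction more concrete, here is a minimal sketch in plain C++ (no ROOT needed). It assumes a toy model that is not the one from this thread: the mean of a Gaussian with known width `sigma`, estimated from weighted events. It also assumes the corrected covariance has the form V C⁻¹ V, where V comes from the nominal weights and C from the squared weights, which is how I read the RooFit documentation. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy model (assumption, not the poster's PDF): Gaussian with known width sigma,
// weighted NLL(mu) = sum_i w_i * (x_i - mu)^2 / (2 * sigma^2).

// Second derivative of the NLL with respect to mu, with every weight raised
// to `power` (1 = nominal weights, 2 = squared weights).
double nllCurvature(const std::vector<double>& weights, double sigma, int power) {
    assert(sigma > 0.0);
    double sum = 0.0;
    for (double w : weights) sum += std::pow(w, power);
    return sum / (sigma * sigma);
}

// Uncorrected variance of mu: inverse curvature of the weighted NLL.
// It scales like 1 / (sum of weights).
double naiveVariance(const std::vector<double>& weights, double sigma) {
    return 1.0 / nllCurvature(weights, sigma, 1);
}

// Corrected variance V * C^{-1} * V, where C^{-1} is the curvature of the
// NLL built with squared weights (1D case, so the matrices are plain numbers).
double sumW2Variance(const std::vector<double>& weights, double sigma) {
    const double v = naiveVariance(weights, sigma);
    const double cInv = nllCurvature(weights, sigma, 2);
    return v * cInv * v;
}
```

For four events that each carry weight 2 and `sigma = 1`, `naiveVariance` gives 1/8 (as if there were eight events), while `sumW2Variance` gives 1/4, the value expected for four unweighted events. In a multi-parameter fit the same product involves full matrices, so strong correlations in either of them can degrade the result.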

If you want perfect covariance matrix quality, you should change your model a bit so that it has fewer highly correlated parameters (maybe the new RooCrystalBall class can be useful for that).

The yield uncertainty is so small because you set the range of your N_Sig parameter incorrectly. You used the nominal value from your histogram integral as the upper boundary. Hence, the fit result ends up on that upper boundary, which confuses the error estimators. Extending the N_Sig parameter range should help here:
```
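// xent is the nominal yield already used in the original macro.
RooRealVar *N_Sig = new RooRealVar("N_{sig}",
                                   "Number of signal events",
                                   1.0 * xent,      // initial value
                                   0., 2. * xent);  // allowed range, upper limit well above the nominal yield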
```
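
A quick way to catch this kind of problem is to check whether a fitted value ended up on one of its limits. The helper below is a hypothetical sketch in plain C++: `isAtLimit` is not a RooFit function, and the relative tolerance is an arbitrary choice that you may want to tune.

```cpp
#include <cassert>

// Hypothetical helper (not part of RooFit): returns true if `value` lies
// within a fraction `relTol` of the range width from either limit.
bool isAtLimit(double value, double lo, double hi, double relTol = 1e-3) {
    assert(hi > lo);
    const double tol = relTol * (hi - lo);
    return (value - lo) < tol || (hi - value) < tol;
}
```

With the numbers from this thread, `isAtLimit(731393., 0., 731393.)` returns true, while `isAtLimit(731393., 0., 2. * 731393.)` returns false once the upper limit is doubled.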

I hope this advice helps to solve your problem. If not, please feel free to ask any further questions!

Cheers,
Jonas
