Sampling parameters from RooFit result with asymmetric errors

Dear RooFit experts,

According to the documentation of RooFitResult, the randomizePars method samples parameters from the covariance matrix, assuming Gaussian priors for all the parameters. However, in cases where the parameters have very asymmetric uncertainties, I suppose this method is not valid. What would be the recommended way to sample the parameters instead?
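
For reference, this is the usage I have in mind (a minimal hypothetical sketch, assuming some already-fitted pdf and dataset):

# randomizePars() returns the floating parameters drawn from the
# multivariate Gaussian defined by the covariance matrix of the fit
fit_result = pdf.fitTo(dataset, ROOT.RooFit.Save())
randomized_pars = fit_result.randomizePars()  # a RooArgList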

Regards,
Alkaid

Dear Alkaid,

Let me loop in @jonas, who could help out with this question.

Best,
D

Hi @AlkaidCheng,

I would suggest following the usual prescription for interpolating asymmetric pre-fit uncertainties, as done in HistFactory (see also page 16 of the HistFactory paper). In particular, you can use the polynomial interpolation implemented as “interpolation code 4” in the PiecewiseInterpolation class, which is used by HistFactory.

In other words, you sample from a unit Gaussian and then transform the value such that the mean and the ±1 sigma points of the Gaussian correspond to your best-fit parameter value and the up and down variations, while making sure the interpolation is smooth (no jumps in the first and second derivatives). Like this, the priors for your parameters will also have no discontinuities. Here is a little demo, assuming you have a parameter with post-fit value 3.5 and asymmetric uncertainties (-0.4, +0.7):

void interp_demo()
{
   // Parameter with post-fit value 3.5 and asymmetric errors (-0.4, +0.7)
   RooRealVar param{"param", "param", 3.5, 0, 10};
   param.setAsymError(-0.4, 0.7);

   // Unit-Gaussian nuisance parameter that drives the interpolation
   // (the explicit constraint pdf is not used in this demo, since we
   // sample alpha directly below)
   RooRealVar alphaParam{"alpha_param", "", 0.0, -5.0, 5.0};
   RooGaussian alphaParamConstr{"alpha_param_constr", "", alphaParam, 0.0, 1.0};

   // Interpolate between the down and up variations as a function of alpha
   PiecewiseInterpolation interp{"interp",
                                 "",
                                 RooFit::RooConst(param.getVal()),
                                 RooArgList{param.getVal() + param.getErrorLo()},
                                 RooArgList{param.getVal() + param.getErrorHi()},
                                 RooArgList{alphaParam}};

   interp.Print("t");

   // Polynomial interpolation within the boundary, linear extrapolation outside
   interp.setAllInterpCodes(4);

   auto hist = new TH1D{"hist", "sampled values", 1000, 0., 10.};

   TRandom3 rng;

   // Draw alpha from a unit Gaussian and histogram the transformed parameter value
   for (int i = 0; i < 10000000; ++i) {
      alphaParam.setVal(rng.Gaus());
      hist->Fill(interp.getVal());
   }

   auto c1 = new TCanvas{};
   hist->Draw();
   c1->Draw();

   c1->SaveAs("plot.png");
}

The little kink at +1 sigma comes from the transition between polynomial interpolation and linear extrapolation.
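
Schematically, with the default boundary of 1, the shift that code 4 adds to the nominal value is, to my understanding (following the prescription in the HistFactory paper, with $\sigma^{+}$ and $\sigma^{-}$ the magnitudes of the up and down variations):

$$
f(\alpha) =
\begin{cases}
\sigma^{+}\,\alpha & \alpha \geq 1 \\
\alpha \left( S + \alpha A \left( 15 + \alpha^{2} ( -10 + 3 \alpha^{2} ) \right) \right) & |\alpha| < 1 \\
\sigma^{-}\,\alpha & \alpha \leq -1
\end{cases}
\qquad
S = \tfrac{1}{2}\left(\sigma^{+} + \sigma^{-}\right), \quad
A = \tfrac{1}{16}\left(\sigma^{+} - \sigma^{-}\right)
$$

The coefficients are chosen such that the polynomial matches the value, first, and second derivative of the linear pieces at $\alpha = \pm 1$.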

Now, since you want to randomize with respect to the post-fit result, I think you also want to take the covariances into account. So I would fill the unit-Gaussian samples into a vector and multiply it by the Cholesky factor of the correlation matrix of your fit result, similar to how the implementation of RooFitResult::randomizePars() multiplies by the Cholesky factor of the covariance matrix. Then you apply the transformations as I described, and that’s it. If you think this is a good prescription, we can even think of adding this as an option to randomizePars().
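
Here is a rough sketch of that procedure in Python (untested, and the names sample_correlated, interps, and alphas are mine: it assumes a RooFitResult fit_result and, for each floating parameter, a PiecewiseInterpolation in interps driven by a unit-Gaussian RooRealVar in alphas, built as in the demo above):

import numpy as np

def sample_correlated(fit_result, interps, alphas, rng):
    n = len(alphas)
    # Cholesky factor of the *correlation* matrix, so that each transformed
    # component keeps a unit-Gaussian marginal (i.e. stays in units of sigma)
    corr = fit_result.correlationMatrix()
    corr_np = np.array([[corr[i][j] for j in range(n)] for i in range(n)])
    chol = np.linalg.cholesky(corr_np)
    # Correlated unit Gaussians
    g = chol @ rng.standard_normal(n)
    # Map each component through its asymmetric interpolation
    out = []
    for alpha, interp, g_i in zip(alphas, interps, g):
        alpha.setVal(g_i)
        out.append(interp.getVal())
    return out

Calling this in a loop, e.g. with rng = np.random.default_rng(), gives the randomized parameter points including the correlations.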

If you don’t want to approximate with interpolations, and if the computational cost permits, you can also sample parameter points according to the full likelihood function (if available), for example with the Metropolis-Hastings algorithm. This is implemented in RooStats::MetropolisHastings, so it may be worth trying out this class.
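
An untested sketch of how that route could look (the proposal setup follows the RooStats MCMC tutorials, and pdf, dataset, and fit_result are assumed to exist as above):

import ROOT

# Negative log-likelihood of the model; pass the same options (ranges, ...)
# that were used in the fit, if any
nll = pdf.createNLL(dataset)
params = pdf.getParameters(dataset)  # RooArgSet of the floating parameters

# Multivariate Gaussian proposal seeded from the covariance of the fit
ph = ROOT.RooStats.ProposalHelper()
ph.SetVariables(params)
ph.SetCovMatrix(fit_result.covarianceMatrix())
ph.SetUpdateProposalParameters(True)
ph.SetCacheSize(100)
proposal = ph.GetProposalFunction()

mh = ROOT.RooStats.MetropolisHastings()
mh.SetFunction(nll)
mh.SetType(ROOT.RooStats.MetropolisHastings.kLog)       # nll is -log(L)
mh.SetSign(ROOT.RooStats.MetropolisHastings.kNegative)
mh.SetParameters(params)
mh.SetProposalFunction(proposal)
mh.SetNumIters(100000)

chain = mh.ConstructChain()  # RooStats::MarkovChain of sampled points
point = chain.Get(0)         # each entry is a RooArgSet parameter point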

Personally, since nuisance parameters are usually constrained by interpolated up and down variations anyway, I would not mind using the interpolation post-fit as well. But for parameters of interest, I would compare the interpolation to more precise methods like Metropolis-Hastings before trusting it.

I hope these ideas help! Let us know how it goes and if you have more questions.

Cheers,
Jonas

Hi @jonas,

Thanks for the detailed reply. I gave the PiecewiseInterpolation method a try and extended your example to the correlated-variable case following your suggestion. Here are some questions that I hope you can clarify:

  1. Since the alpha vector is multiplied by the Cholesky factor Lt (g @ Lt in the notation of RooFitResult::randomizePars(), which uses the covariance matrix), I suppose the result is no longer the number of standard deviations away from the mean of the Gaussian. Therefore, the default value of boundary = 1.0 in the PiecewiseInterpolation evaluation is no longer applicable; I suppose it should be replaced by sqrt(covariance_matrix[i][i]).
  2. If I look more carefully into the implementation of interpolation codes 4 and 6, I believe this line is wrong: x should be t instead, i.e. mod = t * (S + t * A * (15 + t * t * (-10 + t * t * 3))).
  3. After these fixes, I tried sampling the parameters; the results make sense but are not very good, especially when the errors are very asymmetric. Below is a demo fit result that I have:
import numpy as np
import ROOT

observable = ROOT.RooRealVar('myy', 'myy', 125, 105, 160)
observable.setBins(55)
c = ROOT.RooRealVar('c', 'c', 1, -10, 10)
base_pdf = ROOT.RooExponential('exp', 'exp', observable, c)
norm = ROOT.RooRealVar('N', 'N', 1, -float('inf'), float('inf'))
pdf = ROOT.RooAddPdf('exp_extended', '', ROOT.RooArgList(base_pdf), ROOT.RooArgList(norm))

# Lower and upper sidebands (the 120-130 signal region is blinded)
observable.setRange('SB_Lo', 105, 120)
observable.setRange('SB_Hi', 130, 160)

bin_edges = np.linspace(105, 160, 55 + 1)
x = (bin_edges[:-1] + bin_edges[1:]) / 2
y = np.array([1201.58, 1179.01, 1140.39, 1128.87, 1111.56, 1124.37, 1098.3 ,
       1052.13, 1015.04, 1006.18,  971.46,  945.95,  931.43,  913.02,
        871.57,    0.  ,    0.  ,    0.  ,    0.  ,    0.  ,    0.  ,
          0.  ,    0.  ,    0.  ,    0.  ,  724.1 ,  699.51,  687.49,
        667.44,  657.39,  649.17,  634.78,  600.47,  586.51,  580.54,
        570.01,  588.22,  555.9 ,  540.72,  531.02,  546.06,  528.99,
        521.36,  495.24,  483.4 ,  448.63,  480.71,  454.2 ,  450.19,
        436.4 ,  413.69,  412.14,  420.22,  417.97,  404.38])
arrays = {
    'myy': x,
    'weight': y
}
dataset = ROOT.RooDataSet.from_numpy(arrays, ROOT.RooArgSet(observable), weight_name='weight')

fit_result = pdf.fitTo(dataset, ROOT.RooFit.Range('SB_Lo,SB_Hi'), ROOT.RooFit.PrintLevel(-1),
                       ROOT.RooFit.Hesse(True), ROOT.RooFit.Save(), ROOT.RooFit.Strategy(2),
                       ROOT.RooFit.AsymptoticError(True))

This gives (after refitting once):

{'c': {'value': -0.020686713133077784,
  'errors': (-0.009063431420349587, 0.0003276873353671196)},
 'N': {'value': 39929.87921962675,
  'errors': (-6416.289580898308, 223.7984340402964)}}

The distributions that I got after sampling 100,000 parameter points are:

[plots of the sampled c and N distributions]

There are also some extremely small (and negative) or extremely large values of N, which is not quite satisfactory.

Do you have any ideas for improving this? Or should we use MetropolisHastings?

In any case, I do think it would be nice to have an implementation of randomizePars that uses the piecewise interpolation.

Cheers,
Alkaid