Minimized NLL not corresponding to best chi2

Dear experts,

We have run into a numerical-accuracy issue: the minimum of the NLL does not correspond to the best fit.

We are building a RooWorkspace using xmlAnaWSBuilder, in which a 5-parameter function fits a histogram (10^9 entries, 2500 bins) that was generated from exactly this function, unfluctuated. We then perform the fit using quickFit. If we set the initial parameters to the values that yield the optimal fit and fix them, a chi2 of (almost) 0 comes out. If we instead allow the fit parameters to float, the minimization of the NLL yields new parameter values that do correspond to a lower NLL, but give a worse fit.

Our assumption is that we are hitting the limit of the numerical accuracy of the NLL, since the relative NLL difference between the best fit and the true minimum is on the order of 10^-10. We have found that disabling the NLL offset in RooAbsPdf::createNLL() and switching the integrator via RooAbsReal::defaultIntegratorConfig()->method1D().setLabel("RooAdaptiveGaussKronrodIntegrator1D") improve the result, but a difference remains.
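For concreteness, this is roughly how we apply those two changes (a sketch; pdf and data are illustrative names for the objects from our workspace):

// Switch the default 1D integrator and build the NLL with offsetting disabled.
// "pdf" and "data" stand for the pdf and dataset taken from the workspace.
RooAbsReal::defaultIntegratorConfig()->method1D().setLabel("RooAdaptiveGaussKronrodIntegrator1D");
RooAbsReal *nll = pdf->createNLL(*data, RooFit::Offset(false));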

Are there any other improvements we can apply to get a better fit? I am attaching a plot of the residuals for the different fit settings, and a tarball containing the workspace and a README with the commands used to build the workspace and fit it. We tested this in both ROOT 6.20 and 6.24.

Cheers,
Falk

[Attachment: residuals_J100_integrators (residuals plot)]
[Attachment: RooStatsNLL.tar (540 KB)]

Hi @fbartels!

Thanks for the detailed post, I was able to reproduce the fit.

I also ran it with the ROOT master branch to see if the problem had perhaps resolved itself in the meantime, and indeed the agreement is much better (fixed parameters in blue, floating parameters in red).

For reference, here is the script I used to produce this plot, after following the instructions in the README from the archive in the initial post:

import ROOT

ROOT.gROOT.SetBatch(True)

# Post-fit outputs of the fixed-parameter and floating-parameter fits
f1 = ROOT.TFile.Open("fixPars/PostFit.root")
f2 = ROOT.TFile.Open("floatingPars/PostFit.root")

c1 = ROOT.TCanvas()

# Draw the floating-parameter residuals in red, then overlay the fixed ones
h2 = f2.Get("residuals")
h2.SetLineColor(ROOT.kRed)
h2.Draw()
f1.Get("residuals").Draw("SAME")

c1.SaveAs("residuals.png")

I think with ROOT 6.26, you would also get the same result because the differences between 6.26 and master are small.

The reason the fit is better now is that the RooFit BatchMode was completely re-implemented, and the new implementation is apparently less sensitive to floating-point errors.
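For reference, the batch mode is controlled by an option of createNLL (a minimal sketch; pdf and data stand for the objects from your workspace):

// Build the NLL with the re-implemented batch evaluation enabled.
RooAbsReal *nll = pdf->createNLL(*data, RooFit::BatchMode(true));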

Is there a reason why you can’t use ROOT 6.26 for this fit?

Anyway, the agreement is still not perfect, probably because of some remaining floating-point errors. It's worth investigating further to see how we can get perfect agreement, right?

Cheers,
Jonas

Hi @jonas ,

Thanks for your investigation! Did you use quickFit to perform the fit, or something else?

I recompiled quickFit after lsetup "views LCG_102rc1_ATLAS_14 x86_64-centos7-gcc11-opt", which comes with ROOT 6.26/00 and RooFit 3.60. It throws a couple of warnings that RooMinuit has been replaced by RooMinimizer, but it compiles. If I now set --useBatch 1 for quickFit, which adds BatchMode(1) to RooAbsPdf.createNLL(), the results change.

One additional parameter shows up in the info printed by Minuit2 (the first one, whose gradient is zero):

Info in <Minuit2>: MnSeedGenerator Negative G2 found - new state: 
  Minimum value : -2.173421348e+10
  Edm           : 121266.6657
  Internal parameters:
      1.542734729
     0.5235987756
     0.0316276393
     0.3459600492
     0.1197389193
    0.08797523891
  Internal gradient  :
                0
     -11152990.55
      1108.107355
      59610.54466
      -150652.755
      76293.44441

and the fit seems to abort with

MnHesse 2nd derivative zero for parameter _J100yStar06_obs_x_channel ; MnHesse fails and will return diagonal matrix

Fit results with BatchMode on:

                  nbkg    1.5000e+09 +/-  0.00e+00
                    p2    9.4867e-01 +/-  0.00e+00
                    p3    1.0173e+01 +/-  0.00e+00
                    p4    1.1945e+00 +/-  0.00e+00
                    p5    8.7862e-02 +/-  0.00e+00
                  chi2    0.03277961737166526

(It looks like the normalisation is not taken into account by RooAbsPdf.createChi2(datahist, RooFit.DataError(0))?)
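If that is the case, maybe the Extended() option of createChi2 would account for the normalisation; this is just a guess that I have not verified:

// Unverified guess: use the expected event count of the extended pdf as the
// chi2 normalisation; DataError(RooAbsData::Poisson) matches DataError(0) above.
RooAbsReal *chi2 = pdf->createChi2(*dh, RooFit::Extended(true), RooFit::DataError(RooAbsData::Poisson));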

Fit results with BatchMode off:

                  nbkg    1.5193e+09 +/-  3.89e+04
                    p2    9.5105e-01 +/-  1.04e-02
                    p3    1.0173e+01 +/-  1.62e-03
                    p4    1.1945e+00 +/-  2.68e-04
                    p5    8.7849e-02 +/-  5.19e-05
                  chi2    0.5243252774369742

If you could let me know how you produced your good fit, that would be very helpful!

Cheers,
Falk

Anyone? @jonas ?

Hi! I will try the comparison later today myself to confirm that this is the problem, but I think the fit might be affected by a bug that got fixed in the upcoming 6.26.04 patch release. Sorry for the delay in this investigation.

Cheers,
Jonas

OK, thank you!
Just wanted to make sure that this topic does not get closed from inactivity.

Hi @fbartels ,

in the meantime, note that if @jonas is right, you can test whether 6.26.04 fixes the problem by updating to the latest version, or by trying it out in a conda environment, a docker container or similar. See also Installing ROOT - ROOT.

Cheers,
Enrico


Hi @fbartels,

sorry for the late reply!

I have produced the results with quickFit, using my custom branch that compiles with ROOT master. It is based on the ROOT_v6.24_06 branch in the official quickFit repo. Then I just followed the instructions from the README in your tarball, executing the quickFit command once for the floating-parameter input file and once for the fixed one.

The problem with the spurious parameter in the BatchMode that you observed was fixed recently in this commit. It was backported to 6.26.06, so if you use the most recent 6.26 patch release it should be all good.

I didn't see a --useBatch 1 option for quickFit though; do you use a different quickFit branch? In my branch based on ROOT_v6.24_06, the BatchMode(true) option is hardcoded in the createNLL call. So my results are indeed using the BatchMode, and they look fine as far as I can tell.

I hope this helps you to continue your investigation, let me know if you need further help!

Jonas

Hi @jonas ,

Thanks for the details! In your custom branch, do you use the setup_lxplus.sh that sources /afs/cern.ch/user/a/asciandr/public/ROOT_v6.24_06/build/bin/thisroot.sh (which does not exist anymore)? Or do you instead build the latest ROOT master and source that?

I'm using the development branch of the original quickFit repo; I think you are using a fork of that. There, the --useBatch 1 option can be used to set BatchMode(true), which otherwise defaults to false.

Many thanks,
Falk

Very interesting!

I have now used the development branch of quickFit; my ROOT_master branch is now based on that. Indeed, I can now reproduce the results from your initial post.

But when I use the other branch I had (ROOT_v6.24_06_master, based on this quickFit fork), I get almost exactly the results from my initial post.

Both times I used the same build of ROOT master, so something in the recent quickFit development made the fit converge slightly worse. The fit results are very similar, however. Here they are with the branch based on development:

  RooFitResult: minimized FCN value: -2.17343e+10, estimated distance to minimum: 0.00284159
                covariance matrix quality: Full matrix, but forced positive-definite
                Status : MINIMIZE=-1 HESSE=3 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
                  nbkg    1.5193e+09 +/-  3.89e+04
                    p2    9.4966e-01 +/-  1.04e-02
                    p3    1.0173e+01 +/-  1.62e-03
                    p4    1.1945e+00 +/-  2.68e-04
                    p5    8.7861e-02 +/-  5.18e-05

Here with the older branch:

  RooFitResult: minimized FCN value: -2.17343e+10, estimated distance to minimum: 0.033769
                covariance matrix quality: Full matrix, but forced positive-definite
                Status : MINIMIZE=-1 HESSE=3 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
                  nbkg    1.5193e+09 +/-  3.89e+04
                    p2    9.5105e-01 +/-  1.04e-02
                    p3    1.0173e+01 +/-  1.62e-03
                    p4    1.1945e+00 +/-  2.68e-04
                    p5    8.7849e-02 +/-  5.18e-05

Both times I used the BatchMode. Since the fit parameter results are so similar, could quickFit have changed the way the residuals are computed?

Maybe it's better to keep quickFit out of the equation when investigating this model? You said you have some code to do this fit standalone; maybe you can share that, if there are further checks I can help with?

About the ROOT version I use: yes, I use ROOT master that I built myself, but you can also source it from the nightly builds.

Cheers,
Jonas

Hi @jonas,

Right, with the latest ROOT master the BatchMode issue is gone. The relevant difference between the two quickFit repositories seems to be that the atlas-phys-exotics-dijetisr one sets a default value for _samplingRelTol, so by default bins are integrated, while in the standard quickFit they are not.

I can reproduce the behaviour of both repositories by setting --samplingRelTol 1e-3 (blue) or --samplingRelTol -1 (disabled, red).
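In RooFit terms, the flag maps onto the IntegrateBins option of createNLL (a sketch; a negative precision disables the bin integration, matching --samplingRelTol -1):

// Integrate the pdf over each bin with the given relative precision instead
// of sampling it at the bin centre; a negative value disables the feature.
RooAbsReal *nll = pdf->createNLL(*data, RooFit::IntegrateBins(1e-3));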

Regarding keeping quickFit out of the equation, this is the small script I’ve been using to fit the workspace:

void minimizeNLL(){

  // Silence RooFit output below ERROR level
  RooMsgService::instance().getStream(1).removeTopic(RooFit::NumIntegration) ;
  RooMsgService::instance().getStream(1).removeTopic(RooFit::Fitting) ;
  RooMsgService::instance().getStream(1).removeTopic(RooFit::Minimization) ;
  RooMsgService::instance().getStream(1).removeTopic(RooFit::InputArguments) ;
  RooMsgService::instance().getStream(1).removeTopic(RooFit::Eval) ;
  RooMsgService::instance().setGlobalKillBelow(RooFit::ERROR);

  // Load the workspace, model config and data
  TFile *tf = TFile::Open("floatingPars/dijetTLA_combWS_swift.root");

  RooWorkspace *w = (RooWorkspace *)tf->Get("combWS");
  RooStats::ModelConfig *mc = (RooStats::ModelConfig *)w->obj("ModelConfig");
  RooDataSet *data = (RooDataSet *)w->data("combData");
  RooArgSet *obs = (RooArgSet *)mc->GetObservables();
  RooDataHist* dh = new RooDataHist("dh", "dh", *obs, *data); // binned copy for the chi2

  // Adaptive Gauss-Kronrod integration for 1D integrals
  RooAbsReal::defaultIntegratorConfig()->method1D().setLabel("RooAdaptiveGaussKronrodIntegrator1D");
  RooAbsPdf* pdf = mc->GetPdf();
  // NLL without batch mode; IntegrateBins(-1) disables the bin integration
  RooAbsReal* nll = pdf->createNLL(*data,
                                   RooFit::BatchMode(false),
                                   RooFit::Constrain(*mc->GetNuisanceParameters()),
                                   RooFit::IntegrateBins(-1),
                                   RooFit::GlobalObservables(*mc->GetGlobalObservables()));
  RooAbsReal* chi2 = pdf->createChi2(*dh, RooFit::DataError(0));

  double nll_init = nll->getVal();
  double chi2_init = chi2->getVal();

  // Background-only fit: fix the signal normalisation
  w->var("nsig_mean1000_width7")->setConstant();

  // Minuit2 with strategy 2, tight tolerance and NLL offsetting disabled
  RooMinimizer _minim(*nll);
  _minim.setStrategy(2);
  _minim.setPrintLevel(2);
  _minim.setEps(1e-10);
  _minim.setOffsetting(0);
  _minim.optimizeConst(2);
  _minim.setMinimizerType("Minuit2");
  _minim.minimize("Minuit2");

  double nll_final = nll->getVal();
  double chi2_final = chi2->getVal();

  std::cout << "initial NLL: " << std::setprecision(14) << nll_init << std::endl;
  std::cout << "final NLL: " << std::setprecision(14) << nll_final << std::endl;

  std::cout << "initial chi2: " << chi2_init << std::endl;
  std::cout << "final chi2: " << chi2_final << std::endl;

  // Save the post-fit workspace and close the output file
  TFile *tout = TFile::Open("floatingPars/dijetTLA_combWS_swift_afterFit.root","RECREATE");
  w->Write("combWS");
  tout->Close();
}

The result can be plotted with ExtractPostfitFromWS.py as well; it seems to perfectly reproduce the red fit from above. Unfortunately, I do not manage to get it to succeed with IntegrateBins enabled.

Cheers,
Falk

Just a bump to keep this from closing.

I haven't managed to perform further tests yet, but I suspect that the bin integration does not in general produce more accurate results, and that its better result here might just be random chance. I did check, however, that this behaviour seems to be driven by the highest-statistics bins at the low end of the spectrum: restricting the fit to e.g. 10 bins there yields even stronger deviations than in the full range.

Cheers,
Falk

Hi all,

I have been investigating this issue in parallel. I suspect there is a real machine-accuracy issue, given the size of the statistics and the definition of the ML: with the FCN being on the order of 2e10, the likelihood values are getting very large.

I have been running a simple Minuit fit and can reproduce the problem outside of RooFit, using an NLL definition of the usual extended type:

Sum_bins (expected - observed * log(expected))

If I replace this with a more sophisticated NLL, then I get good convergence and an FCN that is approximately equal to the number of degrees of freedom, which is much more numerically stable:

Sum_bins (expected - observed + observed * (log(observed) - log(expected)))
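To show the scale of the problem, here is a small standalone toy (all numbers are invented; only the orders of magnitude resemble our fit):

#include <cmath>
#include <cstdio>

int main() {
   const int nBins = 2500;
   double nllPlain = 0.0, nllRatio = 0.0;
   for (int i = 0; i < nBins; ++i) {
      const double e = 4e6 * std::exp(-0.004 * i);         // expected events, steeply falling
      const double o = e * (1.0 + 1e-5 * std::sin(i));     // observed, with tiny deviations
      nllPlain += e - o * std::log(e);                     // usual extended form, grows to O(1e10)
      nllRatio += e - o + o * (std::log(o) - std::log(e)); // ratio form, stays close to zero
   }
   // At O(1e10) a double only resolves ~2e-6 absolute, so the tiny per-bin
   // deviations in nllPlain are partly rounded away; nllRatio keeps them.
   std::printf("plain NLL: %.15g\n", nllPlain);
   std::printf("ratio NLL: %.15g\n", nllRatio);
   return 0;
}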

I have been trying to work out where I can modify this in RooFit but the code is too complex for me to figure out where the likelihood is defined.

I am using 6.24.06 as 6.26 broke my code.

Hi @Iain_Bertram1,

interesting observation, thanks for reporting this!

So you’re saying that dropping the constant terms in the likelihood makes the fit not converge anymore?

If it’s an accuracy issue, you can also try out the Offset() option mentioned in the RooAbsPdf::createNLL or RooAbsPdf::fitTo methods. It subtracts the initial likelihood value from all subsequent likelihood evaluations, making the likelihood values closer to zero and hence more numerically stable.
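In code it is just one extra argument (sketch; pdf and data stand for your objects):

// Subtract the likelihood value at the starting parameters from all
// subsequent evaluations.
RooAbsReal *nll = pdf->createNLL(*data, RooFit::Offset(true));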

By the way, it is not good that 6.26 broke your code. If this is not related to some intended deprecation, this could be a bug. If you have time, please feel free to open a new forum post about this, or even a GitHub issue if you think it’s a bug!

Cheers,
Jonas

I will attempt again using the Offset() option, but I have not had much success with it previously, as it depends on the initial values of the fit parameters.

My experience with fitting large-statistics binned data is that it really does require an NLL that contains the constant terms.

I will re-download the latest ROOT and test it with my code. Old RooFit-based programs had failed previously, and I will check that behaviour.

Hi @jonas

I had tried the Offset() option before as well, and even with optimal starting parameters it made the fit worse rather than better (see the light blue vs. green or the yellow vs. red curve in the very first post of this thread).

Cheers,
Falk

Hi @fbartels and @Iain_Bertram1,

I had more time to digest your answers now, and I think I understand what is going on. The key is Iain's definition of the likelihood above:

Sum_bins (expected - observed + observed * (log(observed) - log(expected)))

Basically, instead of calculating the NLL, which can grow really large and unstable, he evaluates the likelihood ratio with respect to a template PDF created from the dataset. This ratio is much closer to zero. And by taking the ratio for every bin before summing, you have a value close to zero for each bin and avoid the numerical instabilities that arise when summing bins with vastly different numbers of events. This is probably exactly what happens in @fbartels' fit, because it's an exponentially decaying distribution.

Now I also understand why the “Offset” feature doesn’t help, because the offset is applied only after summing over the bins.
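A standalone toy to make that concrete (invented numbers again, not RooFit code): offsetting each bin keeps every partial sum small, while a single offset subtracted after the sum comes too late to recover the digits already lost.

#include <cmath>
#include <cstdio>

int main() {
   const int nBins = 2500;
   double sum = 0.0, offset = 0.0, perBin = 0.0;
   for (int i = 0; i < nBins; ++i) {
      const double e = 4e6 * std::exp(-0.004 * i); // expected events per bin
      const double o = e * (1.0 + 1e-9);           // observed, 1e-9 relative deviation
      const double term = e - o * std::log(e);     // per-bin NLL term
      const double ref  = e - e * std::log(e);     // the same term evaluated at o = e
      sum    += term;       // running sum at O(1e10): the low digits are rounded away
      offset += ref;        // global offset, also accumulated at O(1e10)
      perBin += term - ref; // running sum stays O(10): the small differences survive
   }
   std::printf("offset after the sum: %.15g\n", sum - offset); // noisy last digits
   std::printf("offset per bin:       %.15g\n", perBin);
   return 0;
}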

This trick needs to be implemented in the RooFit likelihood classes, I think. It makes a lot of sense! I would certainly like to have it in the next ROOT release. Maybe we could have a new option for createNLL, called RatioWithTemplate (or TemplateRatio, or something less cryptic), that enables this likelihood definition. We can't make it the default, because the different absolute NLL values could catch users by surprise. I would implement it, commit it to ROOT master, and then you can try it out, either by sourcing the nightlies from CVMFS or by building ROOT yourself. The new feature would then be included in the next ROOT release, due in January.

What do you think? You think it’s worth a shot?

Cheers,
Jonas

@jonas

I am not sure where this version of the NLL derives from. I have been using it for about 20 years, and it was old then, so it is not particularly new…

One set of slides describing it is here: http://graduierten-kurse.physi.uni-heidelberg.de/WiSe2003/lectures/blobel/export/mxli/master.pdf (p33 to 37)

Or https://indico.cern.ch/category/6015/attachments/192/631/Statistics_Fitting_II.pdf pages 90 - 96.

I am afraid I cannot recall the original source of the likelihood (other than that it may have been developed because of the lower-precision computations on pre-32/64-bit machines).