Minuit Worsens Fit?

jfcaron · September 4, 2014, 11:12pm

Hi, I have been struggling with a simple two-parameter non-linear fit for a while. I have been able to determine that Minuit’s “Combined” algorithm often performs worse than a simple 100x100 point grid search, and in some cases returns fit parameters that have a WORSE chi-squared than with the initial fit guesses.

Unfortunately the fitting code is embedded in a class and I can’t easily show a reproducer. The class code can be seen here: https://bazaar.launchpad.net/~jfcaron/+junk/Proto2BeamTest2/view/head:/JFTrackFit.C (and a corresponding .h file). The class itself is a functor whose operator() returns the chi-squared value using the Track_Params member variables as fit parameters. Operator() can be seen on line 101 (or click here: https://bazaar.launchpad.net/~jfcaron/+junk/Proto2BeamTest2/view/head:/JFTrackFit.C#L101).

The minimization occurs on line 491 (or click here: https://bazaar.launchpad.net/~jfcaron/+junk/Proto2BeamTest2/view/head:/JFTrackFit.C#L491) and is basically copy-and-pasted from the ROOT multidimensional fitting instruction page here: http://root.cern.ch/drupal/content/numerical-minimization#multidim_minim

The class also has a method for drawing a big TH2D “heat map” of the chi-squared value returned at various fit parameters. I put markers on the map at the location of the initial fit parameters that I give to the minimizer and a marker for where Minuit claims the minimum to be. Since I’m evaluating the function 100x100 times anyways for the TH2D, I also keep track of the best point on that grid, and put a marker there. In many cases, the grid search actually found a better minimum than Minuit (see file fitmap_017.pdf attached), and in some rarer cases, the minimum returned by Minuit is actually WORSE than the initial fit parameters (see file fitmap_006.pdf)!

I don’t expect anyone to read my code in detail, as it’s probably a mess. I’m just hoping that someone can maybe point out if I am misusing Minuit in my JFTrackFit::Minimize method, or tell me how possibly Minuit can perform so badly at minimizing a function with a single deep minimum in 2D. Maybe from then I can locate where is my mistake, as I don’t really believe this can be a bug in Minuit itself.

Jean-François
fitmap_006.pdf (47 KB)
fitmap_017.pdf (47 KB)

Wile_E_Coyote · September 5, 2014, 7:09am

At first quick look, I haven’t noticed anything obvious. I often use something like:

minimizer->SetMaxFunctionCalls(100000); // used by Minuit and Minuit2
minimizer->SetMaxIterations(100000);  // used by GSL*
minimizer->SetTolerance(0.000001); // EDM
minimizer->SetPrintLevel(1);

and sometimes I increase the “print level” in order to see what’s really happening.
Make sure that your “chi2” function is continuous and “smooth” and that you don’t get any “NaN” (i.e. you have no “jumps” in it, otherwise Minuit[2] may get angry). Also, make sure that the returned “chi2” does not “change in time” (i.e. each time you calculate it with a specific set of parameters, it returns exactly the same value).

jfcaron · September 5, 2014, 8:46pm

When I set the printlevel to 1, I get that most of the bad fits are status 0: “Valid minimum”.

In my functor operator()() I have an assert(TMath::Finite(S2)); return S2; which never fires, so it’s never returning NaNs. However one reason why it never returns NaNs is because in calculating the return value, I have a few if blocks to check whether some numerical corrections are Finite, so maybe those are creating “edges”, so that while operator()() never returns NaN, maybe the derivative between two nearby points is infinite…

I’m pretty sure that my functor’s values don’t change between the minimization and the reading of the best fit value, since the FVAL printed with PrintLevel 1 matches the one drawn on the fitmap_ plots.

Thanks for taking a look, I know it’s not pleasant looking at other people’s nasty code.

Jean-François

Wile_E_Coyote · September 5, 2014, 9:14pm

Can it be that your “chi2” is wavy / hilly? So that there are many “local minima” (which you don’t clearly see in your “fitmap” which seems to have a single “global minimum”). Maybe you could try to have a look at a detailed “fitmap” just around the “false” minimum found by Minuit2?
Well, you could also try to “SetTolerance” to something like 1 (or even 10) and test if in this case Minuit2 will jump over “false” minima (your “chi2” values are quite big, of the order of hundreds to thousands, so requesting “EDM” = 1 should not create any real problem).

jfcaron · September 9, 2014, 4:30pm

I tried setting the EDM to 1, 10, and 100, but the fits actually remain the same. I thought the EDM was normalized anyways, so the overall scale of my function shoudn’t matter?

I also made higher-resolution heat maps, using 1000x1000 points instead of 100x100. While there are definitely some local minima around, they are extremely shallow compared to the global minimum.

Maybe part of the problem is that I am plotting with a log scale for the Z axis, but I am using the non-log scale for the fitting, so maybe the local minima are actually not so shallow according to Minuit. I will look into making my functor::operator()() return the log of the chi-squared value.

Jean-François

jfcaron · September 9, 2014, 6:04pm

I have remembered some old advice about the color palette used in heat maps like this, and lo: the non-rainbow palette actually revealed some features that were previously hidden.

The attached plots are using SetTolerance->(10), and the printed output from Minuit in each case is:

Entry 6:
Minuit2Minimizer: Minimize with max-calls 100000 convergence for edm < 10 strategy 1
Minuit2Minimizer : Valid minimum - status = 0
FVAL  = 4219.83473908628184
Edm   = 0.00338604594315522305
Nfcn  = 50
x0	  = -0.0979623	 +/-  0.00231916
theta	  = 1.78085	 +/-  0.000580208

Entry 17:
inuit2Minimizer: Minimize with max-calls 100000 convergence for edm < 10 strategy 1
Minuit2Minimizer : Valid minimum - status = 0
FVAL  = 4709.38069231348254
Edm   = 4.62286151272740701e-09
Nfcn  = 27
x0	  = -0.341287	 +/-  0.00166187
theta	  = 1.95419	 +/-  0.000535298

What is suspicious to me is that in both cases the status is 0, but the number of function calls/iterations is way less than my limit, 50 and 27 respectively. Which of the settings is telling Minuit that it is finished? I’m only setting the variable step sizes, MaxFunctionCalls, MaxIterations, and Tolerance values. Are there other knobs I could turn to tell Minuit to keep trying?

Jean-François
fitmap_017.pdf (142 KB)
fitmap_006.pdf (143 KB)

Wile_E_Coyote · September 10, 2014, 7:26am

Looking at your new plots, I think I can clearly see several local minima and in both cases Minutis2 finds one of them (this is a perfectly “legitimate” behavior).
I’m afraid that in this case, you would need a two-step procedure:

run the “scan” -> it will hopefully find the “right” minimum,
run the “combined” fit using the values found in the step 1. as initial parameters (and hope it will not jump to another local minimum, or you will need to set “limits” on your variables based on the “step sizes” used in the “scan” procedure).

jfcaron · September 11, 2014, 8:52pm

Yes, this experience would be a great argument for setting the default ROOT colour palette to something other than rainbow as suggested in those posts. I would have seen this much earlier if the default palette was more reasonable.

Is there a built-in function minimizer that gracefully deals with multiple minima? I could do my own thing with multiple start-points, or as you suggest, a grid-search for the initial points, but it seems easy to get wrong.

By the way, I have reduced the complexity of my functor::operator() method to reduce the effect of the edges, but I still have multiple minima, typically 1-4 of them. Some new plots are attached to illustrate.

Thanks for the help so far,
Jean-François
fitmap_001.pdf (66.5 KB)
fitmap_017.pdf (46.6 KB)

jfcaron · September 12, 2014, 11:46pm

I have been able to modify my functor’s Minimize method to mostly find the true minimum. My technique relies on the fact that my minima are mostly ~0.7 apart in the x0 direction and ~0.1 in the theta direction as a consequence of our detector geometry.

After minimizing once, if the function value at the found minimum is above a certain threshold, I try minimizing again starting at the four points (+/- x0_step, +/- theta_step) away from the original minimum. In the overwhelming majority of my cases, this is enough to end up at the true minimum (or at least better than what the grid search found).

If anyone is curious they can see my code around here: https://bazaar.launchpad.net/~jfcaron/+junk/Proto2BeamTest2/view/273/JFTrackFit.C#L556

Jean-François

beniz · September 15, 2014, 8:35am

Hi Jean-Francois, we are a small team at Inria / Paris-Sud / Saclay-LaL testing CMA-ES in ROOT, a well-known stochastic blackbox optimizer. Our current tests on challenging functions provided by physicists show improved results and we are always looking for difficult functions that we can use as examples, and that help us pinpoint where and when CMA-ES is most appropriate.

Our code is here, github.com/beniz/root/tree/cmaes4root_master
and instructions here: github.com/beniz/libcmaes/wiki/ … N%27s-ROOT

But if, as most users, you do not wish to recompile ROOT, please get in touch with me and we may add your function / use case to our batch.

Thanks,

Em.