Stopping criterion of Minuit.migrad when EDM is larger than tolerance

victor_estrade · June 28, 2021, 2:23pm

Hi all,
I am using iminuit to minimize a negative log likelihood on a toy problem.
I have several questions about how to be sure that I am using migrad() correctly.

The migrad() output reports that the estimated distance to minimum (EDM) is too large at the end, leading to an invalid minimum.
Here is the output of migrad :

┌──────────────────────────────────┬──────────────────────────────────────┐
│ FCN = -7532                      │              Nfcn = 84               │
│ EDM = 2.23 (Goal: 0.0001)        │                                      │
├───────────────┬──────────────────┼──────────────────────────────────────┤
│INVALID Minimum│ Valid Parameters │        No Parameters at limit        │
├───────────────┴──────────────────┼──────────────────────────────────────┤
│ ABOVE EDM threshold (goal x 10)  │           Below call limit           │
├───────────────┬──────────────────┼───────────┬─────────────┬────────────┤
│  Covariance   │     Hesse ok     │ Accurate  │  Pos. def.  │ Not forced │
└───────────────┴──────────────────┴───────────┴─────────────┴────────────┘
┌───┬─────────┬───────────┬───────────┬────────────┬────────────┬─────────┬─────────┬───────┐
│   │ Name    │   Value   │ Hesse Err │ Minos Err- │ Minos Err+ │ Limit-  │ Limit+  │ Fixed │
├───┼─────────┼───────────┼───────────┼────────────┼────────────┼─────────┼─────────┼───────┤
│ 0 │ rescale │   0.809   │   0.004   │            │            │  0.001  │         │       │
│ 1 │ mu      │   0.514   │   0.028   │            │            │  0.001  │         │       │
└───┴─────────┴───────────┴───────────┴────────────┴────────────┴─────────┴─────────┴───────┘
┌─────────┬─────────────────────┐
│         │   rescale        mu │
├─────────┼─────────────────────┤
│ rescale │  1.69e-05 -2.04e-05 │
│      mu │ -2.04e-05  0.000768 │
└─────────┴─────────────────────┘

What puzzles me is that the found parameters are very close to the minimum it should find.
So I guess that the minimization reached the minimum but the EDM near the minimum is simply too large.

I attach here a contour plot of the negative log likelihood landscape.

According to what I read, simply increasing the tolerance should fix it.
But I did not find a prescription for how the new tolerance value should be chosen.
If it is too large the migrad() algorithm may stop before reaching the minimum, right ?
So how should I set the tolerance ?

Before this issue I also faced one major difficulty which may be related : due to the use of a GPU in my process I am limited to float32 precision.
The negative log likelihood (NLL) remains constant if the input of the function differs only by 1e-7.
A 1e-6 change in the input parameters (rescale or mu) does change the NLL.
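
To make the float32 limitation concrete, here is a minimal sketch using only the Python standard library. It assumes (this is not stated in the post) that the NLL value itself is returned as a float32 of magnitude around 7532, as in the output above. `to_float32` and `float32_spacing` are hypothetical helpers written for this illustration.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values near x (normal, non-zero x)."""
    exponent = math.frexp(abs(x))[1] - 1  # |x| = m * 2**exponent, 1 <= m < 2
    return 2.0 ** (exponent - 23)         # float32 has 23 fraction bits


nll = -7532.0                      # magnitude taken from the Migrad output
spacing = float32_spacing(nll)     # 2**-11 = 0.00048828125

# A change of 1e-4 in the NLL is lost when the value is stored as float32.
unchanged = to_float32(nll + 1e-4) == to_float32(nll)

print(spacing, unchanged)
```

Under that assumption, the smallest NLL change that float32 can represent here (about 4.9e-4) is larger than the EDM goal of 1e-4, so the minimizer cannot resolve function differences at the scale it is asked to converge to.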

According to the documentation, setting the precision parameter of Minuit to 1e-6 should fix it.

Moreover I set a limit on my parameters to 0.001 because they should be strictly positive.

Auxiliary question : what are the stopping criteria of migrad ?
If the EDM is lower than the tolerance then it stops.
But what are the other criteria ? In my case it stops before reaching a small enough EDM.

I remember reading something in a 1970s paper saying that if the gradient at the next step is larger than the one at the current step, then something happens to fix it.
But I did not understand the details.

I read that Migrad is an implementation of BFGS.
But the courses and tutorials I found about this algorithm stick with one stopping criterion (norm of the gradient lower than a predefined tolerance value).

Thanks for reading !

ROOT Version: ROOT-v6-25-01-973-ga470d483db
Platform: Ubuntu
Compiler: Not Provided

victor_estrade · June 28, 2021, 2:37pm

Note :
The EDM formula is : G^T . Cov . G

with G = the gradient
G^T = the transposed gradient
Cov = the covariance matrix (The inverse of the Hessian)

source : Minuit - a system for function minimization and analysis of the parameter errors and correlations, page 8

So if the minimized function f() is replaced by a similar one : h(a) = 10 000 * f(a)
Then the gradient G_h = 10 000 * G_f
and approximately Cov_h = (1 / 10 000) * Cov_f
Leading to : EDM_h = 10 000 * G_f^T . [ (1 / 10 000) * Cov_f ] . 10 000 G_f = 10 000 EDM_f

How is migrad handling this possibility ?
Or how should I handle it ?

Or maybe I made a mistake somewhere.
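
The scaling argument above can be checked numerically. The sketch below evaluates G^T . Cov . G (the formula as quoted in this post) for a toy two-parameter quadratic and for the same function multiplied by 10 000. The Hessian and evaluation point are made-up numbers, and `edm` is a hypothetical helper, not a Minuit function. Any constant prefactor in Minuit's own definition would not change the ratio.

```python
def edm(grad, hess):
    """Return g^T H^{-1} g for a 2-parameter problem."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # Cov = inverse Hessian
    g0, g1 = grad
    return (g0 * (inv[0][0] * g0 + inv[0][1] * g1)
            + g1 * (inv[1][0] * g0 + inv[1][1] * g1))


# Toy quadratic f(x) = 0.5 * x^T H x, evaluated away from its minimum at 0.
H = [[4.0, 1.0], [1.0, 3.0]]
x = [0.3, -0.2]
g = [H[0][0] * x[0] + H[0][1] * x[1],
     H[1][0] * x[0] + H[1][1] * x[1]]

# h = scale * f: gradient and Hessian both pick up the same factor.
scale = 10_000.0
g_h = [scale * gi for gi in g]
H_h = [[scale * hij for hij in row] for row in H]

edm_f = edm(g, H)
edm_h = edm(g_h, H_h)
ratio = edm_h / edm_f

print(edm_f, edm_h, ratio)
```

The ratio comes out as the scale factor, which confirms that the raw EDM is not invariant under a rescaling of the function.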

victor_estrade · June 28, 2021, 2:42pm

Another Note : In the contour plot of the NLL the y-axis is labeled “alpha” instead of “rescale”

bellenot · June 28, 2021, 2:43pm

Welcome to the ROOT Forum!
@moneta can most probably help on that topic

moneta · June 28, 2021, 4:17pm

Hi,

The stopping criterion with the EDM is normalised using the Up value, the one which defines the error level, which is equal to 1 when minimising a chi2. If you re-scale your chi2 function by 10000, you would then need to rescale that value as well.
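
As a hedged illustration of that normalisation: the iminuit documentation, as far as it can be recalled here, describes Migrad convergence as EDM < 0.002 * tol * errordef. The sketch below only does that arithmetic. `edm_goal` is a hypothetical helper (not an iminuit function), and tol = 0.1 with errordef = 0.5 (negative log likelihood) are assumed settings. They do reproduce the "Goal: 0.0001" and the "goal x 10" threshold of 0.001 seen in the output of this thread.

```python
def edm_goal(tol: float, errordef: float) -> float:
    """EDM value below which Migrad is assumed to declare convergence."""
    return 0.002 * tol * errordef


# Assumed defaults for a negative log likelihood fit.
goal_nll = edm_goal(tol=0.1, errordef=0.5)       # about 1e-4
invalid_above = 10.0 * goal_nll                  # about 1e-3 ("goal x 10")

# If the function is multiplied by 10000 and errordef is scaled with it,
# the goal grows by the same factor, so EDM / goal is unchanged.
scale = 10_000.0
goal_scaled = edm_goal(tol=0.1, errordef=0.5 * scale)

print(goal_nll, invalid_above, goal_scaled / goal_nll)
```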

Now if you are getting a larger EDM, I would not increase the tolerance, but rather understand why the minimisation stopped. Normally you get a message giving the reason; I am not sure if this is suppressed in iminuit, but you should be able to increase the print level.

Lorenzo

victor_estrade · June 28, 2021, 4:51pm

Thanks for the fast and clear replies !

I increased the print level.
Indeed it gives some insights.

The program complains that "Machine accuracy limits further improvement".

Since some of the computation is done in float32 (using GPU) the NLL function is not sensitive to changes of the input parameters below 1e-6.
The documentation reads that :

If the user fools Minuit by using a double precision version but making internal FCN or FUTIL computations in single precision, Minuit will interpret roundoff noise as significant and will usually either fail to find a minimum, or give incorrect values for the parameter errors

This is something I have observed in my attempts. If I leave the default precision then the minimization often stays stuck near the starting point.
I set the precision to 1e-6. I believe it is equivalent to the EPS setting in the Fortran version or the MnMachinePrecision::setPrecision(double) in the C++ version.

I may be able to get rid of the float32 computation part with some tricks on my toy problem.
But on the real application I am aiming at it will be more difficult.

With print level = 1 this new information shows up :

W VariableMetricBuilder No improvement in line search
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No improvement in line search
W VariableMetricBuilder Iterations finish without convergence; Edm 17.2315 Requested 0.0001
W VariableMetricBuilder FunctionMinimum is invalid after second try
W VariableMetricBuilder No improvement in line search
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.23499 is above tolerance 0.001
W VariableMetricBuilder No improvement in line search
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.23499 is above tolerance 0.001
W VariableMetricBuilder No improvement in line search
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.23499 is above tolerance 0.001
W VariableMetricBuilder No improvement in line search
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.23499 is above tolerance 0.001

With print level = 2 it gives more numbers :

I MnSeedGenerator Initial state: FCN =      -7517.325237 Edm =       11.59506038 NCalls =      7
I VariableMetricBuilder Start iterating until Edm is < 0.0001 with call limit = 420
I VariableMetricBuilder    0 - FCN =      -7517.325237 Edm =       11.59506038 NCalls =      7
I VariableMetricBuilder    1 - FCN =      -7526.161336 Edm =       1.814992182 NCalls =     13
W VariableMetricBuilder No improvement in line search
I VariableMetricBuilder    2 - FCN =      -7526.161336 Edm =       1.814992182 NCalls =     14
W VariableMetricBuilder Machine accuracy limits further improvement
I VariableMetricBuilder After Hessian
I VariableMetricBuilder    3 - FCN =      -7526.161336 Edm =       4.390902625 NCalls =     28
W VariableMetricBuilder Reached machine accuracy limit; Edm 4.3909 is smaller than machine limit 47.5996 while 0.0001 was requested
W VariableMetricBuilder No convergence; Edm 4.3909 is above tolerance 0.001
I MnSeedGenerator Initial state: FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
I VariableMetricBuilder Start iterating until Edm is < 0.0001 with call limit = 420
I VariableMetricBuilder    0 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
W VariableMetricBuilder No improvement in line search
I VariableMetricBuilder    1 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      8
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.66878 is above tolerance 0.001
I MnSeedGenerator Initial state: FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
I VariableMetricBuilder Start iterating until Edm is < 0.0001 with call limit = 420
I VariableMetricBuilder    0 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
W VariableMetricBuilder No improvement in line search
I VariableMetricBuilder    1 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      8
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.66878 is above tolerance 0.001
I MnSeedGenerator Initial state: FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
I VariableMetricBuilder Start iterating until Edm is < 0.0001 with call limit = 420
I VariableMetricBuilder    0 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
W VariableMetricBuilder No improvement in line search
I VariableMetricBuilder    1 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      8
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.66878 is above tolerance 0.001
I MnSeedGenerator Initial state: FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
I VariableMetricBuilder Start iterating until Edm is < 0.0001 with call limit = 420
I VariableMetricBuilder    0 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      7
W VariableMetricBuilder No improvement in line search
I VariableMetricBuilder    1 - FCN =      -7526.161336 Edm =       2.668775737 NCalls =      8
W VariableMetricBuilder Machine accuracy limits further improvement
W VariableMetricBuilder No convergence; Edm 2.66878 is above tolerance 0.001

victor_estrade · June 28, 2021, 5:12pm

This new information led me to this old topic on the forum :

It seems that my situation is similar.
The function I try to minimize is the negative log likelihood of a 10 bin Poisson counting :
\begin{equation}
L(\mu, \mathrm{rescale}) = \prod_{i=1}^{10} \mathrm{Poisson}\left(n_i \mid \mu \, s_i(\mathrm{rescale}) + b_i(\mathrm{rescale})\right)
\end{equation}

Re-scaling the NLL by a factor beta may help the numerical minimization.
Then I guess that I should compensate the computed error by a factor sqrt(beta) ?
Similarly to what is written in section 1.4.1 of the documentation.

Or is it better to re-scale the UP parameter ? (which I believe is the same as ErrorDef)
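
The arithmetic behind this question can be sketched for a single parameter with a parabolic NLL. Near the minimum, the 1-sigma error is the distance at which the function rises by UP, i.e. sqrt(2 * UP / f''). The numbers below are toy values, and `parabolic_error` is a hypothetical helper, not part of Minuit.

```python
import math


def parabolic_error(second_derivative: float, up: float) -> float:
    """Distance from a parabolic minimum at which f has risen by `up`."""
    return math.sqrt(2.0 * up / second_derivative)


sigma_true = 0.028                    # toy value
curvature = 1.0 / sigma_true ** 2     # f'' of the unscaled NLL
beta = 10_000.0                       # rescaling factor applied to the NLL

# Unscaled NLL with UP = 0.5 recovers sigma_true.
err_plain = parabolic_error(curvature, up=0.5)

# Option 1: scale the NLL, keep UP = 0.5, then multiply by sqrt(beta).
err_option1 = parabolic_error(beta * curvature, up=0.5) * math.sqrt(beta)

# Option 2: scale the NLL and scale UP by the same factor.
err_option2 = parabolic_error(beta * curvature, up=0.5 * beta)

print(err_plain, err_option1, err_option2)
```

In exact arithmetic the two options give the same error. Which one behaves better numerically is a separate question that this sketch does not address.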

moneta · June 29, 2021, 8:30am

Hi,

As I imagined, the fit stopped due to limited precision in computing the likelihood. Normally all functions provided to Minuit should be in double precision.
There is a parameter you can provide to Minuit, the precision (it is different from the tolerance). By default it is double precision ( ~ 2.2 E-16); you should instead pass the float value (1.2 E-7).
However, since Minuit is designed for double precision, you might have some other issues, and it is possible that the minimisation will still not work correctly.
Rescaling the likelihood will not be helpful; I would keep the UP (ErrorDef) value around 1.
One thing that could help is re-defining (i.e. rescaling) the minimisation parameters so that they all have a similar scale of around 1. That will help the computation of the Hessian by reducing its condition number, and will therefore improve the precision in computing the EDM.
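
A rough sketch of the effect on the condition number, using the covariance matrix printed in the first Migrad output of this thread. The covariance is the inverse of the Hessian, so both have the same condition number. Rescaling each parameter by 1/sigma is just one possible choice made for this illustration, and `cond_sym2` is a hypothetical helper.

```python
import math


def cond_sym2(m):
    """Condition number of a symmetric positive-definite 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return (mean + radius) / (mean - radius)  # largest / smallest eigenvalue


# Covariance matrix (rescale, mu) from the first Migrad output.
cov = [[1.69e-05, -2.04e-05],
       [-2.04e-05, 0.000768]]

# Rescale each parameter by 1/sigma so that both have unit variance.
sigma = [math.sqrt(cov[i][i]) for i in range(2)]
cov_scaled = [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(2)]
              for i in range(2)]

cond_before = cond_sym2(cov)         # roughly 47
cond_after = cond_sym2(cov_scaled)   # roughly 1.4

print(cond_before, cond_after)
```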

Lorenzo

victor_estrade · June 29, 2021, 4:58pm

If I understand correctly, the idea of rescaling the minimization parameters is to minimize a function

f(k * a, h * b)

instead of

f(a, b)

choosing k and h carefully to make the numbers in the Hessian closer to 1.
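
A minimal sketch of such a wrapper, together with the conversion of fitted values and errors back to the original units. The factors k and h and the toy objective are hypothetical and chosen only for illustration. `rescale_parameters` and `to_original` are helper names made up here.

```python
def rescale_parameters(f, k, h):
    """Return g(a, b) = f(k * a, h * b), the function the minimizer sees."""
    def g(a, b):
        return f(k * a, h * b)
    return g


def to_original(value, error, factor):
    """Convert a fitted value and its error from scaled to original units."""
    return factor * value, abs(factor) * error


# Toy objective with its minimum at rescale = 0.8, mu = 0.5.
def toy_nll(rescale, mu):
    return (rescale - 0.8) ** 2 / 1e-5 + (mu - 0.5) ** 2 / 1e-3


k, h = 1e-3, 1e-2                 # hypothetical scale factors
g = rescale_parameters(toy_nll, k, h)

# The minimizer now works with values of order 100 to 1000 ...
value_at_scaled_minimum = g(800.0, 50.0)

# ... and results are mapped back afterwards.
rescale_fit, rescale_err = to_original(800.0, 1.3, k)

print(value_at_scaled_minimum, rescale_fit, rescale_err)
```

The location of the minimum and the errors transform linearly with the factors, so only the numerical conditioning seen by the minimizer changes.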

I tried it and also set the precision value to 1e-7.
The values inside the Hessian are now closer to 1.
The EDM is also lower (0.033 instead of 2.23).

Here is the new output

┌──────────────────────────────────┬──────────────────────────────────────┐
│ FCN = -7527                      │              Nfcn = 483              │
│ EDM = 0.033 (Goal: 0.0001)       │                                      │
├───────────────┬──────────────────┼──────────────────────────────────────┤
│INVALID Minimum│ Valid Parameters │        No Parameters at limit        │
├───────────────┴──────────────────┼──────────────────────────────────────┤
│ ABOVE EDM threshold (goal x 10)  │           Below call limit           │
├───────────────┬──────────────────┼───────────┬─────────────┬────────────┤
│  Covariance   │     Hesse ok     │ Accurate  │  Pos. def.  │ Not forced │
└───────────────┴──────────────────┴───────────┴─────────────┴────────────┘
┌───┬─────────┬───────────┬───────────┬────────────┬────────────┬─────────┬─────────┬───────┐
│   │ Name    │   Value   │ Hesse Err │ Minos Err- │ Minos Err+ │ Limit-  │ Limit+  │ Fixed │
├───┼─────────┼───────────┼───────────┼────────────┼────────────┼─────────┼─────────┼───────┤
│ 0 │ rescale │   789.9   │    1.3    │            │            │  0.001  │         │       │
│ 1 │ mu      │   48.8    │    2.7    │            │            │  0.001  │         │       │
└───┴─────────┴───────────┴───────────┴────────────┴────────────┴─────────┴─────────┴───────┘
┌─────────┬─────────────────┐
│         │ rescale      mu │
├─────────┼─────────────────┤
│ rescale │    1.75  -0.164 │
│      mu │  -0.164    7.24 │
└─────────┴─────────────────┘

I tried it for different values of the “true” parameter and of the number of data points in the simulation, and the EDM is always lower using this re-parametrisation.

But unfortunately it is still higher than the tolerance.

victor_estrade · June 29, 2021, 5:08pm

What seems strange to me is that the values found seem to be close to the minimum.

Some context.
The objective of this experiment is to compare 2 dimension reduction methods (A and B).
The dimension reduction occurs between the data generation and the histogram (which then leads to the binned Poisson likelihood).
The idea is to find out if method A yields a smaller error than B.
I do not need a perfect estimator or a perfect error, but only to be able to compare the ones obtained with method A and method B.

Is there a way to know if the minimum and the error computed are completely wrong or only imprecise (2 or 3 significant digits) ?

The other option I have is to force the GPU to work with float64.

moneta · June 30, 2021, 8:57am

Are you still getting this error ?

VariableMetricBuilder Machine accuracy limits further improvement

If this is the case there is not much you can do apart from increasing the precision or making sure that you minimize the numerical error when computing the objective function. For example, when summing data points you should use a compensated summation, such as the Kahan summation algorithm.
Otherwise you might also need to increase the tolerance.
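
As an illustration of the compensated summation mentioned above, here is a sketch that emulates float32 arithmetic in pure Python by rounding every intermediate result to float32. The data (0.1 repeated 100 000 times) are made up, and `f32`, `naive_sum_f32` and `kahan_sum_f32` are hypothetical helpers. A real GPU implementation would use the native float32 type instead.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Plain running sum with every intermediate rounded to float32."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def kahan_sum_f32(values):
    """Kahan (compensated) sum with every intermediate rounded to float32."""
    total = 0.0
    comp = 0.0                      # running estimate of the lost low bits
    for v in values:
        y = f32(f32(v) - comp)      # apply the correction to the next term
        t = f32(total + y)          # low-order bits of y are lost here
        comp = f32(f32(t - total) - y)  # recover what was lost
        total = t
    return total


values = [0.1] * 100_000
exact = 100_000 * f32(0.1)          # reference computed in float64

naive_error = abs(naive_sum_f32(values) - exact)
kahan_error = abs(kahan_sum_f32(values) - exact)

print(naive_error, kahan_error)
```

The compensated sum stays within about one float32 rounding step of the reference, while the naive sum accumulates a much larger error.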
I think that if the Hessian returns a good status, i.e. it is positive definite, you can be confident that the minimum is correct. Since you have only two parameters, you might also try to scan around the minimum to see if you find a lower value.
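
Such a scan could look like the sketch below, which evaluates the function on a grid of a few error widths around the reported minimum and reports the lowest point found. The toy objective, the grid size and the helper name `scan_for_lower_value` are assumptions made for this illustration. The reported values and errors are taken from the first Migrad output in this thread.

```python
def scan_for_lower_value(f, best, errors, n_sigma=2.0, steps=21):
    """Scan f on a grid of +/- n_sigma * error around `best`.

    Returns f(best) and the (value, point) pair of the lowest grid point.
    """
    f_best = f(*best)
    lowest = (f_best, tuple(best))
    for i in range(steps):
        a = best[0] + errors[0] * n_sigma * (2 * i / (steps - 1) - 1)
        for j in range(steps):
            b = best[1] + errors[1] * n_sigma * (2 * j / (steps - 1) - 1)
            value = f(a, b)
            if value < lowest[0]:
                lowest = (value, (a, b))
    return f_best, lowest


# Toy objective whose true minimum (0.810, 0.510) is slightly away from
# the point reported by the fit.
def toy_nll(rescale, mu):
    return (0.5 * ((rescale - 0.810) / 0.004) ** 2
            + 0.5 * ((mu - 0.510) / 0.028) ** 2)


reported = (0.809, 0.514)
hesse_errors = (0.004, 0.028)
f_best, lowest = scan_for_lower_value(toy_nll, reported, hesse_errors)

print(f_best, lowest)
```

With a float32 objective, differences between grid points that are smaller than the float32 resolution of the function value should not be read as a genuinely lower minimum.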

I would however switch to float64 if you can. In our implementation of fitting on GPU that we are currently developing within ROOT we will use double precision.

Lorenzo

victor_estrade · June 30, 2021, 9:25am

Yes I still have the error

VariableMetricBuilder Machine accuracy limits further improvement

I will search my code for all the parts where the computations are done in single precision and try to convert them to double precision.

Thank you