It is said that Minos errors are recommended over Migrad’s. However, Minos tends to be slower, and I get the same results as with Migrad (the fits always converge in fewer than ~100 calls).

Out of curiosity, what are the pathological cases where Migrad can give different results than Minos?

After a MIGRAD or HESSE step, the errors are usually quite accurate, unless there has been a problem […]
- Warning messages produced during the minimization or error analysis.
- Failure to find new minimum.
- Value of EDM too big. For a “normal” minimization, after MIGRAD, the value of EDM is usually more than three orders of magnitude smaller than UP (the SET ERRordef), unless a looser tolerance has been specified.
- Correlation coefficients exactly equal to zero, unless some parameters are known to be uncorrelated with the others.
- Correlation coefficients very close to one (greater than 0.99). This indicates both an exceptionally difficult problem, and one which has been badly parametrized so that individual errors are not very meaningful because they are so highly correlated.
- Parameter at limit. This condition, signalled by a Minuit warning message, may make both the function minimum and parameter errors unreliable. See section 5.3.2, “Getting the right parameter errors with limits”.
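To make the EDM criterion in that list concrete, here is a small one-parameter sketch. It assumes the Minuit2-style convention EDM = ½ gᵀVg (g the gradient, V the approximate inverse of the second-derivative matrix); the helper names `edm_1d` and `edm_ok` and the toy parabola are hypothetical, chosen only for illustration. For an exact parabola the EDM equals the true vertical distance to the minimum, which is why comparing it to UP makes sense.

```python
# Hypothetical sketch: EDM for a single parameter, ASSUMING the Minuit2-style
# convention EDM = 0.5 * g^T V g, where V is the inverse second derivative.

def edm_1d(grad, second_deriv):
    """Estimated vertical distance to the minimum for one parameter."""
    return 0.5 * grad**2 / second_deriv

def edm_ok(edm, up, factor=1e-3):
    """Rule of thumb quoted above: EDM should be ~1000x smaller than UP."""
    return edm < factor * up

# Toy parabola f(x) = 3 + 2*(x - 1)^2 with its minimum f_min = 3 at x = 1.
def f(x):
    return 3.0 + 2.0 * (x - 1.0) ** 2

if __name__ == "__main__":
    x = 1.01                  # a point slightly off the minimum
    g = 4.0 * (x - 1.0)       # exact first derivative of f
    h = 4.0                   # exact second derivative of f
    edm = edm_1d(g, h)
    # For an exact parabola the EDM equals the true distance f(x) - f_min.
    print(edm, f(x) - 3.0, edm_ok(edm, up=1.0))
```

For a non-parabolic function the EDM is only an estimate, since it relies on the local quadratic approximation.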

In practice, MINOS errors usually turn out to be close to, or somewhat larger than errors derived from the error matrix, although in cases of very bad behaviour (very little data or ill-posed model) anything can happen. In particular, it is often not true in MINOS that two-standard-deviation errors (UP=4) and three-standard-deviation errors (UP=9) are respectively two and three times as big as one-standard-deviation errors, as is true by definition for errors derived from the error matrix (MIGRAD or HESSE).
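A toy example of that last point, as a sketch under stated assumptions: a single Poisson count n with unknown mean μ, fitted by minimising −2 ln L, so that UP = 1, 4, 9 play the role of 1, 2, 3 standard deviations. The functions `minos_like` (a plain likelihood scan by bisection) and `hesse_like` (√(2·UP/f″) at the minimum) are hypothetical stand-ins for what MINOS and HESSE compute, not Minuit's actual code. The parabolic errors scale exactly as 1 : 2 : 3, while the scan gives asymmetric errors that do not.

```python
import math

# Hypothetical toy (not from the manual): one Poisson count n with unknown mean mu.
# We minimise -2 ln L, a chi-square-like function, so UP = 1, 4, 9 mean 1, 2, 3 sigma.

def m2lnl(mu, n):
    """-2 ln L for a Poisson count, dropping the mu-independent ln(n!) term."""
    return 2.0 * (mu - n * math.log(mu))

def bisect(g, lo, hi, tol=1e-12):
    """Root of g in [lo, hi]; assumes g(lo) and g(hi) have opposite signs."""
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if (gm > 0) == (glo > 0):
            lo, glo = mid, gm
        else:
            hi = mid
    return 0.5 * (lo + hi)

def minos_like(n, up):
    """Likelihood scan: where -2 ln L rises by UP above its minimum at mu = n."""
    fmin = m2lnl(n, n)
    g = lambda mu: m2lnl(mu, n) - fmin - up
    lower = bisect(g, 1e-9, n)                      # crossing below the minimum
    upper = bisect(g, n, 10.0 * (n + up) + 10.0)    # crossing above the minimum
    return n - lower, upper - n

def hesse_like(n, up):
    """Parabolic error sqrt(2*UP / f''), with f''(mu) = 2n/mu^2 evaluated at mu = n."""
    f2 = 2.0 * n / n**2
    return math.sqrt(2.0 * up / f2)

if __name__ == "__main__":
    n = 3
    for up in (1, 4, 9):
        lo, hi = minos_like(n, up)
        print(f"UP={up}: Minos-like -{lo:.3f} +{hi:.3f}   Hesse-like ±{hesse_like(n, up):.3f}")
```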

In principle both should give the same result. The difference could be due to the default initial step sizes or the tolerance of the minimisation.

Are there any other situations where Migrad can be fooled and produce errors different from Minos?

Thanks for the paper; I found it almost simultaneously on the PSI web page.

I have observed something that is mentioned in the paper: when there is little data, Minos errors are a bit larger (and slightly asymmetric) than Migrad’s (a likelihood fit was used).

In practice, MINOS errors usually turn out to be close to, or somewhat larger than, errors derived from the error matrix, although in cases of very bad behaviour (very little data or ill-posed model) anything can happen.

I probably lack enough knowledge of statistics, but I do not see the connection between having few data and the difference between Migrad’s and Minos’ errors. Why does having few data cause the difference?

The paper says:

In 1.3.2, […] For a linear problem, this contour line would be an exact ellipse
In 1.2.4, […] The difference between these three numbers [Minos vs Hesse vs Migrad] is one measure of the non-linearity of the problem (or rather of its formulation)

Does “problem” here refer to how the estimator (or its derivatives) depends on the parameters?

First of all, Migrad errors without Hesse are not reliable: they are based on an approximation of the second derivatives, and they should not be used for reporting parameter errors.
Migrad errors after Hesse, usually called Hesse errors because they are obtained from the inverse of the Hessian matrix, are an asymptotic approximation that is valid when N (the number of data points) goes to infinity. One can prove that the negative log-likelihood function is asymptotically (for large N) a parabola, because the likelihood function becomes a Gaussian function for large N. This follows from the central limit theorem, and you can find proofs in many statistics textbooks.
Interestingly, N often does not need to be very large: even for a rather small N the negative log-likelihood is already close to a parabola.
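A small sketch of how the non-parabolic behaviour fades as N grows, under a hypothetical setup: N decay times from an exponential with mean τ, where (for illustration only) the sample mean is assumed to be exactly 1, so the negative log-likelihood is F(τ) = N(ln τ + 1/τ) with its minimum at τ = 1 and F″(1) = N. The scan uses UP = 0.5 (the usual choice for a negative log-likelihood). The Minos-like errors come out asymmetric, with the upper error larger, and the asymmetry relative to the Hesse-like error shrinks roughly like 1/√N.

```python
import math

# Hypothetical toy: N exponential decay times, ASSUMING the sample mean is exactly 1,
# so the negative log-likelihood is F(tau) = N * (ln(tau) + 1/tau), minimised at tau = 1.
UP = 0.5  # error definition for a negative log-likelihood

def nll(tau, n):
    return n * (math.log(tau) + 1.0 / tau)

def bisect(g, lo, hi, tol=1e-12):
    """Root of g in [lo, hi]; assumes g(lo) and g(hi) have opposite signs."""
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if (gm > 0) == (glo > 0):
            lo, glo = mid, gm
        else:
            hi = mid
    return 0.5 * (lo + hi)

def errors(n):
    """Return (Minos-like lower, Minos-like upper, Hesse-like) errors on tau."""
    fmin = nll(1.0, n)
    g = lambda tau: nll(tau, n) - fmin - UP
    lower = 1.0 - bisect(g, 1e-6, 1.0)
    upper = bisect(g, 1.0, 100.0) - 1.0
    # F''(tau) = n * (-1/tau^2 + 2/tau^3), which equals n at tau = 1
    hesse = math.sqrt(2.0 * UP / n)
    return lower, upper, hesse

if __name__ == "__main__":
    for n in (5, 50, 500, 5000):
        lo, hi, h = errors(n)
        print(f"N={n:5d}  Minos-like -{lo:.4f} +{hi:.4f}  Hesse-like ±{h:.4f}  "
              f"asymmetry (hi-lo)/hesse = {(hi - lo) / h:.3f}")
```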

When the negative log-likelihood function is exactly a parabola, the errors obtained with Minos are by definition the same as those obtained with the Hessian method.
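A short worked derivation of why the two coincide, assuming the negative log-likelihood F is exactly parabolic around its minimum θ̂ with width σ, and using the Minuit convention that the parabolic error is √(2·UP / F″):

```latex
\begin{aligned}
F(\theta) &= F_{\min} + \frac{(\theta-\hat\theta)^2}{2\sigma^2},
  \qquad F''(\hat\theta) = \frac{1}{\sigma^2} \\
\text{Hesse:}\quad \delta_{\mathrm{H}} &= \sqrt{\frac{2\,\mathrm{UP}}{F''(\hat\theta)}}
  = \sigma\sqrt{2\,\mathrm{UP}} \\
\text{Minos:}\quad F(\hat\theta \pm \delta_{\mathrm{M}}) - F_{\min} = \mathrm{UP}
  &\;\Rightarrow\; \frac{\delta_{\mathrm{M}}^2}{2\sigma^2} = \mathrm{UP}
  \;\Rightarrow\; \delta_{\mathrm{M}} = \sigma\sqrt{2\,\mathrm{UP}} = \delta_{\mathrm{H}}
\end{aligned}
```

With UP = 1/2 both give σ, with UP = 2 both give 2σ, and the Minos error is the same on both sides. Any asymmetry or deviation from this scaling therefore comes from F not being a parabola.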

Here “problem” refers to the minimisation or fitting problem. In a linear least-squares fit the function to minimise is exactly a parabola, independently of the number of data points. So a linear problem here means one where the function to minimise is a parabola.
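A minimal check of this for a one-parameter linear model y = a·x with known errors. The data points `XS`, `YS` and the error `SIGMA` are made up for illustration, and `crossing` is a hypothetical bisection helper standing in for a Minos-style scan. Because the χ² is exactly quadratic in a, the Δχ² = 1 crossings match the parabolic error σ/√(Σx²) on both sides, whatever the number of points.

```python
import math

# Hypothetical data (made up for illustration): y measured at x with known error SIGMA.
XS = [1.0, 2.0, 3.0]
YS = [2.1, 3.9, 6.2]
SIGMA = 0.3
UP = 1.0  # error definition for a chi-square

def chi2(a):
    """Chi-square for the linear model y = a * x (one parameter)."""
    return sum((y - a * x) ** 2 for x, y in zip(XS, YS)) / SIGMA**2

# Closed-form least-squares estimate and its parabolic (Hesse-like) error:
# chi2''(a) = 2 * sxx / SIGMA^2, so sqrt(2 * UP / chi2'') = SIGMA / sqrt(sxx).
sxx = sum(x * x for x in XS)
a_hat = sum(x * y for x, y in zip(XS, YS)) / sxx
hesse = SIGMA / math.sqrt(sxx)

def crossing(lo, hi, tol=1e-12):
    """Bisection for chi2(a) = chi2(a_hat) + UP between lo and hi."""
    target = chi2(a_hat) + UP
    below = chi2(lo) < target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (chi2(mid) < target) == below:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

minos_lower = a_hat - crossing(a_hat - 10 * hesse, a_hat)
minos_upper = crossing(a_hat, a_hat + 10 * hesse) - a_hat

if __name__ == "__main__":
    print(minos_lower, minos_upper, hesse)
```

Adding or removing data points changes the width of the parabola but not its shape, which is why the agreement holds for any N in the linear case.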