The factor of 2 in the tolerance definition is caused by a different definition of the edm between Minuit1 and Minuit2.
Based on my understanding of your reply, is the factor of two added in the calculation of the EDM in Minuit2?
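To check my understanding of where the factor of two could come from: for a quadratic approximation around the minimum, I would expect the estimated distance to the minimum to be something like

$$
\mathrm{EDM} \;\approx\; \tfrac{1}{2}\, g^{T} V\, g ,
$$

where $g$ is the gradient and $V$ the covariance matrix (inverse Hessian). Is the difference simply whether the factor $\tfrac{1}{2}$ is kept in the reported EDM or absorbed into the convergence threshold applied to the tolerance?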
The attached figures show a very large difference; a small difference can happen, but it should be of order smaller than ~0.006.
It could be because our parameters are highly correlated. Some of the parameters have global correlation coefficients above 1.0, possibly due to numerical issues. I guess MINUIT’s assumption that the negative log-likelihood is a quadratic function around the minimum probably breaks down in our fits. Physics-wise, it is the best model we can come up with, but statistics-wise there are too many free parameters.
Now concerning your questions, I am not sure which fix you are referring to; can you please post the links to the posts?
You mentioned “a couple of fixes” in point 4 of this post and “due to some issues which have been fixed only in the new version” in this post as well. In fact, you also mentioned “some fixes applied in Minuit2” in your reply above. I was wondering what these fixes were, specifically and in technical terms, since the details were never given in the ROOT forums or in the Minuit2 documentation.
Otherwise I could have a look at the results obtained with the maximum verbosity mode.
I can ask my collaborators if they are comfortable sharing the logs with you privately. We do plan to publicly release the fitting code after we publish the analysis (at the earliest in 6 months, but it can take up to a year). If you don’t mind waiting that long, I will add this to my to-do list for when the analysis and code are public.
if possible, to compute a compensated summation and, if possible, keep the total likelihood value quite small (not too large) by also using an overall offset.
We calculate the per-event likelihoods using CUDA and sum them with a reduction operation using CUDA Boost, which I assume does not use any compensated summation. Regarding the total likelihood, we usually end up with a value around -180,000. I’ve not heard of any amplitude fits using offsets in their minimization, so I either have to ask around or think about how to implement this without compromising the mathematical result.
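For reference, here is a minimal Python sketch (not our actual CUDA code) of what I understand by compensated summation plus an overall offset. The per-event values and the `objective` wrapper are made up for illustration; the offset is just the total NLL evaluated once at the starting point, so it cannot change where the minimum is.

```python
import numpy as np

def kahan_sum(values):
    """Compensated (Kahan) summation: keeps a running correction term so that
    small per-event terms are not lost when added to an already large total."""
    total = 0.0
    comp = 0.0  # accumulated low-order bits lost in previous additions
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

# Stand-in for the per-event negative log-likelihoods coming out of the GPU kernel.
rng = np.random.default_rng(0)
per_event_nll = rng.normal(loc=-1.8, scale=0.5, size=100_000)

# Constant offset: the total NLL at the starting parameters. Subtracting it keeps
# the number handed to the minimizer small; since it does not depend on the fit
# parameters, the position of the minimum (and the EDM) is unchanged.
nll_offset = kahan_sum(per_event_nll)

def objective(per_event_nll_at_theta):
    """Value passed to the minimizer for a given parameter point."""
    return kahan_sum(per_event_nll_at_theta) - nll_offset
```

If that is roughly what you meant, the offset part at least looks straightforward to add on our side; the compensated summation would have to happen inside the GPU reduction itself.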
initial Hessian matrix is estimated as a diagonal matrix computed using the diagonal second derivatives only
Thanks for answering this! Maybe the scipy BFGS method later scales the identity matrix using first/second-derivative information in some other part of the code that I did not read.
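To convince myself, I put together a small sketch (toy objective and my own `diagonal_hessian_seed` helper, not scipy or Minuit2 internals) contrasting scipy's identity start for the inverse-Hessian approximation with a diagonal seed built from numerical second derivatives:

```python
import numpy as np
from scipy.optimize import minimize

# Toy chi2-like objective with very different curvatures along the two axes.
def f(x):
    return 0.5 * (100.0 * x[0] ** 2 + 0.01 * x[1] ** 2)

x0 = np.array([1.0, 1.0])

# scipy's BFGS starts from the identity as the inverse-Hessian approximation
# and refines it from gradient differences during the iterations.
res = minimize(f, x0, method="BFGS")
print("final inverse-Hessian approximation:\n", res.hess_inv)

# A Minuit2-style seed, by contrast, would be a diagonal matrix built from
# numerical second derivatives along each parameter axis (off-diagonals ignored).
def diagonal_hessian_seed(fun, x, step=1e-4):
    d2 = np.empty_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = step
        d2[i] = (fun(x + e) - 2.0 * fun(x) + fun(x - e)) / step ** 2
    return np.diag(d2)

print("diagonal second-derivative seed:\n", diagonal_hessian_seed(f, x0))
```

That would at least let me check whether the identity start versus a curvature-based seed makes a noticeable difference in the first few iterations of our fit.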