We have implemented the same code on two different machines, but we don’t get the same answers. Usually the portable converges faster and gives correct error bars, while the server takes much longer and sometimes does not converge at all (edm < 0.1 is the standard for our application). Here is an extract of the minimization.
Hi,
Looking at the log, I see a small difference in the computed function value FCN already at the beginning:
Initial state - FCN = -525833.3549924 on the portable
Initial state: - FCN = -525828.7318209 on the server
A small difference can make Minuit take a different path, so that the fit converges in one case and not in the other, or takes much longer to converge, especially if the fit is not very stable.
The difference comes from an FCN implementation that is not portable, i.e. it does not give identical results on the local machine and on the server.
I would investigate this and try to improve the numerical accuracy of your FCN as much as you can. For example, in RooFit we use compensated (Kahan) summation when computing the likelihood function.
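To illustrate the idea of compensated summation, here is a minimal sketch. It is not RooFit's actual implementation; the function names kahan_sum and naive_sum are hypothetical, and the code assumes plain double arithmetic without aggressive options such as -ffast-math, which may optimize the compensation away.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Compensated (Kahan) summation: the rounding error of each addition
// is kept in a separate correction term and fed back into the next step.
// Hypothetical helper, for illustration only.
double kahan_sum(const std::vector<double>& terms) {
    double sum = 0.0;
    double c = 0.0;               // running compensation for lost low-order bits
    for (double x : terms) {
        double y = x - c;         // apply the correction to the next term
        double t = sum + y;       // low-order bits of y may be lost here
        c = (t - sum) - y;        // recover what was lost (with opposite sign)
        sum = t;
    }
    return sum;
}

// Plain accumulation, for comparison.
double naive_sum(const std::vector<double>& terms) {
    double sum = 0.0;
    for (double x : terms) sum += x;
    return sum;
}
```

In an FCN that sums many per-event log-likelihood terms, the accumulation loop could be replaced by something like kahan_sum, so that the result depends less on rounding details of the individual additions.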
Well I don’t know what you are talking about. Non-portable functions? You mean in the encoded language code of the machines? Different implementations depending on the processors?
I mean that functions like exp or log could return different results, especially if you are using a mathematical library other than libm, e.g. the VC library.
Another possibility is that one of your machines is 32-bit and the other is 64-bit. This could also give different numerical results (see, for example, "c++ - 64 bit floating point porting issues" on Stack Overflow).
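One way to test both hypotheses would be to run the same small diagnostic on each machine and compare the output line by line. The sketch below is only an illustration: hex_repr, from_hex and print_fp_environment are hypothetical helper names, and the arguments passed to exp and log are arbitrary sample values. Printing doubles in hexadecimal ("%a") shows every bit, so even a last-digit difference between the two math libraries becomes visible.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Exact text form of a double using the C99 "%a" hexadecimal format.
std::string hex_repr(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", x);
    return std::string(buf);
}

// Parse a value printed by hex_repr (e.g. copied from the other machine).
double from_hex(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}

// Print a few facts that commonly differ between platforms.
void print_fp_environment() {
    std::printf("pointer size    : %zu bits\n", sizeof(void*) * 8);
    // 0: evaluate in the declared type; 2: evaluate in long double (x87 style)
    std::printf("FLT_EVAL_METHOD : %d\n", static_cast<int>(FLT_EVAL_METHOD));
    // Arbitrary sample arguments; any bit-level difference shows up here.
    std::printf("exp(1.5)        : %s\n", hex_repr(std::exp(1.5)).c_str());
    std::printf("log(3.7)        : %s\n", hex_repr(std::log(3.7)).c_str());
}
```

Calling print_fp_environment() from main on both machines and diffing the two outputs should show whether the platforms already disagree at the level of elementary functions, before any fit is involved.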
I guess these machines use different compilers (check with: gcc --version).
Then the default optimizations will be different (and some internal functions, too).
You could try to compile your code using “-O0”.
However, I also think you should make your procedures more “robust” (e.g., if you calculate your own chi^2 and there is a “sum”, try to use the Kahan–Babuška–Neumaier summation algorithm).
You could try to compile it using “-O2 -Wall -Wextra” and closely inspect the reported problems (note: a newer compiler will usually spot more, and “-O2” or “-O3” is needed here).
BTW. Another thing to check … can it be that you somewhere use “float” instead of “double”? You could also attach your source code for “inspection”.
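For the summation suggestion in this post, a minimal sketch of the Neumaier variant of compensated summation could look as follows. The function name neumaier_sum is hypothetical, and the code is an illustration under the assumption of plain double arithmetic (no -ffast-math), not a drop-in replacement for anyone's FCN.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Neumaier's variant of compensated summation. Unlike plain Kahan
// summation, it also handles terms that are larger in magnitude than
// the running sum. Hypothetical helper, for illustration only.
double neumaier_sum(const std::vector<double>& terms) {
    double sum = 0.0;
    double c = 0.0;                      // accumulated rounding errors
    for (double x : terms) {
        double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            c += (sum - t) + x;          // low-order bits of x were lost
        } else {
            c += (x - t) + sum;          // low-order bits of sum were lost
        }
        sum = t;
    }
    return sum + c;                      // apply the correction once, at the end
}
```

A classic check is the input {1.0, 1e100, 1.0, -1e100}: plain accumulation loses both small terms, while the compensated sum recovers them.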
(working with ThierryA)
Thanks, -Wextra allowed me to remove unused stuff, I already had -Wall
-O0, -g, -O2 and -O3 all give exactly the same result when run on the same computer.
No, I did not use float instead of double.
I will try removing everything and cloning our git repository again on both machines; it is getting crazy.
I need to make some cleaning before attaching the source.
Minuit is taking parameters from another piece of code, which is not stable either.
Hello,
Argh, this is not the case: I downloaded the standalone Minuit2, Minuit2-5.34.14.tar.gz, and compiled it locally on both computers. I could not find a more recent standalone Minuit2, for either Debian or Fedora.
Any advice?
That’s a question to @Axel and @moneta … I do not know if any fixes are applied to the standalone Minuit2 (but I guess you should be able to get the latest ROOT on both systems).
Hi,
That tar file is quite an old version. You can take a new version of Minuit2 directly from the ROOT GitHub repository and build it standalone with CMake. See
If you have any issue building it, please let me know.
Thanks Lorenzo and Wild_E_Coyote. We are still investigating.
We also encountered a strange behaviour with edm.
The minimization stopped although the convergence criterion was not met; how come? (See below)
Are you still using the old version 5.34 or the new one?
This is strange, but the termination logic is rather complex, and it is also corrected using some information computed from the covariance matrix. I would need the full print-out, possibly with a more verbose option, to understand exactly what is happening.