Difference of convergence for Minuit2 between a portable and a server

tappourc · February 10, 2022, 7:57am

Hi guys,

We have implemented the same code on two different machines, but we do not get the same answers. Usually the portable converges faster and gives correct error bars, while the server takes much longer and sometimes does not even converge (edm < 0.1 is the standard criterion for our application). An extract of the minimization output from each machine is given below.

Does anyone have an idea?

Thanks,

ThierryA

**Portable (converged)**
Global fit minimizer: Minuit2
MnSeedGenerator: for initial parameters FCN = -525829
MnSeedGenerator: Initial state:   - FCN =  -525828.7376072 Edm =      147.178 NCalls =    759
MnSeedGenerator: Negative G2 found - new state:   - FCN =  -525833.3549924 Edm =      147.956 NCalls =   1957
VariableMetric: start iterating until Edm is < 0.1
VariableMetric: Initial state   - FCN =  -525833.3549924 Edm =      147.956 NCalls =   1957
VariableMetric: Iteration #   1 - FCN =    -525913.51512 Edm =      8.01561 NCalls =   2401
VariableMetric: Iteration #   2 - FCN =  -525923.2156838 Edm =      5.74483 NCalls =   2814
VariableMetric: Iteration #   3 - FCN =  -525925.4113957 Edm =      6.31093 NCalls =   3216
VariableMetric: Iteration #   4 - FCN =  -525927.9933848 Edm =      5.58877 NCalls =   3626
VariableMetric: Iteration #   5 - FCN =  -525929.6711013 Edm =      10.5505 NCalls =   4038
VariableMetric: Iteration #   6 - FCN =  -525932.4524567 Edm =       6.0339 NCalls =   4443
VariableMetric: Iteration #   7 - FCN =  -525936.0139025 Edm =      3.04614 NCalls =   4850
VariableMetric: Iteration #   8 - FCN =  -525937.4985943 Edm =      2.56445 NCalls =   5259
VariableMetric: Iteration #   9 - FCN =  -525938.0698921 Edm =     0.616322 NCalls =   5675
VariableMetric: Iteration #  10 - FCN =  -525938.6736287 Edm =     0.856418 NCalls =   6082
VariableMetric: Iteration #  11 - FCN =   -525939.492578 Edm =     0.813897 NCalls =   6509
VariableMetric: Iteration #  12 - FCN =  -525940.5082241 Edm =     0.999342 NCalls =   6930
VariableMetric: Iteration #  13 - FCN =  -525940.8867204 Edm =     0.623391 NCalls =   7336
VariableMetric: Iteration #  14 - FCN =  -525941.0925213 Edm =      84.1628 NCalls =   7743
VariableMetric: Iteration #  15 - FCN =  -525941.0940853 Edm =     0.405619 NCalls =   8154
VariableMetric: Iteration #  16 - FCN =  -525941.2868183 Edm =     0.244769 NCalls =   8565
VariableMetric: Iteration #  17 - FCN =  -525941.4222938 Edm =     0.241935 NCalls =   8966
VariableMetric: Iteration #  18 - FCN =  -525941.4494202 Edm =    0.0787406 NCalls =   9369
MnUserParameterState 

# of function calls: 9369
function Value: -525941.4494202
expected distance to the Minimum (edm): 0.07874063514431

**Server (very long, error bars too large)**
Global fit minimizer: Minuit2
MnSeedGenerator: for initial parameters FCN = -525829
MnSeedGenerator: Initial state:   - FCN =  -525828.7318209 Edm =       147.25 NCalls =    759
MnSeedGenerator: Negative G2 found - new state:   - FCN =  -525833.3497517 Edm =      148.043 NCalls =   1957
VariableMetric: start iterating until Edm is < 0.1
VariableMetric: Initial state   - FCN =  -525833.3497517 Edm =      148.043 NCalls =   1957
VariableMetric: Iteration #   1 - FCN =  -525913.6006037 Edm =      8.02604 NCalls =   2405
VariableMetric: Iteration #   2 - FCN =  -525923.1716172 Edm =      4.60011 NCalls =   2826
VariableMetric: Iteration #   3 - FCN =  -525925.7234726 Edm =      5.15759 NCalls =   3243
VariableMetric: Iteration #   4 - FCN =  -525927.7400365 Edm =      10.0189 NCalls =   3655
VariableMetric: Iteration #   5 - FCN =  -525927.9869671 Edm =      288.334 NCalls =   4055
VariableMetric: Iteration #   6 - FCN =  -525929.4381921 Edm =      567.195 NCalls =   4466
VariableMetric: Iteration #   7 - FCN =  -525931.1863982 Edm =       314.68 NCalls =   4865
VariableMetric: Iteration #   8 - FCN =  -525932.4655994 Edm =      148.986 NCalls =   5270
VariableMetric: Iteration #   9 - FCN =  -525933.1154525 Edm =      204.331 NCalls =   5673
VariableMetric: Iteration #  10 - FCN =  -525933.8473938 Edm =      83.6314 NCalls =   6088
VariableMetric: Iteration #  11 - FCN =  -525934.2489062 Edm =       57.046 NCalls =   6493
VariableMetric: Iteration #  12 - FCN =  -525934.5127006 Edm =      35.2139 NCalls =   6896
VariableMetric: Iteration #  13 - FCN =  -525934.8117592 Edm =      62.9293 NCalls =   7293
VariableMetric: Iteration #  14 - FCN =  -525934.9820228 Edm =      139.363 NCalls =   7693
VariableMetric: Iteration #  15 - FCN =  -525935.2480437 Edm =      87.3529 NCalls =   8093
VariableMetric: Iteration #  16 - FCN =  -525935.3707353 Edm =       90.324 NCalls =   8491
VariableMetric: Iteration #  17 - FCN =   -525935.734393 Edm =      39.8438 NCalls =   8889
VariableMetric: Iteration #  18 - FCN =  -525936.1465651 Edm =      55.0466 NCalls =   9285
.......
VariableMetric: Iteration #  88 - FCN =  -525943.7149127 Edm =      1.11081 NCalls =  37638
VariableMetric: Iteration #  89 - FCN =  -525943.7151638 Edm =    0.0922521 NCalls =  38043
MnUserParameterState 

# of function calls: 38043
function Value: -525943.7151638
expected distance to the Minimum (edm): 0.09225207027956

moneta · February 10, 2022, 8:19am

Hi,
Looking at the log I see a small difference in the computed function value FCN, already at the beginning, you have:

Initial state: - FCN = -525828.7376072 on the portable
Initial state: - FCN = -525828.7318209 on the server

A small difference can cause Minuit to take a different path, so that the fit converges in one case and not in the other, or takes much longer to converge, especially if the fit is not very stable.
The difference is caused by an implementation of the FCN that is not numerically portable between the local and the server machine.
I would investigate this and try to improve the numerical accuracy of your FCN as much as you can. For example, in RooFit we use compensated Kahan summation for computing the likelihood function.

Cheers

Lorenzo
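
A minimal sketch of the compensated (Kahan) summation mentioned above, applied to a negative log-likelihood. The function names `kahanSum` and `negLogLikelihood`, and the idea of passing precomputed per-event PDF values, are assumptions made for illustration; this is neither RooFit's implementation nor the code discussed in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Kahan compensated summation: the rounding error of each addition is
// carried forward in `c` instead of being lost.
double kahanSum(const std::vector<double>& terms) {
    double sum = 0.0;
    double c = 0.0;  // running compensation for lost low-order bits
    for (double x : terms) {
        const double y = x - c;    // apply the correction from the previous step
        const double t = sum + y;  // low-order bits of y may be lost here
        c = (t - sum) - y;         // recover what was lost (algebraically zero)
        sum = t;
    }
    return sum;
}

// Hypothetical FCN helper: -sum(log p_i) over per-event PDF values p_i > 0.
double negLogLikelihood(const std::vector<double>& pdfValues) {
    std::vector<double> terms;
    terms.reserve(pdfValues.size());
    for (double p : pdfValues) {
        assert(p > 0.0);  // log is undefined for non-positive values
        terms.push_back(-std::log(p));
    }
    return kahanSum(terms);
}
```

Note that aggressive floating-point flags such as `-ffast-math` allow the compiler to reassociate the arithmetic, which can optimize the compensation away.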

tappourc · February 10, 2022, 10:10am

OK thanks, we will have a look at that.

This is what we suspected, but we do not understand how to overcome it.

Cheers,

ThierryA

moneta · February 10, 2022, 2:17pm

It is possible that you are using non-portable implementations of some mathematical functions such as log, exp, sin…

Cheers

Lorenzo

tappourc · February 10, 2022, 2:21pm

Well, I don’t know what you are talking about. Non-portable functions? You mean in the machine code of the two machines? Different implementations depending on the processors?

Thanks because I am lost…!

ThierryA

moneta · February 10, 2022, 4:36pm

I mean that functions like exp or log could return different results, especially if you are using a mathematical library other than libm, e.g. the VC library.
Another possibility is that one of your machines is 32-bit and the other is 64-bit. This could also give different numerical results (see for example “c++ - 64 bit floating point porting issues” on Stack Overflow).

Lorenzo
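
One way to check this suggestion is to print the results of a few math functions in an exact representation on both machines and compare the output line by line. The sketch below is only an illustration: the helper names `bitsOf`, `hexDouble` and `printMathFingerprint` and the sample inputs are hypothetical, and it assumes IEEE-754 64-bit doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Raw bit pattern of a double (assumes a 64-bit IEEE-754 double).
std::uint64_t bitsOf(double x) {
    static_assert(sizeof(std::uint64_t) == sizeof(double), "expects 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Exact hexadecimal floating-point text form of a double (printf "%a").
std::string hexDouble(double x) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "%a", x);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, static_cast<std::size_t>(n));
}

// Print exp, log and sin for a few sample inputs; run on both machines
// and diff the output. Any differing line points at the math library.
void printMathFingerprint() {
    const double inputs[] = {0.1, 1.7, 37.5, 525941.4494202};
    for (double x : inputs) {
        std::printf("x=%s exp(-x)=%s log(x)=%s sin(x)=%s\n",
                    hexDouble(x).c_str(),
                    hexDouble(std::exp(-x)).c_str(),
                    hexDouble(std::log(x)).c_str(),
                    hexDouble(std::sin(x)).c_str());
    }
}
```

Calling `printMathFingerprint()` from a small `main` on each machine gives two text outputs that can be compared with `diff`; identical output for these inputs does not prove the libraries agree everywhere, but a difference is conclusive.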

tappourc · February 10, 2022, 7:17pm

Thanks Lorenzo, we are on the track!

ThierryA

tappourc · February 10, 2022, 7:20pm

My colleague told me that we have libm on both machines, and both are 64-bit.

ThierryA

Wile_E_Coyote · February 10, 2022, 7:39pm

I guess these machines use different compilers (check with gcc --version).
Then the default optimizations will be different (and some internal functions, too).
You could try to compile your code using “-O0”.
However, I also think you should make your procedures more “robust” (e.g., if you calculate your own chi^2 and there is a “sum”, try to use the Kahan–Babuška–Neumaier summation algorithm).
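
A minimal sketch of the Kahan–Babuška–Neumaier variant suggested here. The function name `neumaierSum` is an assumption for illustration. Compared with plain Kahan summation, it also handles the case where the incoming term is larger in magnitude than the running sum.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Neumaier's improved compensated summation.
double neumaierSum(const std::vector<double>& terms) {
    double sum = 0.0;
    double c = 0.0;  // accumulated compensation, added once at the end
    for (double x : terms) {
        assert(std::isfinite(x));  // NaN/inf would poison the compensation
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            c += (sum - t) + x;  // low-order bits of x were lost
        } else {
            c += (x - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
    }
    return sum + c;
}
```

For the classic test input `{1.0, 1e100, 1.0, -1e100}` this returns 2.0, whereas a naive loop and plain Kahan summation both return 0.0.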

tappourc · February 10, 2022, 9:20pm

Ok thanks. Yes the compilers are indeed different. We will try the -O0 option then the Kahan as we compute a log likelihood.

Cheers,

ThierryA

tappourc · February 11, 2022, 10:04am

Hi Wile_E_Coyote,

We tried “-O0”. No change, we still have different convergences. We will implement Kahan.

Cheers,

ThierryA

Wile_E_Coyote · February 11, 2022, 2:53pm

You could try to compile it using “-O2 -Wall -Wextra” and closely inspect reported problems (note: usually newer compiler will spot more, and “-O2” or “-O3” is needed here).

BTW. Another thing to check … can it be that you somewhere use “float” instead of “double”? You could also attach your source code for “inspection”.
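
To illustrate why a stray “float” would matter at this scale: the FCN values in the logs above are around 5.26e5, and the two machines differ by roughly 0.006 in the initial state. The sketch below (helper names `spacingFloat` and `spacingDouble` are hypothetical) computes the gap between adjacent representable numbers at a given magnitude.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
float spacingFloat(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Distance from x to the next representable double above it.
double spacingDouble(double x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

With IEEE-754 single precision, `spacingFloat(525941.0f)` is 0.0625, so a float cannot even represent FCN changes of the size that separate the two machines, and certainly not an edm criterion of 0.1 with any margin. In double precision the spacing at the same magnitude is about 1.2e-10.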

cmercier · February 11, 2022, 5:46pm

Hello Wile_E_Coyote,

(working with ThierryA)
Thanks, -Wextra allowed me to remove unused stuff; I already had -Wall.
-O0, -g, -O2 and -O3 give exactly the same result when run on the same computer.
No, I did not use float instead of double.

I will try to remove everything and clone our git repository again on both machines; it is getting crazy.
I need to do some cleaning before attaching the source.
Minuit is taking parameters from another piece of code, which is not stable either.

Thanks

Wile_E_Coyote · February 11, 2022, 5:51pm

BTW. Make sure you are using the “latest” ROOT on both machines (i.e., with the “latest” Minuit2, as I remember, there were bug fixes applied).

cmercier · February 11, 2022, 6:36pm

Hello,
Argh, this is not the case. I downloaded the standalone Minuit2 (Minuit2-5.34.14.tar.gz) and compiled it locally on both computers; I could not find a more recent standalone Minuit2 for either Debian or Fedora.
Any advice?

Wile_E_Coyote · February 11, 2022, 6:39pm

That’s a question to @Axel and @moneta … I do not know if any fixes are applied to the standalone Minuit2 (but I guess you should be able to get the latest ROOT on both systems).

moneta · February 11, 2022, 7:00pm

Hi,
That tar file is quite an old version. You can take a new version of Minuit2 directly from the ROOT GitHub repository and build it standalone with CMake. See

https://github.com/root-project/root/blob/master/math/minuit2/README.md

This is the Minuit2 fitter standalone edition, from the [ROOT] toolkit. It uses [CMake] 3.1+ to build.
For information about the Minuit2 fitter, please see the [documentation in ROOT][minuitdoc].

## Source

There are two ways to get Minuit2; you can checkout the [ROOT] source, then just build or use `add_subdirectory` with `<ROOT_SOURCE>/math/minuit2`, or you can get a Minuit2 source distribution which contains all the needed files to build with [CMake]. See [DEVELOP.md] for more information about extracting the source files from [ROOT].


## Building

To build, use the standard [CMake] procedure; on most systems, this looks like:

```bash
mkdir PATH_TO_MINUIT2_BUILD
cd PATH_TO_MINUIT2_BUILD
cmake PATH_TO_MINUIT2_SOURCE
cmake --build .
```

Of course, GUIs, IDEs, etc. that work with [CMake] will work with this package. The standard method of CMake building, with a build directory inside the Minuit2 source directory and using the makefile generator, would look like:

This file has been truncated.

If you have any issue building it please let me know

Lorenzo

tappourc · February 13, 2022, 7:23am

Thanks Lorenzo and Wile_E_Coyote. We are still investigating.
We also encountered a strange behaviour with edm.
The minimization stopped and reported convergence although the edm criterion was not met (final edm = 0.164, above the 0.1 threshold). How come? (See below.)

Cheers,

ThierryA

Global fit minimizer: Minuit2
MnSeedGenerator: for initial parameters FCN = -524556
MnSeedGenerator: Initial state: - FCN = -524556.4700478 Edm = 1744.1 NCalls = 799
MnSeedGenerator: Negative G2 found - new state: - FCN = -524640.6092162 Edm = 1703.33 NCalls = 4153
VariableMetric: start iterating until Edm is < 0.1
VariableMetric: Initial state - FCN = -524640.6092162 Edm = 1703.33 NCalls = 4153
VariableMetric: Iteration # 1 - FCN = -525582.0514543 Edm = 561.556 NCalls = 4644
VariableMetric: Iteration # 2 - FCN = -525784.2195521 Edm = 54.3211 NCalls = 5092
…
VariableMetric: Iteration # 136 - FCN = -525963.8532168 Edm = 1.2574 NCalls = 62028
VariableMetric: Iteration # 137 - FCN = -525963.8741607 Edm = 0.163729 NCalls = 62445
MnUserParameterState .CovarianceStatus()
3
Fonction minimum

Minuit did successfully converge.

# of function calls: 62456

minimum function Value: -525963.8741607
minimum edm: 0.1637290464275

moneta · February 14, 2022, 9:03am

Hi,

Are you still using the old version 5.34 or the new one?
This is strange, but the logic for ending the iterations is rather complex: the edm is also corrected using some information computed from the covariance matrix. I would need the full print-out, possibly with a more verbose option, to understand exactly what is happening.

Cheers
Lorenzo
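
For readers unfamiliar with the quantity under discussion: the edm (estimated distance to minimum) is conventionally defined as half the quadratic form of the gradient with the current estimate of the inverse Hessian, 0.5 * g^T V g. The sketch below only illustrates that textbook definition; the function name and the dense-matrix representation are assumptions, and it does not reproduce the additional corrections and stopping logic inside Minuit2 that are mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook edm: 0.5 * g^T * V * g, where g is the gradient of the FCN and
// V is the current estimate of the inverse Hessian (dense, n x n).
double estimatedDistanceToMinimum(
    const std::vector<double>& gradient,
    const std::vector<std::vector<double>>& inverseHessian) {
    const std::size_t n = gradient.size();
    assert(inverseHessian.size() == n);
    double quad = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(inverseHessian[i].size() == n);
        for (std::size_t j = 0; j < n; ++j) {
            quad += gradient[i] * inverseHessian[i][j] * gradient[j];
        }
    }
    return 0.5 * quad;
}
```

Because the value depends on both the gradient and the inverse-Hessian estimate, a poorly determined matrix can make the edm jump between iterations, as seen in the logs earlier in this thread.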