RooFit: different outputs from Mac OS X and lxplus

Jaro · May 2, 2015, 5:09pm

Dear support,

I am facing a problem which I believe is responsible for a few convergence problems when running MC toys with a (~40-parameter) unbinned simultaneous M.L. fit on lxplus; these problems do not appear on my machine. I found that the symptom already appears when running a simple unbinned M.L. fit to a single sample, which I am attaching here. In this example both fits converge fine, but the fit results are not exactly identical, as I would expect from running the same code with the same ROOT (v5.34/13) and RooFit (v3.59) versions on the two machines (Mac OS X Yosemite, lxplus SLC6). Since RooFit uses TMinuit, I tried printing the initial values of all public data members of TMinuit at the very beginning of fit.C. They differ on the two machines, and I did not find a way to reset them via the RooFit interfaces RooMinuit/RooMinimizer.

——————————————
In the attachment:

fit.C - main fit macro
RooJohnsonSU.cxx - fit component function evaluation (JohnsonSU)
RooJohnsonSU.h - fit component function header
MinuitParams.C - attempt to print out all public data members of TMinuit called by fit.C at the beginning

Log files to be downloaded:

MyMachineFitRUN.log - log from run on my machine
cernbox.cern.ch/public.php?serv … 509b63fddb

LXplusFitRUN.log - log from run on lxplus
cernbox.cern.ch/public.php?serv … ab43facb44

MinuitLog_MyMachine.log - MINUIT log file dump from my machine
cernbox.cern.ch/public.php?serv … 2f30afb72a

MinuitLog_LXplus.log - MINUIT log file dump from lxplus
cernbox.cern.ch/public.php?serv … 4762aa59e4

Download data ntuple and log files from:
cernbox.cern.ch/public.php?serv … a3e8372f1f
MassMCsignal.root (22MB) - ntuple with the data points

——————————————
How to run:

root

.L RooJohnsonSU.cxx+
.L fit.C+
fit(); > file.log

Would someone have a hint as to why this is happening, or suggestions for what further steps I could take?

Thanks !
Cheers,
Jaroslav

PS: My machine means:
Mac OS X Yosemite 10.10.3 (X86-64)
ROOT 5.34/13
Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix
RooJohnsonSU.h (1.24 KB)
RooJohnsonSU.cxx (1.96 KB)
MinuitParams.C (25.8 KB)
fit.C (4.88 KB)

Danilo · May 2, 2015, 7:52pm

Hi,

thanks for the amount of detail you attached to the post.
However, your assumption is not entirely correct:

The results are not expected to be bit by bit identical. The code, as you correctly point out, is the same but the system libraries it is linked against are not. The most striking difference between osx and slc6 in this case is probably the implementation of the mathematical functions your fit involves. Apple ships a libm which provides implementations of math functions (e.g. log) which are faster than the ones of glibc libm but which yield slightly different results.
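
To put a number on "slightly different results", one can count how many representable doubles lie between two values, i.e. their distance in units in the last place (ULP). The following is a minimal standard-library C++ sketch, not part of ROOT or RooFit; the helper names `ordered_bits` and `ulp_distance` are made up for this illustration, and it assumes 64-bit IEEE 754 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a finite double to an integer that preserves ordering, so that the
// difference of two mapped values counts the representable doubles between them.
// (Hypothetical helper; assumes 64-bit IEEE 754 doubles.)
static std::int64_t ordered_bits(double x) {
    std::int64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Negative doubles are stored as sign + magnitude; mirror them onto the
    // negative side of the integer line so that -0.0 and +0.0 both map to 0.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 means bit-identical,
// up to the sign of zero).
std::uint64_t ulp_distance(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    // Unsigned subtraction avoids signed overflow for values of opposite sign.
    return ia >= ib
               ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
               : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}
```

One possible use is to print the same `std::log(x)` result on both machines in hexadecimal floating-point format (`printf` with `%a`) and pass the two values to `ulp_distance`; a distance of one or two would be consistent with the libm explanation above.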

Cheers,
Danilo

Jaro · May 4, 2015, 1:07pm

Hello Danilo,

Thanks a lot for this explanation! I completely missed the system library dependency. Short of pursuing this problem all the way down to the differences in the function implementations on the two systems, would you know whether the performance of these implementations has been compared in the past? In a highly parametric unbinned simultaneous M.L. fit, when running 1000 toys I observe that 4% do not converge; when I rerun these 4% on my laptop, they converge without a problem. The log files are exactly the same up to the point where the fitting procedure starts. After a few calls to the FCN the values start to differ in the last digit, and when the status of the parameters is reported the differences appear only in the derivatives; these later pile up and end in the failure of the fitter. I admit that the numerical stability of the fitter is not very good: I have to offset the likelihood at initialisation to get rid of an initial 20% of non-converging MC toys. So my question is: is there a reliable way to tell whether the function implementations on lxplus may perform worse than the Apple ones under such numerically unstable conditions?

Thanks a lot for your time and advice; it may really help me to understand whether my results are still OK.

Cheers,
Jaroslav

Danilo · May 4, 2015, 1:27pm

Hi Jaroslav,

I think you got the point perfectly.
I think that in this particular case we are dealing with a procedure which is not very stable numerically. The fact that the fits diverge when the results of the math functions change slightly is a clear sign of that.
Now, about the correctness of the results, I think there are two arguments:

1) In general, if the rest of the software stack you are using is validated on slc6, the results on the Mac are not validated and therefore cannot be trusted. This may change according to the policies of the experiment you are working for, for example.
2) There are correctly rounded implementations of math functions around, for example lipforge.ens-lyon.fr/www/crlibm/ . I think that this particular one might be an authoritative answer to your question.

Cheers,
Danilo

moneta · May 4, 2015, 2:06pm

Hi,

You might want to try the option RooFit::Offset(true) in RooAbsPdf::fitTo or RooAbsPdf::createNLL.
This offsets the NLL by its initial value, so that the minimizer works with small numbers, and makes the fit much more numerically stable.
You might also try Minuit2, which sometimes works better.
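
Why subtracting a large constant helps can be seen in a toy calculation. The sketch below is plain C++, not RooFit code, and it does not reproduce how Minuit computes derivatives; the parabola `toy_nll`, the step size, and the function names are assumptions made for illustration. The constant -255912 is borrowed from the "Likelihood offset now set to" lines of the log later in this thread.

```cpp
#include <cassert>
#include <cmath>

// Toy "NLL": a parabola with unit curvature sitting on top of a constant.
// Both the shape and the constant are illustrative assumptions.
double toy_nll(double x, double constant) {
    return constant + 0.5 * (x - 1.0) * (x - 1.0);
}

// Central finite-difference estimate of the second derivative at x.
double second_derivative(double x, double h, double constant) {
    assert(h > 0.0);
    const double up = toy_nll(x + h, constant);
    const double mid = toy_nll(x, constant);
    const double down = toy_nll(x - h, constant);
    return (up - 2.0 * mid + down) / (h * h);
}

// Absolute error of the estimate; the exact second derivative of toy_nll is 1.
double curvature_error(double x, double h, double constant) {
    return std::fabs(second_derivative(x, h, constant) - 1.0);
}
```

With a constant of about 2.6e5 each function value is only known to roughly 3e-11, and dividing differences of such values by h^2 = 1e-8 leaves an error of order 1e-3 in the curvature. With the constant removed, `curvature_error(1.3, 1e-4, 0.0)` is many orders of magnitude smaller.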

Best Regards

Lorenzo

Jaro · May 4, 2015, 3:14pm

Hi Lorenzo and Danilo,

Thank you very much for the clear answers and advice! I have only one additional question, in point 2) below.

1) Thank you for the pointer to the crlibm library; I did not know it existed. I will check whether the package is installed by default on the LSF batch (or Grid) nodes where my toys are running.

2) RooFit::Offset(true) in RooAbsPdf::fitTo is what brings the original 20% of non-converging fits on lxbatch down to the 4% I was talking about. I worry that this is still not good enough to reach my target (0%, or at least below 1%).
Regarding RooAbsPdf::createNLL, I am not sure I understood this point. Do you mean that when using RooAbsPdf::createNLL and then calling migrad(), it is no longer necessary to use the offset and one can expect better performance? Is that correct?

Thank you one more time !

Best,
Jaroslav

moneta · May 4, 2015, 3:29pm

Hi,

For the remaining non-converging fits, you might try starting them from a different point. You might run Scan first and then run Migrad from the Scan result.
Another way to improve the fit is to scale all parameters to be around 1. I see some parameters have fit values around 10^3, others around 10^-1. This increases the numerical error.
But what is the error you get for the non-converging fits?
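
The effect of the parameter-scaling suggestion can be sketched with a two-parameter toy Hessian. This is a standard-library C++ illustration, not RooFit or Minuit code; the struct `Sym2`, the function names, and the magnitudes used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Symmetric 2x2 matrix [[a, b], [b, c]], standing in for a Hessian.
struct Sym2 {
    double a, b, c;
};

// Condition number (largest / smallest eigenvalue) of a positive-definite Sym2.
double condition_number(const Sym2& m) {
    const double det = m.a * m.c - m.b * m.b;
    assert(m.a > 0.0 && det > 0.0);  // positive-definite input only
    const double mean = 0.5 * (m.a + m.c);
    const double root = std::hypot(0.5 * (m.a - m.c), m.b);
    const double largest = mean + root;
    // det = largest * smallest; dividing avoids the cancellation in mean - root.
    const double smallest = det / largest;
    return largest / smallest;
}

// Hessian seen by the minimizer after substituting p_i = s_i * q_i:
// d2f/dq_i dq_j = s_i * s_j * d2f/dp_i dp_j.
Sym2 rescale(const Sym2& m, double s1, double s2) {
    return Sym2{m.a * s1 * s1, m.b * s1 * s2, m.c * s2 * s2};
}
```

As an example, two uncorrelated parameters with uncertainties of about 1e3 and 1e-4 give a Hessian close to `Sym2{1e-6, 0.0, 1e8}`, whose condition number is about 1e14. Expressing each parameter in units of its own uncertainty, `rescale(h, 1e3, 1e-4)`, brings the condition number close to 1, which leaves far more of the double-precision digits available when the matrix is inverted.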

Best Regards

Lorenzo

P.S. Concerning the differences in the logs, I see they are really minimal, at the level of 10^-6 in the NLL.
I think one should expect accuracy at the level of 10^-4 in the NLL, which is within the default tolerance used by RooMinuit/RooMinimizer.

Jaro · May 4, 2015, 4:37pm

Hello Lorenzo,

the two fit logs attached are both converging fine, and they are not the MC toys I am running. I quickly wrote these just to show the symptom that causes the failure in a much more elaborate fit construction, but again only in 4% of the cases.

The MC toys of the simultaneous fit (discussed later) first refit the template MC samples separately, one by one, extract the fitted shape parameters and feed them to the combined simultaneous fit (to the data and MC). One of these “MC template fits” was used as the example when creating this post. That is where I first spotted the differences (Apple/lxplus) in the log files, and I wanted to understand them. This was clarified as a system library dependency issue.

Another topic I raised, unfortunately in parallel (sorry for that), was whether this could cause failures on one machine and success on another. I observe failures on lxplus in just around 4% of all the MC toys (more complicated code than the one attached in the original post; 40-50 parameters, RooSimultaneous).

Below I attach only an extract from two larger log files from one such failing MC toy, where you can see:

the printouts of the simultaneous pdf (FCN) values in the two runs; they start to differ from the 6th evaluation
the first status from HESSE, where the errors on lxplus are in several cases extremely tiny (e.g. FracBkgPRD_1: 10^-8), while on my machine they are what I would expect for those particular parameters (10^-3).

Thanks a lot for your help !
Cheers,
Jaroslav

Apple MAC OS X :
first calling HESSE:
[…]

** 10 **SET STR 2

NOW USING STRATEGY 2: MAKE SURE MINIMUM TRUE, ERRORS CORRECT

** 15 **HESSE 2.65e+04

FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
[#1] INFO:Minization – RooNLLVar::evaluatePartition(physics) first = 0 last = 104155 Likelihood offset now set to -255912
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlbckgPRDStep) first = 0 last = 9650 Likelihood offset now set to -4175.52
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlbckgPRDBump) first = 0 last = 1013 Likelihood offset now set to 992.646
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlbckgPRDEE) first = 0 last = 2062 Likelihood offset now set to 3659.56
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlsignalNONRAD) first = 0 last = 69087 Likelihood offset now set to -208741
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controljpsipi) first = 0 last = 12009 Likelihood offset now set to -6991.03
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlsignalRAD) first = 0 last = 3146 Likelihood offset now set to 1644.82
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controljpsipiRAD) first = 0 last = 2289 Likelihood offset now set to 3293.04

prevFCN = -19.74905868 START COVARIANCE MATRIX CALCULATION.
ShiftMeanData=-0.2559, fJPSIPIRAD=0.07618, fsigRAD=0.03537, nJpsipi=1831, nJpsipiRAD_ctl=2289, nPRD=2.098e+04, nPRD_ctl=1.272e+04, nexpo=1.959e+04, nsignalNONrad_ctl=6.909e+04, nsignalRAD_ctl=3146, nsignalTOT=6.175e+04, sigmaJohnJpsipi=80.81, sigmaJohnRADJpsipi=100.4, sigmaResolution=10.55, slopecombi=-0.001982, [#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSU_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSURAD_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSUJpsipi_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSURADJpsipi_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(SigmoidBump_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(SigmoidStep_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)

prevFCN = -19.74905868 FracBkgPRD_1=0.9053,
prevFCN = -19.73618614 FracBkgPRD_1=0.9047,
prevFCN = -19.75201753 FracBkgPRD_1=0.905, FracBkgPRD_2=0.1624,
prevFCN = -19.71759488 FracBkgPRD_2=0.1617,
prevFCN = -19.77014962 FracBkgPRD_2=0.162, ShiftMeanData=-0.233,
prevFCN = -19.67680432 ShiftMeanData=-0.2787,
prevFCN = -19.80145811 ShiftMeanData=-0.2397,
prevFCN = -19.70013269 ShiftMeanData=-0.272,
[…]
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-19.7491 FROM HESSE STATUS=OK 1699 CALLS 1700 TOTAL
EDM=5.2462 STRATEGY= 2 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 FracBkgPRD_1 9.04999e-01 2.75834e-03 9.26550e-04 9.44147e-01
2 FracBkgPRD_2 1.62043e-01 3.25610e-03 8.83844e-04 -7.42204e-01
3 ShiftMeanData -2.55859e-01 2.32900e-01 1.61313e-04 -2.55859e-03
4 deltaJ 1.62137e+00 7.27117e-02 7.05240e-05 -1.31544e+00
5 deltaJRAD 1.17391e+00 1.85704e-01 3.26450e-04 -1.35368e+00
6 deltaJohnJpsipi 2.00690e+00 1.60416e-01 1.35830e-04 -1.28651e+00
7 deltaJohnRADJpsipi 2.33692e+00 9.34816e-01 5.90480e-04 -1.26385e+00
8 exposlopeEE -1.81055e-03 4.67008e-04 1.26935e-05 3.37278e-01
9 exposlopeEE2SF 6.82759e+00 8.93238e-01 2.10263e-03 -1.04207e+00
10 fJPSIPIRAD 7.61787e-02 3.60360e-05 1.35839e-05 -1.01153e+00
11 fgaussNONRAD 1.74397e-01 2.88698e-02 1.97355e-03 -7.09172e-01
12 fgaussRAD 9.04529e-02 3.02806e-02 2.63315e-03 -9.59830e-01
13 fracExpoEE 7.16969e-01 8.65871e-02 4.06764e-03 4.48858e-01
14 fracSigmoidbump 9.69208e-01 1.07649e-02 4.27795e-03 1.21802e+00
15 fracSigmoidstep 9.34202e-01 7.46366e-03 1.40370e-03 1.05198e+00
16 frac_GaussJpsipi 4.05493e-03 1.26988e-03 1.83546e-03 -1.44335e+00
17 frac_GaussRADJpsipi 2.46257e-01 2.04791e-01 3.77445e-03 -5.32264e-01
18 fsigRAD 3.53692e-02 1.16997e-05 6.33405e-06 -1.19241e+00
19 gammaJ -1.63209e-01 2.12517e-02 1.19848e-04 1.48998e+00
20 gammaJRAD 4.35499e-01 6.57555e-02 3.32837e-04 -1.43872e+00
21 gammaJohnJpsipi -1.44151e+00 1.16517e-01 1.16463e-04 1.33009e+00
22 gammaJohnRADJpsipi -1.47339e+00 5.42106e-01 7.88616e-04 1.32743e+00
23 mean 5.27631e+03 1.66554e+00 1.78512e-04 -1.05414e-02
24 meanGaussRADJpsipi 5.26312e+03 1.57314e+02 3.05753e-03 -4.82582e-02
25 meanJ 5.27662e+03 3.70918e-01 3.33104e-05 -9.66658e-03
26 meanJRAD 5.27946e+03 2.11029e+00 2.13841e-04 -1.55471e-03
27 meanJohnJpsipi 5.31966e+03 3.01636e+00 1.83244e-04 1.13556e-01
28 meanJohnRADJpsipi 5.30145e+03 2.19356e+01 7.24272e-04 6.13249e-02
29 meanRAD 5.07679e+03 3.60647e+01 4.97599e-03 -6.19474e-01
30 meanSigmoidBump 5.00644e+03 3.30578e+00 1.44046e-03 -6.27126e-01
31 meanSigmoidStep 5.14190e+03 1.04646e+00 4.77140e-04 1.45946e-01
32 nJpsipi 1.83147e+03 2.45424e+02 3.03375e-04 -1.48518e+00
33 nJpsipiRAD_ctl 2.28900e+03 4.78434e+01 3.18949e-04 -1.26704e+00
34 nJpsipi_ctl 1.20090e+04 1.09585e+02 3.36669e-04 -8.63036e-01
35 nPRD 2.09848e+04 1.41464e+03 1.30536e-04 -1.28005e+00
36 nPRD_ctl 1.27250e+04 1.12805e+02 1.00445e-04 -1.34470e+00
37 nexpo 1.95937e+04 1.80509e+03 1.68908e-04 -1.28992e+00
38 nsignalNONrad_ctl 6.90870e+04 2.62844e+02 1.52253e-04 -8.09079e-01
39 nsignalRAD_ctl 3.14600e+03 5.60892e+01 9.99071e-05 -1.45856e+00
40 nsignalTOT 6.17453e+04 3.96484e+02 1.09188e-04 -1.06856e+00
41 sigma 4.24477e+01 1.43162e+00 2.56056e-04 -1.15577e+00
42 sigmaGaussRADJpsipi 1.65490e+02 7.09597e+01 2.72676e-03 -7.32890e-01
43 sigmaJ 3.55782e+01 1.17811e+00 7.21239e-05 -1.19128e+00
44 sigmaJRAD 3.68850e+01 3.07258e+00 4.18171e-04 -1.18429e+00
45 sigmaJohnJpsipi 8.08059e+01 3.59584e+00 2.70106e-04 -9.94319e-01
46 sigmaJohnRADJpsipi 1.00370e+02 1.68393e+01 8.94804e-04 -9.26063e-01
47 sigmaRAD 1.24792e+02 2.32856e+01 4.49855e-03 -8.48691e-01
48 sigmaResolution 1.05471e+01 5.53041e-01 1.23742e-03 -9.09274e-01
49 slopeSigmoidBump 1.72445e+01 1.56758e+00 7.46597e-05 -1.54453e+00
50 slopeSigmoidStep 1.76635e+01 7.59210e-01 4.32978e-05 -1.54421e+00
51 slopebump -3.44827e-03 1.77019e-03 1.75684e-04 3.34965e-01
52 slopecombi -1.98154e-03 2.34191e-04 7.29922e-06 3.37036e-01
53 slopestep -1.21497e-03 5.21922e-04 3.54673e-05 3.38119e-01
ERR DEF= 0.5

lxplus :

[…]

** 10 **SET STR 2

NOW USING STRATEGY 2: MAKE SURE MINIMUM TRUE, ERRORS CORRECT

** 15 **HESSE 2.65e+04

FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
[#1] INFO:Minization – RooNLLVar::evaluatePartition(physics) first = 0 last = 104155 Likelihood offset now set to -255912
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlbckgPRDStep) first = 0 last = 9650 Likelihood offset now set to -4175.52
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlbckgPRDBump) first = 0 last = 1013 Likelihood offset now set to 992.646
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlbckgPRDEE) first = 0 last = 2062 Likelihood offset now set to 3659.56
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlsignalNONRAD) first = 0 last = 69087 Likelihood offset now set to -208741
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controljpsipi) first = 0 last = 12009 Likelihood offset now set to -6991.03
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controlsignalRAD) first = 0 last = 3146 Likelihood offset now set to 1644.82
[#1] INFO:Minization – RooNLLVar::evaluatePartition(controljpsipiRAD) first = 0 last = 2289 Likelihood offset now set to 3293.04

prevFCN = -19.74905868 START COVARIANCE MATRIX CALCULATION.
ShiftMeanData=-0.2559, fJPSIPIRAD=0.07618, fsigRAD=0.03537, nJpsipi=1831, nJpsipiRAD_ctl=2289, nPRD=2.098e+04, nPRD_ctl=1.272e+04, nexpo=1.959e+04, nsignalNONrad_ctl=6.909e+04, nsignalRAD_ctl=3146, nsignalTOT=6.175e+04, sigmaResolution=10.55, slopecombi=-0.001982, [#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSURADJpsipi_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(SigmoidStep_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(SigmoidBump_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSU_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSURAD_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)
[#1] INFO:NumericIntegration – RooRealIntegral::init(JohnsonSUJpsipi_tmp_Int[B_VTX_mass]) using numeric integrator RooIntegrator1D to calculate Int(B_VTX_mass)

prevFCN = -19.74905868 FracBkgPRD_1=0.9053,
prevFCN = -19.73618614 FracBkgPRD_1=0.9047,
prevFCN = -19.75201753 FracBkgPRD_1=0.905, FracBkgPRD_2=0.1624,
prevFCN = -19.71759488 FracBkgPRD_2=0.1617,
prevFCN = -19.77014962 FracBkgPRD_2=0.162, ShiftMeanData=-0.233,
prevFCN = -19.69330079 ShiftMeanData=-0.2787,
prevFCN = -19.77727645 ShiftMeanData=-0.2422,
[…]
============== MATRIX FORCED POS-DEF BY ADDING 28041.274198 TO DIAGONAL.
FCN=-19.7491 FROM HESSE STATUS=NOT POSDEF 1831 CALLS 1832 TOTAL
EDM=5.77144e+06 STRATEGY= 2 ERR MATRIX NOT POS-DEF
EXT PARAMETER APPROXIMATE INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 FracBkgPRD_1 9.04999e-01 1.19547e-08 9.26550e-04 9.44147e-01
2 FracBkgPRD_2 1.62043e-01 1.50421e-08 8.83844e-04 -7.42204e-01
3 ShiftMeanData -2.55859e-01 4.14091e-06 6.05012e-04 -2.55859e-03
4 deltaJ 1.62137e+00 1.83190e-06 7.25107e-05 -1.31544e+00
5 deltaJRAD 1.17391e+00 4.43794e-07 3.26450e-04 -1.35368e+00
6 deltaJohnJpsipi 2.00690e+00 6.32318e-07 1.35830e-04 -1.28651e+00
7 deltaJohnRADJpsipi 2.33692e+00 6.13283e-07 5.90480e-04 -1.26385e+00
8 exposlopeEE -1.81055e-03 1.60503e-07 1.26935e-05 3.37278e-01
9 exposlopeEE2SF 6.82759e+00 1.01764e-06 2.10263e-03 -1.04207e+00
10 fJPSIPIRAD 7.61787e-02 5.47793e-08 1.35839e-05 -1.01153e+00
11 fgaussNONRAD 1.74397e-01 1.53110e-08 1.97355e-03 -7.09172e-01
12 fgaussRAD 9.04529e-02 1.15707e-08 2.63315e-03 -9.59830e-01
13 fracExpoEE 7.16969e-01 1.81686e-08 4.06764e-03 4.48858e-01
14 fracSigmoidbump 9.69208e-01 6.96743e-09 4.27795e-03 1.21802e+00
15 fracSigmoidstep 9.34202e-01 1.00093e-08 1.40370e-03 1.05198e+00
16 frac_GaussJpsipi 4.05493e-03 2.56448e-09 1.83546e-03 -1.44335e+00
17 frac_GaussRADJpsipi 2.46257e-01 1.73767e-08 3.77445e-03 -5.32264e-01
18 fsigRAD 3.53692e-02 7.98642e-08 6.33405e-06 -1.19241e+00
19 gammaJ -1.63209e-01 1.87000e-07 1.19848e-04 1.48998e+00
20 gammaJRAD 4.35499e-01 2.71064e-07 3.32837e-04 -1.43872e+00
21 gammaJohnJpsipi -1.44151e+00 5.56521e-07 1.16463e-04 1.33009e+00
22 gammaJohnRADJpsipi -1.47339e+00 4.87691e-07 7.88616e-04 1.32743e+00
23 mean 5.27631e+03 2.28808e-05 8.05384e-05 -1.05414e-02
24 meanGaussRADJpsipi 5.26312e+03 2.23846e-05 1.87602e-04 -4.82582e-02
25 meanJ 5.27662e+03 3.23407e-05 3.33104e-05 -9.66658e-03
26 meanJRAD 5.27946e+03 1.48199e-05 2.13841e-04 -1.55471e-03
27 meanJohnJpsipi 5.31966e+03 1.49773e-05 1.83244e-04 1.13556e-01
28 meanJohnRADJpsipi 5.30145e+03 1.41505e-05 7.24272e-04 6.13249e-02
29 meanRAD 5.07679e+03 1.14930e-05 6.47539e-03 -6.19474e-01
30 meanSigmoidBump 5.00644e+03 6.07448e-06 1.58987e-03 -6.27126e-01
31 meanSigmoidStep 5.14190e+03 7.81151e-06 4.96276e-04 1.45946e-01
32 nJpsipi 1.83147e+03 1.74796e-03 1.13074e-03 -1.48518e+00
33 nJpsipiRAD_ctl 2.28900e+03 6.81486e-04 3.18949e-04 -1.26704e+00
34 nJpsipi_ctl 1.20090e+04 1.46472e-03 3.36669e-04 -8.63036e-01
35 nPRD 2.09848e+04 8.94658e-03 1.35352e-04 -1.28005e+00
36 nPRD_ctl 1.27250e+04 8.80162e-03 1.00445e-04 -1.34470e+00
37 nexpo 1.95937e+04 7.76268e-03 1.74644e-04 -1.28992e+00
38 nsignalNONrad_ctl 6.90870e+04 1.03556e-02 1.52253e-04 -8.09079e-01
39 nsignalRAD_ctl 3.14600e+03 4.41499e-03 9.99071e-05 -1.45856e+00
40 nsignalTOT 6.17453e+04 1.14815e-02 1.09188e-04 -1.06856e+00
41 sigma 4.24477e+01 2.41040e-05 5.00754e-05 -1.15577e+00
42 sigmaGaussRADJpsipi 1.65490e+02 2.46498e-05 1.76246e-04 -7.32890e-01
43 sigmaJ 3.55782e+01 1.03304e-05 7.21239e-05 -1.19128e+00
44 sigmaJRAD 3.68850e+01 7.70170e-06 4.18171e-04 -1.18429e+00
45 sigmaJohnJpsipi 8.08059e+01 1.13409e-05 2.70106e-04 -9.94319e-01
46 sigmaJohnRADJpsipi 1.00370e+02 1.21532e-05 8.94804e-04 -9.26063e-01
47 sigmaRAD 1.24792e+02 1.33298e-05 4.35817e-03 -8.48691e-01
48 sigmaResolution 1.05471e+01 1.24887e-06 1.12279e-03 -9.09274e-01
49 slopeSigmoidBump 1.72445e+01 8.40091e-05 1.67272e-04 -1.54453e+00
50 slopeSigmoidStep 1.76635e+01 8.63325e-04 1.29273e-05 -1.54421e+00
51 slopebump -3.44827e-03 1.31159e-07 4.59668e-05 3.34965e-01
52 slopecombi -1.98154e-03 9.43009e-02 1.14692e-06 3.37036e-01
53 slopestep -1.21497e-03 8.96825e-08 7.05199e-05 3.38119e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 110 NPAR= 53 ERR DEF=0.5

moneta · May 5, 2015, 7:59am

Hi ,

What you observe is quite expected when you have 50 parameters scattered across a few orders of magnitude.
The problem you have is in the calculation and then the inversion of the Hessian matrix. I suspect the fit probably converges to the right value.
I guess you also have some strongly correlated parameters, and this makes the case even more problematic.
If you can reduce the correlations and bring all your parameters close to one by redefining them, it would probably make the problem easier to converge.
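
The "MATRIX FORCED POS-DEF" message in the lxplus log means that the numerically computed second-derivative matrix was not positive definite. A common way to test that property is to attempt a Cholesky factorisation. The sketch below is a textbook version in standard C++; it is not Minuit's own procedure, and the function name `is_positive_definite` is chosen for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if the symmetric n x n matrix `m` (row-major, n*n entries) is
// numerically positive definite, i.e. a Cholesky factorisation m = L L^T exists.
bool is_positive_definite(const std::vector<double>& m, std::size_t n) {
    assert(m.size() == n * n);
    std::vector<double> l(n * n, 0.0);  // lower-triangular factor L
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = m[i * n + j];
            for (std::size_t k = 0; k < j; ++k) {
                sum -= l[i * n + k] * l[j * n + k];
            }
            if (i == j) {
                if (!(sum > 0.0)) return false;  // non-positive pivot (or NaN)
                l[i * n + i] = std::sqrt(sum);
            } else {
                l[i * n + j] = sum / l[j * n + j];
            }
        }
    }
    return true;
}
```

For strongly correlated or badly scaled parameters the pivots computed here become small differences of large numbers, so a last-digit change in the function values can be enough to flip one of them negative.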

I would also not use strategy 2. It is not worth it, since it requires the calculation of the second derivatives at every step. It may be better to use strategy 0 and then run Minos to get the right uncertainty on the parameter you are interested in.
And as I said before, I would also try Minuit2. It often works better.

Best Regards

Lorenzo

Jaro · May 5, 2015, 9:32am

Hello Lorenzo,

Thank you! So the difference in the output between Apple and lxplus does indeed come from the numerical instability of the constructed fit combined with the different behaviour of the system math libraries. Is that correct?

Cheers,
Jaroslav

moneta · May 5, 2015, 10:20am

Yes, this is correct. The fact that the mathematical functions are not implemented in exactly the same way on all architectures produces a difference in the result. However, it is not guaranteed that using more accurate mathematical functions will give better stability in the fit, because the numerical errors produced by other operations (e.g. the summation in the likelihood) are in general larger than the error resulting from the mathematical function evaluation.
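
The size of the summation error mentioned above can be illustrated by comparing a plain running sum with a compensated (Kahan) sum. This is a standard-library C++ sketch and makes no claim about how RooFit accumulates the likelihood; all function names are made up for the example, and it assumes the code is compiled without value-changing options such as `-ffast-math`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-event terms -log(p_i) of a negative log-likelihood (illustrative helper).
std::vector<double> negative_log_terms(const std::vector<double>& probabilities) {
    std::vector<double> terms;
    terms.reserve(probabilities.size());
    for (double p : probabilities) {
        assert(p > 0.0);  // log is only defined for positive probabilities
        terms.push_back(-std::log(p));
    }
    return terms;
}

// Plain left-to-right accumulation: rounding errors add up with every term.
double naive_sum(const std::vector<double>& terms) {
    double sum = 0.0;
    for (double t : terms) sum += t;
    return sum;
}

// Kahan summation: carries the rounding error of each addition forward
// in a separate compensation term.
double kahan_sum(const std::vector<double>& terms) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double t : terms) {
        const double corrected = t - compensation;
        const double next = sum + corrected;
        compensation = (next - sum) - corrected;  // what the addition lost
        sum = next;
    }
    return sum;
}
```

Summing one million copies of 0.1, for example, the plain sum drifts away from 100000 by roughly 1e-6, while the compensated sum stays within about 1e-11. Even the plain-sum error is far larger than a last-bit difference in a single `log` call.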

Lorenzo

Jaro · May 5, 2015, 4:38pm

Hello Lorenzo and Danilo,

Thank you both very much for your fast responses and clear answers! So far, changing only the fit method has not helped. Minuit2 with Scan/Migrad and Strategy(0) both converge on lxplus to the same point as on the Mac, but whenever any kind of covariance matrix calculation runs (hesse, minos), I always end up with the same problem on lxplus. I will do my best to make the fitter more numerically stable using the methods you both suggested.

Thank you very much !

Best,
Jaroslav

puma · May 6, 2015, 9:42am

Hi Jaroslav,

This may not help you to understand the differences between Mac OS X and lxplus, but in my public area (/afs/cern.ch/user/m/mamartin/public/forJaroslav) you can find a version of the JohnsonSU function with analytical integrals, which may solve your problem with a slightly different parametrisation of your pdf.

Cheers,

Maurizio