Fit of TGraphErrors fails, Fit of TGraph without Errors works

Piet · July 26, 2017, 4:10pm

Dear ROOT Experts,

When I add (small) errors to a simple TGraph that I am fitting with an exponential function, the fit fails, and I do not have a clue why. I am excluding the first two data points of the graph, since they do not yet exhibit the exponential dependence I am expecting. Moreover, the errors on those first two data points are rather large, so I preferred to exclude them.

I was able to isolate my problem in a small macro that I have attached. Does anyone have an idea what is wrong or how I can tackle the problem?

The printout of the fits (first the fit on the TGraph without errors, then the fit of the TGraph with errors) is given below:

Minimizer is Minuit / MigradImproved
Chi2 = 206816
NDf = 8
Edm = 1.95775e-07
NCalls = 49
a = 2.3695 +/- 0.000592935
Warning in : Abnormal termination of minimization.
FCN=2385.16 FROM MIGRAD STATUS=FAILED 44 CALLS 45 TOTAL
EDM=1.51503e+07 STRATEGY= 1 ERR MATRIX APPROXIMATE
EXT PARAMETER APPROXIMATE STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 b 2.80698e+00 1.41421e+00 -0.00000e+00 5.50461e+03

ReproduceFitProblem.C (5.5 KB)

Wile_E_Coyote · July 27, 2017, 6:47am

Two small fixes …

tf_noerr->SetParameter(0, 2.5);
// ...
tf_err->SetParameter(0, 2.5);

The real problem is that ROOT often has problems when fitting anything with “x errors” (and nobody cares, as usual).
So, avoid setting “x errors”, if you can.
If you cannot, I have found a “brutal fix” which often works, and in your case it would be …

gr_err->Fit(tf_err, "WQRN"); // "initial pre-fit"
gr_err->Fit(tf_err, "RM+"); // "final fit"
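For readers without the attached macro, the idea behind the two calls can be sketched in plain C++ without ROOT. As far as I know, the "W" option makes the first fit ignore all point errors, so it only serves to find a sensible starting value; the second fit then uses the errors. Everything below is an assumption for illustration: the model y = x^a, the synthetic points from `makeToyData`, the golden-section minimizer (not what Minuit does), and all function names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point { double x, y, ex, ey; };

// Synthetic points on y = x^2.4 with made-up 1% x errors and 2% y errors.
std::vector<Point> makeToyData() {
    std::vector<Point> pts;
    for (double x = 5.0; x <= 40.0; x += 5.0) {
        const double y = std::pow(x, 2.4);
        pts.push_back({x, y, 0.01 * x, 0.02 * y});
    }
    return pts;
}

// Stage 1 objective: plain sum of squared residuals, all point errors ignored.
double sumSquares(const std::vector<Point>& pts, double a) {
    double s = 0.0;
    for (const Point& p : pts) {
        const double r = p.y - std::pow(p.x, a);
        s += r * r;
    }
    return s;
}

// Stage 2 objective: chi2 with effective variance ey^2 + (dy/dx * ex)^2.
double chi2WithErrors(const std::vector<Point>& pts, double a) {
    double s = 0.0;
    for (const Point& p : pts) {
        const double r = p.y - std::pow(p.x, a);
        const double sx = a * std::pow(p.x, a - 1.0) * p.ex;
        s += r * r / (p.ey * p.ey + sx * sx);
    }
    return s;
}

// Golden-section search for the minimum of a unimodal f on [lo, hi].
template <typename F>
double goldenSectionMin(F f, double lo, double hi, double tol = 1e-8) {
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;  // about 0.618
    double c = hi - invPhi * (hi - lo);
    double d = lo + invPhi * (hi - lo);
    while (hi - lo > tol) {
        if (f(c) < f(d)) {
            hi = d;  // minimum lies in [lo, d]
        } else {
            lo = c;  // minimum lies in [c, hi]
        }
        c = hi - invPhi * (hi - lo);
        d = lo + invPhi * (hi - lo);
    }
    return 0.5 * (lo + hi);
}

// Pre-fit on a wide bracket without errors, then refine with errors nearby.
double twoStageFit(const std::vector<Point>& pts) {
    assert(!pts.empty());
    const double aPre = goldenSectionMin(
        [&](double a) { return sumSquares(pts, a); }, 1.0, 4.0);
    return goldenSectionMin(
        [&](double a) { return chi2WithErrors(pts, a); }, aPre - 0.2, aPre + 0.2);
}
```

Calling `twoStageFit(makeToyData())` should recover an exponent close to 2.4, since the toy points are generated without noise. The point of the split is that the first stage cannot be upset by the error values, while the second stage only has to search a small neighbourhood.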

moneta · July 27, 2017, 8:57am

Hi,

The problem is that adding errors in “X” can introduce an additional non-linearity in the chi2 function, which in some situations can cause the fit to fail.
In addition, the error should really reflect normal fluctuations around the x values that are independent of those on the y values. If the errors represent something else, this should be described correctly with a 2d likelihood fit.
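To make the non-linearity concrete, here is a minimal sketch in plain C++ (no ROOT needed) of an effective-variance chi2, which, as far as I know, is the method the ROOT documentation describes for graphs with x errors: the x error is projected onto y through the slope of the model. The model y = x^a and the names `Point`, `model`, `modelSlope`, `chi2YOnly` and `chi2EffectiveVariance` are assumptions for illustration and are not taken from the attached macro or from the ROOT sources.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One data point with uncertainties on both coordinates.
struct Point {
    double x, y, ex, ey;
};

// Assumed model for illustration: y = x^a (requires x > 0).
double model(double x, double a) { return std::pow(x, a); }

// Analytic slope of the model, dy/dx = a * x^(a-1).
double modelSlope(double x, double a) { return a * std::pow(x, a - 1.0); }

// Chi2 using only the y errors: the denominator does not depend on a.
double chi2YOnly(const std::vector<Point>& pts, double a) {
    double sum = 0.0;
    for (const Point& p : pts) {
        assert(p.x > 0.0 && p.ey > 0.0);
        const double r = p.y - model(p.x, a);
        sum += r * r / (p.ey * p.ey);
    }
    return sum;
}

// Chi2 with the effective variance ey^2 + (f'(x) * ex)^2.
double chi2EffectiveVariance(const std::vector<Point>& pts, double a) {
    double sum = 0.0;
    for (const Point& p : pts) {
        assert(p.x > 0.0 && p.ey > 0.0);
        const double r = p.y - model(p.x, a);
        const double sx = modelSlope(p.x, a) * p.ex;  // x error projected onto y
        sum += r * r / (p.ey * p.ey + sx * sx);
    }
    return sum;
}
```

With x errors the denominator itself depends on the fit parameter, so the function being minimized is no longer a simple weighted sum of squared residuals. With all x errors set to zero the two functions return the same value.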

The issue here is that the problem is ill-posed from the beginning. Some errors are incredibly small, which explains the huge chi2 obtained from the fit.

Lorenzo

Piet · July 27, 2017, 10:34am

Hi Wile and Lorenzo

Thanks for your suggestions and comments. The brutal fix suggested by Wile works for me.

However, it would also be interesting to go into more depth (I am quite ignorant in this matter), since I might learn from this experience. To help you understand exactly what I am doing, I have written some more information below:

I am performing simulations of electron avalanches in gaseous detectors with Garfield, and these avalanches exhibit quite large fluctuations. I (and also other people [1]) believe that the fluctuations in the final avalanche are determined by the fluctuations in the first part of the avalanche [2]. We believe that N_{full} = N_{half}^a, with a \approx 2. Therefore I am plotting the gain of the full avalanche on the y-axis against the gain from a simulation of only half of the avalanche on the x-axis. The gain (i.e. the average avalanche size) is obtained through a Polya fit [3]. Taking the data point with the smallest uncertainty, the second-to-last point:

x (simulation of 1/2 avalanche): Gain = 33.6369 +/- 0.378424 Rel Err = 1.12503 % (First plot below)
y (simulation of full avalanche): Gain = 4463.23 +/- 74.5165 Rel Err = 1.66956 % (Second plot below)
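As a quick cross-check of the model on this single point, one can solve N_{full} = N_{half}^a for a and propagate the two uncertainties. The sketch below is plain C++; it assumes the x and y uncertainties are independent (which is exactly what is in question here), and the names `Exponent` and `exponentFromPoint` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Exponent and its propagated uncertainty for a single point, assuming y = x^a.
struct Exponent {
    double value;       // a = ln(y) / ln(x)
    double sigmaFromX;  // contribution of the x uncertainty
    double sigmaFromY;  // contribution of the y uncertainty
    double sigma;       // quadrature sum (independent errors assumed)
};

Exponent exponentFromPoint(double x, double ex, double y, double ey) {
    assert(x > 1.0 && y > 0.0);  // ln(x) must be positive
    const double lnx = std::log(x);
    const double a = std::log(y) / lnx;
    // Linear error propagation: da/dy = 1/(y ln x), da/dx = -a/(x ln x).
    const double sy = ey / (y * lnx);
    const double sx = a * ex / (x * lnx);
    return {a, sx, sy, std::sqrt(sx * sx + sy * sy)};
}
```

Calling `exponentFromPoint(33.6369, 0.378424, 4463.23, 74.5165)` gives a of roughly 2.39 with a propagated uncertainty of roughly 0.009. Under these assumptions the x uncertainty contributes more to the uncertainty on a than the y uncertainty does, even though its relative size is smaller.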

I believe the fits are of good quality (what exactly is the “Probability” that gets plotted in the stat box?), so I assume that the uncertainties I assigned to the values are reasonable. On the other hand, I do not expect my model N_{full} = N_{half}^a, with a \approx 2, to describe this data with incredible precision, since it is just a rough model…

Although the simulations are done independently, I would say that the x and y values are correlated, but I cannot say the same for the uncertainties on the x and y values. I use the same function to fit and to extract the uncertainty for the x and y values, so maybe the uncertainties are correlated and I should use a 2d likelihood fit as you propose? In case you have some RooFit example / tutorial in mind, I'll be happy to try it.

Thanks a lot
Kind regards
Piet

[1] T. Zerguerras et al. NIM A 772, (2015) 76. https://doi.org/10.1016/j.nima.2014.11.014
[2] For instance, if fluctuations leave you with a very small number of electrons halfway through the simulation of the avalanche, it is unlikely that this will be corrected in the second part of the avalanche, and very likely that you will end up with a small charge. This is because each of the electrons created in the first part of the avalanche can be considered as the starting point of another avalanche in the second half of the total avalanche. Fluctuations on a large number of electrons will therefore result in a more averaged number.
[3] P(N_{e}) = \frac{(1+\theta)^{1+\theta}}{\Gamma(1+\theta)} \left( \frac{N_{e}}{G} \right)^\theta \exp \left[ -(1+\theta)\frac{N_{e}}{G} \right], where G = Gain = average of N_e and \theta is the Polya parameter related to the relative gain variance f = \frac{1}{1+\theta}. Just checking now I see it belongs to the group of Negative Binomial Distributions. See also: http://mathworld.wolfram.com/NegativeBinomialDistribution.html
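The Polya expression in [3] can be written down directly with the standard library. If I read the formula correctly, as written it is a density in the reduced variable N_e/G, so the sketch below divides by G to obtain a density in N_e itself. The function names, the restriction to \theta >= 0, and the simple trapezoidal integration are assumptions for illustration, not the fit function used for the plots.

```cpp
#include <cassert>
#include <cmath>

// Polya density in the reduced variable u = N_e / G, for theta >= 0.
double polyaReduced(double u, double theta) {
    assert(u >= 0.0 && theta >= 0.0);
    const double k = 1.0 + theta;
    return std::pow(k, k) / std::tgamma(k) * std::pow(u, theta) * std::exp(-k * u);
}

// Density in N_e itself: the change of variable u = N_e / G adds a factor 1/G.
double polya(double ne, double gain, double theta) {
    assert(gain > 0.0);
    return polyaReduced(ne / gain, theta) / gain;
}

// Trapezoidal estimate of the integral of u^power * polyaReduced(u) on [0, uMax].
double reducedMoment(int power, double theta, double uMax = 40.0, int steps = 400000) {
    const double h = uMax / steps;
    double sum = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double u = i * h;
        const double w = (i == 0 || i == steps) ? 0.5 : 1.0;  // end-point weights
        sum += w * std::pow(u, power) * polyaReduced(u, theta);
    }
    return sum * h;
}
```

With these definitions, `reducedMoment(0, theta)` and `reducedMoment(1, theta)` should both come out close to 1 (normalization and mean N_e = G), and the variance `reducedMoment(2, theta) - 1` should come out close to 1/(1+\theta), which matches the relative gain variance f quoted above.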

system · August 10, 2017, 10:34am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.