Improving a landau-gaussian convoluted fit

Ruina · October 11, 2019, 11:51am

Dear experts,
I am using the langaus convolution fit for my data but I fail to understand the behaviour of the fits. For some chips, it is closer to expectation (figure 1) while for some it is totally off (figure 2). Also, I noticed that the fit somehow worsens drastically when I have more data (comparing figures 1 and 2 with figure 3). I am probably not very good with the concepts of fitting, especially with those of a convoluted landau-gaussian fit and therefore, not able to figure out the reason behind this. I don’t know what I should do in order to improve the fit. Any help in this regard is highly appreciated!

The code and figures are attached.

Thank you for your time!

langaus_fitting.cxx (4.5 KB)

figures.pdf (155.0 KB)

couet · October 11, 2019, 1:22pm

I guess @moneta can help.
But please make sure you post everything that is needed to run your macro.
When trying to run it I get:

$ root langaus_fitting.cxx
   ------------------------------------------------------------------
  | Welcome to ROOT 6.19/01                        https://root.cern |
  | (c) 1995-2019, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for macosx64 on Oct 10 2019, 12:59:53                      |
  | From heads/master@v6-19-01-1665-g8b9d7896dd                      |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
   ------------------------------------------------------------------

root [0] 
Processing langaus_fitting.cxx...
In file included from input_line_10:1:
/Users/couet/Downloads/langaus_fitting.cxx:1:10: fatal error: '../inc/myheaderfile.h' file not found
#include "../inc/myheaderfile.h"
         ^~~~~~~~~~~~~~~~~~~~~~~
/Users/couet/Downloads/langaus_fitting.cxx:58:29: error: use of undeclared identifier 'langaufit'
        TF1 *fitVAEnergy0 = langaufit(hist0,fr0,sv0,pllo0,plhi0,fp0,fpe0,&chisqr0,&ndf0);
                            ^
/Users/couet/Downloads/langaus_fitting.cxx:61:9: error: use of undeclared identifier 'langaupro'
        langaupro(fp0,Peak0,FWHM0);
        ^
/Users/couet/Downloads/langaus_fitting.cxx:76:29: error: use of undeclared identifier 'langaufit'
        TF1 *fitVAEnergy1 = langaufit(hist1,fr1,sv1,pllo1,plhi1,fp1,fpe1,&chisqr1,&ndf1); // https://root.cern.ch/r...
                            ^
/Users/couet/Downloads/langaus_fitting.cxx:79:9: error: use of undeclared identifier 'langaupro'
        langaupro(fp1,Peak1,FWHM1); // https://root.cern.ch/root/html/tutorials/fit/langaus.C.html
        ^
root [1]

Ruina · October 11, 2019, 2:01pm

I’m sorry, I updated the code.

I compile using:

g++ -std=c++11 -Wall `root-config --cflags --libs` langaus_fitting.cxx -o langaus_fitting

And execute using

./langaus_fitting <inFile>

Here, and also in the code, inFile is the rootfile containing the histograms that I want to fit and histFile is a list of histogram names that is read line-by-line to access the histograms.

FoxWise · October 11, 2019, 4:32pm

Hi!

You can try to set better starting values for a fit. Maybe it will help.
For example:

// par[2]=Total area (integral -inf to inf, normalization constant)
par[2] = histo->Integral();
// or Integral(min_x, max_x)

This sets the initial Area value close to the correct one, so the fit is less likely to fail.
From what I can tell from the pictures, the fit definitely fails to find the proper area.
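
A minimal sketch of the area seed, written without ROOT so that it stands alone. The helper name `areaSeed` is hypothetical and not part of the attached macro. It assumes the fit function is compared with raw bin contents, in which case the function's integral corresponds to the sum of the bin contents multiplied by the bin width. With a ROOT `TH1`, `Integral()` returns the plain sum of bin contents and `Integral("width")` multiplies by the bin width, so the two only agree when the bins are one unit wide.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of ROOT or of the attached macro):
// starting value for the Area parameter of a function that is fitted
// to raw bin contents of a histogram with equal-width bins.
double areaSeed(const std::vector<double>& binContents, double binWidth) {
    assert(binWidth > 0.0);
    // Sum of all bin contents, i.e. the number of entries in the range.
    const double sum = std::accumulate(binContents.begin(), binContents.end(), 0.0);
    // Scale by the bin width to get an area in units of (counts * x).
    return sum * binWidth;
}
```

For example, `areaSeed({1.0, 2.0, 3.0}, 0.5)` gives 3.0, half of the plain sum of 6.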

cheers

Ruina · October 14, 2019, 10:32am

Thanks, Foxwise!
I made these changes to the area parameters:

I set the starting values of the area (par[2]) to the integral of the histograms (note that there are two histograms I am fitting simultaneously, hence the 0 and 1 suffixes)
I set the lower and upper limits of this parameter to the starting value minus 10000 and plus 10000, respectively

// --- Hist0 --- //

std::cout << "hist0 " << hist0->Integral() << std::endl;

// Setting fit range and start values
double fr0[2], sv0[4], pllo0[4], plhi0[4];
fr0[0] = 0.;         fr0[1] = 200.;
sv0[0] = 2.;         sv0[1] = 55.;        sv0[2] = hist0->Integral();      sv0[3] = 5.;
pllo0[0] = 0.5;      pllo0[1] = 50.;      pllo0[2] = sv0[2] - 10000;       pllo0[3] = 1.;
plhi0[0] = 6.;       plhi0[1] = 60.;      plhi0[2] = sv0[2] + 10000;       plhi0[3] = 5.;

// Return values
double fp0[4], fpe0[4];
double chisqr0;
int ndf0;
TF1 *fitVAEnergy0 = langaufit(hist0,fr0,sv0,pllo0,plhi0,fp0,fpe0,&chisqr0,&ndf0);

double SNRPeak0, SNRFWHM0;
langaupro(fp0,SNRPeak0,SNRFWHM0);

// --- Hist1 --- //

std::cout << "hist1 " << hist1->Integral() << std::endl;

// Setting fit range and start values
double fr1[2], sv1[4], pllo1[4], plhi1[4];
fr1[0] = 0.;         fr1[1] = 200.;
sv1[0] = 2.;         sv1[1] = 40.;        sv1[2] = hist1->Integral();     sv1[3] = 3.;
pllo1[0] = 0.5;      pllo1[1] = 35.;      pllo1[2] = sv1[2] - 10000.;     pllo1[3] = 1.;
plhi1[0] = 5.;       plhi1[1] = 45.;      plhi1[2] = sv1[2] + 10000.;     plhi1[3] = 10.;

This gives me the following output

hist0 262265
 FCN=21107.2 FROM MIGRAD    STATUS=CONVERGED     162 CALLS         163 TOTAL
                     EDM=3.33703e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  LWidth       6.00000e+00   1.08011e-04   6.28795e-04** at limit **
   2  MPV          5.00000e+01   1.50066e-03   1.73820e-03** at limit **
   3  Area         2.52542e+05   5.10047e+02   1.62365e-02  -7.47143e-03
   4  GSigma       5.00000e+00   2.88992e-04   1.20606e-03** at limit **
hist1 151917
 FCN=14506.5 FROM MIGRAD    STATUS=CONVERGED     228 CALLS         229 TOTAL
                     EDM=1.30251e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  LWidth       5.00000e+00   2.72400e-04   9.15161e-04** at limit **
   2  MPV          3.85333e+01   4.68621e-02   4.92188e-04   2.33034e-03
   3  Area         1.42306e+05   3.82704e+02   8.17839e-03  -1.15517e-02
   4  GSigma       8.46014e+00   4.92809e-02   7.29985e-04  -8.82016e-03
Info in <TCanvas::Print>: ps file c1.ps has been created

where the first derivative of the area parameter is not ** at limit ** anymore, as it was previously.
However, there are still two problems:

The fits look better than before but how can I improve it further? (pictures attached)
What does the message ** at limit ** exactly mean and how should I take care of them?

figure.pdf (47.9 KB)

FoxWise · October 14, 2019, 11:44am

Hi, Ruina!

** at limit ** means that your parameter reached the boundary condition and your fit cannot probe parameters further.

As I can see from your results for hist0:
LWidth = 6. at limit - because your boundaries are: [0.5, 6.]
MPV = 50 at limit - because your boundaries are: [50., 60.]
GSigma = 5 at limit - because your boundaries are: [1., 5.]

Judging by the pictures, the histogram is wider than the fit: LWidth wants to grow, but because of your boundary it can’t exceed 6.
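
A minimal sketch of how the same check could be automated after each fit, so that a parameter stuck on a boundary is noticed without reading the Minuit table. The names `LimitStatus` and `limitStatus` and the tolerance are assumptions made for illustration; they are not ROOT or Minuit API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types, not part of ROOT or Minuit.
enum class LimitStatus { Inside, AtLower, AtUpper };

// Report whether a fitted value sits on one of its limits.
// relTol is a tolerance relative to the width of the allowed interval.
LimitStatus limitStatus(double value, double lo, double hi, double relTol = 1e-6) {
    assert(lo < hi);
    const double tol = relTol * (hi - lo);
    if (std::fabs(value - lo) <= tol) return LimitStatus::AtLower;
    if (std::fabs(value - hi) <= tol) return LimitStatus::AtUpper;
    return LimitStatus::Inside;
}
```

With the hist0 numbers above, `limitStatus(6.0, 0.5, 6.0)` reports `AtUpper` for LWidth and `limitStatus(50.0, 50.0, 60.0)` reports `AtLower` for MPV, which tells you in which direction each boundary should be widened.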

So I would propose making the boundaries wider (but not infinitely wide).
You can also try to improve the starting values of the other parameters
(in my experience this has the most influence on a langau fit).

For example for MPV of langau:

sv0[1] = hist0->GetMean();
sv1[1] = hist1->GetMean();

Which will set it to some close value.

You can try the same with LWidth, using hist0->GetStdDev(), to see if it helps.

You can’t do something like that for GSigma, but you can widen its limits and let the fit find the best value.

P.S. I would also restrict the fit range as you did before, starting from 20, let’s say, to avoid fitting the bump at the beginning, which definitely has nothing to do with the langau distribution.

Hope it will help,

have a nice day,
Fox

Wile_E_Coyote · October 14, 2019, 11:45am

You set “limits” via your “pllo0”, “plhi0”, “pllo1” and “plhi1” arrays.

Ruina · October 14, 2019, 12:56pm

Hi FoxWise!
Thank you very much for your super helpful suggestions!

So I tried them out and here’s what I found:

GetMean() doesn’t work well: because of the long Landau tail, the mean values are much higher than the MPV (the “most probable value”).
GetStdDev() doesn’t work at all: the width parameter of the Landau distribution is a scale parameter that sets the “spread” of the distribution, not a standard deviation, which is undefined for a Landau.
Yes, I set the start of the fit range to 20. That is sensible to do.
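
A minimal sketch of an alternative MPV seed that is not pulled up by the tail: the centre of the highest bin inside the fit range. It is written without ROOT; the helper name `modeSeed` and the equal-width binning are assumptions. With a `TH1` the same idea would be, as far as I know, `hist->GetBinCenter(hist->GetMaximumBin())` after restricting the x axis to the fit range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: centre of the highest bin whose centre lies in
// [xMin, xMax], for a histogram with equal-width bins starting at xLow.
double modeSeed(const std::vector<double>& binContents, double xLow,
                double binWidth, double xMin, double xMax) {
    assert(binWidth > 0.0 && !binContents.empty());
    double bestCentre = xMin;    // fallback if no bin lies inside the range
    double bestContent = -1.0;
    for (std::size_t i = 0; i < binContents.size(); ++i) {
        const double centre = xLow + (static_cast<double>(i) + 0.5) * binWidth;
        if (centre < xMin || centre > xMax) continue;  // skip bins outside the range
        if (binContents[i] > bestContent) {
            bestContent = binContents[i];
            bestCentre = centre;
        }
    }
    return bestCentre;
}
```

Restricting the search to the fit range matters here: with a large bump in the first bins, the highest bin of the whole histogram may be the bump rather than the Landau peak.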

However, I have now finally managed to get a fit where none of the parameters is flagged ** at limit **

hist0 std dev 35.2005
hist0 mean 70.5104
hist0 integral 242445
 FCN=4681 FROM MIGRAD    STATUS=CONVERGED     196 CALLS         197 TOTAL
                     EDM=1.18729e-07    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  LWidth       8.03594e+00   3.84003e-02   2.63703e-04  -2.09781e-02
   2  MPV          4.95082e+01   3.78676e-02   6.43794e-05  -8.32714e-02
   3  Area         2.49236e+05   5.21572e+02   2.34681e-03  -2.24635e-03
   4  GSigma       6.84353e+00   7.56135e-02   2.12574e-04   3.49832e-02

except for the GSigma of the smaller histogram (hist1, in blue).

hist1 std dev 41.5347
hist1 mean 66.4289
hist1 integral 142025
 FCN=6235.78 FROM MIGRAD    STATUS=CONVERGED     257 CALLS         258 TOTAL
                     EDM=1.19169e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  LWidth       8.96020e+00   3.58728e-02   4.28028e-04  -1.13823e-01
   2  MPV          3.87507e+01   4.91708e-02   6.83651e-05   7.53518e-02
   3  Area         1.43995e+05   3.98044e+02   1.54645e-03   1.24198e-02
   4  GSigma       5.00000e-01   5.71204e-02   4.17623e-03** at limit **
Info in <TCanvas::Print>: ps file c1.ps has been created

Also, the fit still does not look good enough, it should squeeze a little more (picture attached). I tried reducing the scale parameters but that didn’t help.

figure_new.pdf (47.8 KB)

I’d really appreciate if someone can help me with the interpretation of the fit params so that I can understand which ones and how to tweak them further.

Cheers,
Ruina

FoxWise · October 14, 2019, 1:25pm

Hi,

You can widen the GSigma boundaries further for hist1, so it will not get stuck at the 0.5 limit.

The only way to improve it further is to tweak the starting values.

“it should squeeze a little more”

Yes, so try making the LWidth starting value a bit smaller and see what happens. And maybe make the Area starting value a little larger (not sure).

“help me with the interpretation of the fit params”
What exactly do you want to know?
Area - total area of the fit
MPV - Peak coordinate
LWidth - width
GSigma - sigma of the gaussian which smears all the points of the function
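
For reference, this is how the four parameters enter the fit function, written out as I understand the ROOT langaus tutorial (an assumption about the attached macro, which may differ). Here \phi_L is the standard Landau density, whose maximum lies at about -0.22278298, and x_c is the shifted location that puts the maximum of the unsmeared Landau at MPV:

```latex
f(x) = \mathrm{Area} \int_{-\infty}^{\infty}
       \frac{1}{\mathrm{LWidth}}\,
       \phi_L\!\left(\frac{x' - x_c}{\mathrm{LWidth}}\right)
       \frac{1}{\sqrt{2\pi}\,\mathrm{GSigma}}
       \exp\!\left(-\frac{(x - x')^2}{2\,\mathrm{GSigma}^2}\right) dx',
\qquad
x_c = \mathrm{MPV} + 0.22278298 \cdot \mathrm{LWidth}
```

Two consequences: MPV is the peak of the Landau before smearing, so the peak of the convoluted curve sits slightly higher, and both LWidth and GSigma broaden the curve, which is why the fit can trade one against the other.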

So if you see that the fit gets the width, peak or area wrong, tune the starting values of those parameters, and widen their boundaries if they end up at a limit.

GSigma shouldn’t influence the fit that much if the width/peak/area parameters are chosen correctly.
If they are mismatched, the fit will probably try to compensate with GSigma, which isn’t good.
That is what happens in hist0, by the way:
GSigma ≈ 6.8 is quite large. I would expect it to be roughly 1 or less.
So you can set the starting value for GSigma to around 0.5
and tweak LWidth accordingly.

So the only solution is to play around with all the numbers…

cheers,
Fox

Ruina · October 14, 2019, 2:59pm

Yes, but how?
So this …

… is great for starters!
Thanks a lot!

FoxWise · October 14, 2019, 3:34pm

“Yes, but how?”

If your fit is wider than the histogram → decrease the LWidth starting value
If your fit is narrower than the histogram → increase the LWidth starting value

If your fit area is bigger than the histogram’s → decrease the Area starting value
If your fit area is smaller than the histogram’s → increase the Area starting value

If your fit MPV is to the right of the histogram peak → decrease the MPV starting value
If your fit MPV is to the left of the histogram peak → increase the MPV starting value

By how much to increase or decrease them I can’t tell. You need to play around, see how these changes affect the fit, and decide what is appropriate.

Repeat this until you find starting values that give the best fit.
If the fit stops improving after that, you have done the best you can, and I don’t think there is more you can do (other than trying to fit langau plus some other function for the background).
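
The trial-and-error loop above can also be automated. The following is only a sketch: `Seed`, `scanSeeds` and the `fitQuality` callback are hypothetical names, and in practice the callback would run the fit with the given starting values and return something like chi2/ndf.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical container for one set of starting values.
struct Seed { double lwidth; double mpv; double gsigma; };

// Try every combination of candidate starting values and keep the one
// for which fitQuality (smaller is better, e.g. chi2/ndf) is lowest.
Seed scanSeeds(const std::vector<double>& lwidths,
               const std::vector<double>& mpvs,
               const std::vector<double>& gsigmas,
               const std::function<double(const Seed&)>& fitQuality) {
    assert(!lwidths.empty() && !mpvs.empty() && !gsigmas.empty());
    Seed best{lwidths.front(), mpvs.front(), gsigmas.front()};
    double bestQuality = std::numeric_limits<double>::infinity();
    for (double w : lwidths) {
        for (double m : mpvs) {
            for (double s : gsigmas) {
                const Seed trial{w, m, s};
                const double q = fitQuality(trial);  // run one fit, get its quality
                if (q < bestQuality) {
                    bestQuality = q;
                    best = trial;
                }
            }
        }
    }
    return best;
}
```

A handful of values per parameter is usually enough to see which direction helps; the plots should still be checked by eye, since a low chi2/ndf with a parameter at its limit is not a good fit.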

You are welcome

Ruina · October 14, 2019, 10:01pm

Hahaha… I understand this!
Sorry, I wasn’t clear!

What I meant was that when you wrote “So the only solution is to play around with all the numbers…”, I knew that that’s what one has to do… play around with numbers. Of course.

But the important question was “how” to play, which you had already answered in the previous message…

So I said that “this was great for starters” meaning that this explanation of yours was a great starting point. Guess I should have marked the previous reply as solution but I was waiting if anyone else had any other input.

Anyway, thanks a ton for all the help!

Cheers,
Ruina
