Integral error

christos · November 10, 2010, 12:59pm

Hi all,

I have a histogram and a fitting function that I use on a sub-range of the histogram (let’s call it sidebands; in the example that follows it is the 6-10 region). I then use the fit results to evaluate the integral of the same function in the rest of the histogram range (in this example, it is the 10-14 region) and its uncertainty.

The result that I get is that the integral in the 10-14 region has a much larger uncertainty than the integral in the 6-10 region. Is this because the fitting function is only defined between 6 and 10?

Many thanks,

–Christos

PS.

$ which root /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/slc4_ia32_gcc34/root//bin/root

#include <TROOT.h>
#include <TH1F.h>
#include <TF1.h>
#include <TMath.h>
#include <TStyle.h>
#include <TRandom.h>
#include <TVirtualFitter.h>

#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <map>

float BIN_SIZE = 0;

inline Double_t my_gauss(Double_t * x, Double_t * par)
{
  // par[0]: # of events, par[1]: mean, par[2]: sigma
  Double_t arg = 0;
  if (par[2]<0) par[2]=-par[2]; // make sure sigma > 0
  if (par[2] != 0) arg = (x[0] - par[1])/par[2]; 
  return par[0]*BIN_SIZE*TMath::Exp(-0.5*arg*arg)/
    (TMath::Sqrt(2*TMath::Pi())*par[2]);
}

using namespace std;

int testIntegralError(void)
{
  gStyle->Reset();
  // mean of gaussian
  const float mean = 10;
  // upper, lower limits of distribution
  const float LOW_LIMIT = mean -4;
  const float HI_LIMIT = mean + 4;
  const float fit_limit = mean;

  // this histogram is the "data"
  TH1F * my_hist = new TH1F("my_hist", "Gauss distribution", 200, 
			    LOW_LIMIT, HI_LIMIT);
  BIN_SIZE = my_hist->GetBinWidth(0);


  // This function will be used for "sampling" values into a histogram
  TF1 * fgen = new TF1("gauss1", my_gauss, LOW_LIMIT, HI_LIMIT, 3);
  fgen->SetParameters(1, mean, 1); // # of events, mean, sigma

  const int NENTRIES = 15000;

  // this function will be used to fit the histogram
  TF1 * fhist = new TF1("gauss2", my_gauss, LOW_LIMIT, fit_limit, 3);
  fhist->SetParNames("Evt count", "Mean", "Sigma");
  // initial "guess" for the parameters
  fhist->SetParameters(NENTRIES, mean, 0.8); // # of events, mean, sigma
  fhist->SetLineColor(kRed);

  gStyle->SetOptFit(1111);
  gStyle->SetStatH(0.0);
  gStyle->SetStatW(0.0);

#if 0
  fhist->SetParLimits(1, mean, mean);
#endif

  for(int i = 0; i != NENTRIES; ++i)
    my_hist->Fill(fgen->GetRandom());
  
  my_hist->Fit(fhist, "ILR");
  
  TVirtualFitter * fitter = TVirtualFitter::Fitter(0,3);
  Double_t par[3] = {fitter->GetParameter(0), fitter->GetParameter(1), 
		     fitter->GetParameter(2)};

  cout << " Histogram integral: " << my_hist->Integral() << endl;
  cout << " Fit-function integral between " << LOW_LIMIT << " and " 
       << fit_limit << ": " << fhist->Integral(LOW_LIMIT, fit_limit)
    /BIN_SIZE
       << " +- " << fhist->IntegralError(LOW_LIMIT, fit_limit)/BIN_SIZE
       << endl;
  cout << " Fit-function integral between " << fit_limit << " and " 
       << HI_LIMIT << ": " 
       << fhist->Integral(fit_limit, HI_LIMIT)/BIN_SIZE
       << " +- " 
       << fhist->IntegralError(fit_limit, HI_LIMIT, par,
			       fitter->GetCovarianceMatrix())/BIN_SIZE 
       << endl;

  return 0;
}

christos · November 10, 2010, 1:28pm

Oh, and probably an even more important question that I forgot to ask earlier.

If I fix one parameter in the fit, e.g. switch from

#if 0
  fhist->SetParLimits(1, mean, mean);
#endif

to

#if 1
  fhist->SetParLimits(1, mean, mean);
#endif

I can no longer calculate the uncertainty for the integral in the 10-14 region. I don’t understand this part at all.

–Christos

moneta · November 10, 2010, 2:33pm

Hi Christos,

there is nothing wrong with your macro. What you are seeing is an effect of the correlation between the mean and the amplitude of the gaussian function. On the left side there is a large positive correlation and a a negative derivative of the function w.r.t to the mean and the opposite is in the right side. This makes a very large error in extrapolating the fit results from one side to the other.
By fitting the all curve this correlation becomes small. If you fix the mean or the amplitude you also solve this problem.
I don’t see any problem in the IntegralError when one of the fit parameter is fixed. Just the uncertainty in the extrapolation region is much reduced.

Best Regards

Lorenzo

christos · November 10, 2010, 3:21pm

Hi Lorenzo,

If the mean is free to float, this is what the macro gives me:

 Histogram integral: 15000
 Fit-function integral between 6 and 10: 7508.99 +- 86.5423
 Fit-function integral between 10 and 14: 7929.34 +- 710.77

If I fix the mean, this is what I am getting instead:

 Histogram integral: 15000
 Fit-function integral between 6 and 10: 7509 +- 86.6544
 Fit-function integral between 10 and 14: 7509 +- nan

Are you saying that you are not seeing the “nan” in the last line of the printout?

–Christos

moneta · November 10, 2010, 3:30pm

Hi,

if I fix the mean to 10 and leavinbg free th eother two parameters I obtain

an identical result as expected. I am using 5.27.07 but it should be the same as 5.27.06 since nothing changed in TF1::IntegralError

Lorenzo

christos · November 10, 2010, 3:34pm

Hmmm, your results make much more sense than what I am getting. This is somewhat disturbing.

Is there a way to get a hold of a 5.27.07 build at CERN? Or should I install this at my laptop, to see if it makes a difference?

–Christos

moneta · November 10, 2010, 3:41pm

You can use 5.27.06 on /afs. I have tested using that version and I obtained the same thing. No need to go to 5.27.07. Which version are you using ?

Lorenzo

christos · November 10, 2010, 3:42pm

which root /afs/cern.ch/sw/lcg/app/releases/ROOT/5.27.06/slc4_ia32_gcc34/root//bin/root

Any ideas?

moneta · November 10, 2010, 3:44pm

Please attach also your macro giving that result

Thank you
Lorenzo

christos · November 10, 2010, 3:56pm

Hi Lorenzo,

It is exactly what I copied + pasted in my first post. Attaching here as well.

–Christos
testIntegralError.C (2.61 KB)

moneta · November 10, 2010, 5:51pm

Hi Christos,

I found the problem. The TVirtualFitter returns a reduced covariance matrix (skipping the fit parameters).
I would use the TFitResult as in the attached example and it will work. Also in your case you don’t need to pass the covariance matrix to TF1::IntegralError

Lorenzo
testIntegralError.C (2.67 KB)

christos · November 10, 2010, 8:55pm

Dear Lorenzo.

Many thanks for the fix. The modified macro works as advertised.

For my own information; when you say

do you mean that the original code ran into a bug that has been fixed in 5.27.07, or is this behavior (skipping the fit parameters) built in by design?

Best,

–Christos

moneta · November 10, 2010, 9:12pm

Hi,

it is not a bug. The TVirtualFitter::GetCovarianceMatrix() returns by design a pointer to a reduced covariance matrix (without the fixed parameters) and you cannot pass it, when there are some fixed parameters, in TF1::IntegralError, which requires the full matrix. For this reason you should use the TFitResult as in attached macro and suggested in root.cern.ch/root/html/TF1.html# … egralError.
I will need to add a note in the documentation about this

Lorenzo