Fit results differ when fitting same data in TH1 or TGraphEr

honk · January 28, 2014, 11:12pm

Hi,

I would like to understand why I see different fit results (both in expected value and its uncertainty) when fitting the exact same data in a TH1 or a TGraphErrors. I originally came across this when reworking a fit of a TGraph to use a custom chi2 function via a ROOT::Math::Minimizer. When using a conventional chi2 function that code was able to reproduce the fits of TH1, but not TGraphErrors. In this example

#include <cmath>
#include <iostream>
#include <TF1.h>
#include <TGraphErrors.h>
#include <TH1.h>
#include <TRandom.h>

void test() {
  // dummy data
  TH1D h("h", "", 5, 0, 5);
  h.Sumw2();
  gRandom->SetSeed(666);
  h.FillRandom("pol1", 100000);

  // 1: TH1::Fit
  TF1 fun1("fun1", "[0]*x", 0, 10);
  fun1.SetParameter(0, 0);
  gRandom->SetSeed(666);
  h.Fit("fun1", "Q0");
  std::cout << fun1.GetParameter(0) << " " << fun1.GetParError(0) << "\n";

  // 2: TGraphErrors::Fit
  TF1 fun2("fun2", "[0]*x", 0, 10);
  TGraphErrors g(&h);
  fun2.SetParameter(0, 0);
  gRandom->SetSeed(666);
  g.Fit("fun2", "Q0");
  std::cout << fun2.GetParameter(0) << " " << fun2.GetParError(0) << "\n";

  // check if graph and histogram are compatible
  for (int i=1; i<h.GetNbinsX()+1; ++i) {
    std::cout << h.GetBinContent(i) - g.GetY()[i - 1] << " "
              << h.GetBinError(i) - g.GetEY()[i - 1] << std::endl;
  }
}

I get the following output

$ root -b -q -n -l test.C 
root [0] 
Processing test.C...
7712.1 24.8388
7574.54 595.299
0 0
0 0
0 0
0 0
0 0

So the histogram and the graph seem identical content-wise, but the fitted values of the coefficient are completely different. I had thought that the same data should produce the same fit in both cases.

This appears to come neither from the global state of the random number generator nor from the order in which I do the fits.

Is this expected to be different? What do I need to do to have consistent fit results, no matter how I store my data?

Thanks,

Benjamin

Danilo · January 29, 2014, 11:34am

Hi Benjamin,

this is because the constructor of TGraphErrors from a histogram also attaches an error on X, equal to half the width of the bin (notice the difference in the uncertainty on your fit parameter). You can see this using the TGraphErrors::Print() method.
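To illustrate why an x error changes the result: as far as the TGraph::Fit documentation describes it, points with x errors are fitted with the "effective variance" approximation, where the x error is projected onto y through the slope of the function. The sketch below is plain C++ rather than ROOT code; the `Point` struct and the `chi2` function are hypothetical names, and the model f(x) = a*x matches the `[0]*x` function from the question.

```cpp
#include <cassert>
#include <vector>

// One data point with uncertainties in both coordinates (hypothetical helper type).
struct Point {
  double x, ex, y, ey;
};

// Chi-square of the model f(x) = a*x.
// With use_ex = true the effective-variance approximation is used:
//   sigma_i^2 = ey_i^2 + (f'(x_i) * ex_i)^2, where f'(x) = a for this model.
// With use_ex = false the x uncertainties are ignored (histogram-like treatment).
double chi2(const std::vector<Point>& pts, double a, bool use_ex) {
  double sum = 0.0;
  for (const Point& p : pts) {
    double var = p.ey * p.ey;
    if (use_ex) var += (a * p.ex) * (a * p.ex);
    assert(var > 0.0);  // every point needs a non-zero uncertainty
    const double r = p.y - a * p.x;
    sum += r * r / var;
  }
  return sum;
}
```

For any non-zero slope the x term inflates the variance of every point, so the chi-square is smaller and flatter around its minimum, which is consistent with the much larger parameter uncertainty seen in the graph fit.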

Cheers,
Danilo

honk · January 29, 2014, 1:21pm

Hi Danilo,

with that explanation I understand that my premise, that the TH1 and the TGraphErrors were equivalent, was wrong: the bin width of a histogram is not the same as a Gaussian uncertainty in x on the points.

I was able to obtain the same results when fitting the TGraphErrors with the option EX0, so it looks like fits of histograms don’t take the bin width into account by default. Using the integral option for the histogram takes the bin width into account, but of course these are still not Gaussian uncertainties, so the fit results differ with respect to a TGraphErrors.

My take-away from this is that one needs to think carefully about x-uncertainties when fitting points from histogrammed data, which might be in TH1 or TGraphErrors form (think TGraphErrors::Read and intermediate text files).

moneta · February 3, 2014, 12:05pm

Hi Benjamin,

The thing you have to be careful about is that when constructing a TGraphErrors from a TH1, an error in the x coordinate is added automatically, and this error is equal to half of the bin width. I think this is debatable, since the bin width does not really represent an uncertainty in x, as you mentioned.

During the histogram fit, the uncertainties in x are never taken into account, because a histogram does not represent two distinct measurements x and y, but the empirical distribution of a variable x.
When fitting, to estimate the function contribution in each bin you can use either the function value at the bin center or the integral of the function in the bin (option “I”). The first one is an approximation, while the second is the correct treatment if the function has significant non-null second derivatives in the bin (e.g. in case of a very steep exponential function).
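A small sketch of the difference between the two estimates, in plain C++ without ROOT. The function names `at_center` and `bin_average` are hypothetical, and the bin average is computed with Simpson's rule purely for illustration; it is not meant to reproduce how ROOT integrates internally.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Model prediction for the bin [lo, hi] using the function value at the bin center.
double at_center(const std::function<double(double)>& f, double lo, double hi) {
  return f(0.5 * (lo + hi));
}

// Model prediction using the bin average: the integral of f over [lo, hi]
// divided by the bin width, computed with composite Simpson's rule (n even).
double bin_average(const std::function<double(double)>& f, double lo, double hi,
                   int n = 100) {
  assert(n > 0 && n % 2 == 0 && hi > lo);
  const double h = (hi - lo) / n;
  double sum = f(lo) + f(hi);
  for (int i = 1; i < n; ++i) sum += f(lo + i * h) * (i % 2 ? 4.0 : 2.0);
  return sum * h / 3.0 / (hi - lo);
}
```

For a linear function such as the one in the original question the two estimates agree, so the “I” option should make no difference there. For a steep function such as exp(-5x) on the bin [0, 1], the center value is about 0.08 while the bin average is about 0.2.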

Lorenzo

honk · February 3, 2014, 12:27pm

Hi Lorenzo,

thanks for commenting.

[quote=“moneta”]I think this is debatable, since the bin width does not really represent an uncertainty in x as you mentioned.
[/quote]
I actually kind of like that TGraphErrors are able to keep information about the histogram used to generate them.

Is there a way to fit a TGraphErrors with non-zero “x-uncertainties” (e.g. from bin widths) with the integral option? In that case it wouldn’t matter how the data was stored, and one could deviate from the current reasonable defaults depending on what type of x-uncertainty information a TGraphErrors contains. Not sure that is a common use case though (and it might be tricky since TGraphErrors don’t necessarily cover the x-axis like TH1s do).

moneta · February 3, 2014, 1:08pm

No, this is not currently possible in ROOT. But let me understand this better and see whether it makes sense. What exactly are your data points x and y?

Best Regards

Lorenzo

honk · February 3, 2014, 9:38pm

My data is from histograms, but often saved in CSV files (e.g. since it is easier to exchange with non-ROOT users). The columns of these files typically correspond to x, x-binwidth, y, dy_stat and maybe dy_syst. I always tended to use TGraphErrors to read in these text files since it can be constructed from such files, but maybe that is just a bad habit.

moneta · February 5, 2014, 4:42pm

Hi,

If your data is a histogram, you should always use the option “EX0” to fit. There is no uncertainty to consider in the x coordinate. I would, however, convert the graph to a histogram (you could then use the “I” option).
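One possible first step for such a conversion is to rebuild the bin edges from the stored points. The sketch below is plain C++ and assumes that each point carries the bin center and the half bin width (as in a TGraphErrors built from a TH1) and that the points are sorted in x; `bin_edges` is a hypothetical helper, not a ROOT function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build the n+1 bin edges from n bin centers and half-widths (sorted in x).
// Returns an empty vector if neighbouring bins leave a gap or overlap,
// because such points do not define a contiguous histogram axis.
std::vector<double> bin_edges(const std::vector<double>& center,
                              const std::vector<double>& half_width,
                              double tol = 1e-9) {
  assert(!center.empty() && center.size() == half_width.size());
  std::vector<double> edges{center[0] - half_width[0]};
  for (std::size_t i = 0; i < center.size(); ++i) {
    const double lo = center[i] - half_width[i];
    if (std::fabs(lo - edges.back()) > tol) return {};  // gap or overlap
    edges.push_back(center[i] + half_width[i]);
  }
  return edges;
}
```

The resulting array could then be handed to the variable-bin TH1D constructor, with the y values and their errors copied over via SetBinContent and SetBinError. If the function returns an empty vector, the points do not cover the axis like a histogram does, and fitting the graph with “EX0” is probably the safer route.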

Cheers

Lorenzo

ccorti · June 20, 2014, 2:17pm

[quote=“moneta”][quote]
Is there a way to fit a TGraphErrors with non-zero “x-uncertainties” (e.g. from bin widths) with the integral option?
[/quote]
No, this is not currently possible in ROOT. But let me understand this better and see whether it makes sense. What exactly are your data points x and y?
[/quote]

I tried to do something similar and posted my attempts in the thread “Fitting a TGraphAsymmErrors with the integral option”, but received no answer.