High value of Chi Square

Good evening everybody,
I have some problems with chi-square. I wrote a macro that fits some experimental data with a sine function. The fitted function matches the data very well, but for reasons I cannot understand ROOT returns a very high chi-square value. This is the macro:


#include <iostream>
#include "TCanvas.h"
#include "TF1.h"
#include "TGraphErrors.h"
#include "TMath.h"
#include "TROOT.h"
#include "TStyle.h"

Double_t myfunction (Double_t *x, Double_t *par)
{
   Double_t Arg=x[0]+par[2];
   Double_t f = par[0]*TMath::Sin(par[1]*Arg);
   return f;

}

void myfunc(){
TF1 *woof= new TF1("woofer", myfunction,0,0.002,3);
woof->SetParameter(0,4.57);
woof->SetParameter(1, 6283.185);
woof->SetParameter(2,-0.0929);
woof->Draw();

TF1 *tweet= new TF1("tweeter", myfunction,0,0.002,3);
tweet->SetParameter(0,2.11);
tweet->SetParameter(1, 6283.185);
tweet->SetParameter(2,+1.0915);
tweet->Draw();

TF1 *sign= new TF1("signal", myfunction,0,0.002,3);
sign->SetParameter(0,4.59);
sign->SetParameter(1, 6283.185);
sign->SetParameter(2,-0.00001);
sign->Draw();
}

void fit(){
  TCanvas *c1 = new TCanvas("c1", "c1",346,57,700,500);
  TF1 *f1=(TF1*)gROOT->GetFunction("woofer");
  TF1 *f2=(TF1*)gROOT->GetFunction("tweeter");
  TF1 *f3=(TF1*)gROOT->GetFunction("signal");



  


TGraphErrors *woofer= new TGraphErrors("./Turno3bassaerrori.txt", "%lg%lg%lg");
TGraphErrors *tweeter= new TGraphErrors("./Turno3bassaerrori.txt", "%lg%*lg%*lg%lg%lg");
TGraphErrors *signal= new TGraphErrors("./Turno3bassaerrori.txt", "%lg%*lg%*lg%*lg%*lg%lg%lg");
woofer->GetXaxis()->SetLimits(0,0.002);
woofer->SetLineColor(12);
tweeter->SetLineColor(6);
signal->SetLineColor(1);
woofer->SetLineWidth(2);
tweeter->SetLineWidth(2);
signal->SetLineWidth(2);
woofer->Fit("woofer", "R", "QWEMR");
tweeter->Fit("tweeter", "R","QWEMR");
signal->Fit("signal", "R","QWEMR");
c1->cd();
woofer->SetTitle("Segnali a frequenza 1kHz");
woofer->Draw();
tweeter->Draw("SAME");
signal->Draw("SAME");
cout <<endl <<endl <<endl <<f1->GetChisquare()/f1->GetNDF() <<" " <<f2->GetChisquare()/f2->GetNDF() <<" " <<f3->GetChisquare()/f3->GetNDF() <<endl;

}

Here is the canvas with the function and the experimental data.

I'd be very grateful if someone could explain the reason for this high chi-square value.
[EDIT] I fed the same data to gnuplot and obtained a reduced chi-square of order unity, so is this a problem in how ROOT calculates it?

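I assume you want the following (in Fit, the second argument is the fit option string and the third is the drawing option):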

woofer->Fit("woofer", "QWEMR");
tweeter->Fit("tweeter", "QWEMR");
signal->Fit("signal", "QWEMR");

I think it's because ROOT doesn't divide the square of (observed - expected) by the expected value for each data point. Allow me to explain.

The formula for the chi-square value is

\chi^2 = \sum_{i} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}

but what ROOT does, I think, is

\chi^2 = \sum_{i} (\text{Observed}_i - \text{Expected}_i)^2

It doesn't divide the squared (observed - expected) term for each data point by the expected value. I don't know why, but my guess is that if an expected (fitted) value is 0, the division would be undefined. Here's a simple script I have written to show what I mean.

void chisquare()
{

   int x[3] = {1, 3, 5}; 
   int y[3] = {2, 1, 9};

   TGraph *gr = new TGraph(3, x, y);
   gr->SetMarkerStyle(8);
   gr->Draw("apl");
   gr->Fit("pol1");

   vector<double> vec1;
   for (int i = 0; i < 3; i++)
   {
      double z = gr->GetFunction("pol1")->Eval(x[i]); // value at fit function
      vec1.push_back(z);
      cout << "The value of fit function at x = " << x[i] << " is " << z << endl;
   }

   double b = 0.0;
   double d = 0.0;

   for (int i = 0; i < 3; i++)
   {
      double a = (y[i]-vec1[i])*(y[i]-vec1[i]);
      b = b + a;
      double c = a/vec1[i];
      d = d + c;
   }

   double chisquare = gr->GetFunction("pol1")->GetChisquare();
   cout << "The value of chi-square according to ROOT is " << chisquare << endl;
   cout << "Value of chisquare by manual calculation (without dividing expected term or fitting value) is " << b << endl;
   cout << "Value of chisquare by manual calculation (dividing expected term or fitting value) is " << d << endl;
}

If any ROOT expert can shed some light on this, that'd be awesome.

Hi Shiva,

It is probably better not to reopen a 4-year-old discussion; open a new topic instead, possibly referencing this discussion.

Now, coming to your observation: you are mixing several things up.
Neither of your formulas is the definition of a chi-square; they are just special cases.

In a chi-square, each entry is weighted by 1/error^2. In the case of counting statistics,
this happens to be equal to 1/expected.
If the user does not supply errors on the observations, all errors are set to 1.

-Eddy


Thanks @Eddy_Offermann for letting me know not to reopen older discussions; I didn't know about this.
Coming back to the discussion: I'm a newbie in statistics. I did a good old Google search, and what I found is what I mentioned earlier in my post. If you could suggest a link or book where I can learn more about the definition of the chi-square test, I'd greatly appreciate it.
Thanks again,
-Shiva

Hi Shiva,
Bevington is still my favorite; it covers it all and was my first introduction to statistics/fitting/error analysis.

If you are more mathematically inclined, try Mathematical Statistics by Bickel and Doksum.

-Eddy
