Chi-square test between two histograms

federica · September 21, 2019, 4:39pm

Hi all,

I made a c++ code generating some events following a specific angular distribution (that’s the Ffunction in my code). After that, I generate a second sample of events, which is following an isotropic distribution.

If I made the chi-square test between the two histograms, many times, and I get a p-value which is exactly equal to zero. I expected a really low p-value, of course, but something more than zero.

Is it correct? or there’s a bug in my code?


// c++ -o  es3 es3.cpp `root-config --cflags --glibs` 

#include<iostream>
#include<fstream>
#include<cstring>
#include<math.h>
#include<cmath>
#include<TMath.h>
#include<TCanvas.h>
#include<TGraph.h>
#include<TH2.h>
#include<TF1.h>
#include<TF2.h>
#include<TApplication.h>
#include<TGraphErrors.h>
#include<TMultiGraph.h>
#include<TLegend.h>
#include<TFitter.h>
#include<TLatex.h>
#include<TStyle.h>
#include<random>
#include <algorithm>
#include <TRandom3.h>
#include <TGraph2D.h>
#include <TMinuit.h>
#include <TFumili.h>

using namespace std;

//Angular distribution
double F(double x, double y) {

	double alpha = 0.65;
	double beta = 0.06;
	double gamma = -0.18;

	double firstParam = 0.5 * (1-alpha);
	double secondParam = 0.5 * (3*alpha - 1) * pow(cos(x), 2);
	double thirdParam = beta * pow(sin(x), 2) * cos(2*y);
	double fourthParam = sqrt(2) * gamma * sin(2*x) * cos(y);

	return (3/(4*M_PI)) * (firstParam + secondParam - thirdParam - fourthParam);
}

int main(int argc, char ** argv){
	
	gStyle -> SetOptFit(0111);
	
	TApplication * gr = new TApplication("grafica", 0, NULL);
	TCanvas *c = new TCanvas("c","c",600,400);

	/*
	 | **************************************************
	 | Generate a sample following the F distribution
	 | **************************************************
	*/
	
	random_device rd;  

    	mt19937 thetaGen(rd());
	mt19937 phiGen(rd()+1);

	//Random numbers between 0 and 360 degrees
    	uniform_real_distribution<> dis(0., 6.28319);

	mt19937 gen3(rd()+2);
	//Random numbers between 0 and 0.18 (the maximum of F)
	uniform_real_distribution<> disMax(0., 0.18);
	
	int nEventi = 500000;

	int nBin = 50;
	double min = 0;
	double max = 6.28319;

	TH2F *h2 = new TH2F("histo","Angular distribution F(#theta, #phi)", nBin, min, max, nBin, min, max);

	/*
	 | ******************************
	 | Hypothesis test: chi-square
	 | ******************************
	*/	

	//Generate the isotropic distribution
	mt19937 thetaIs_gen(rd()+3);
	mt19937 phiIs_gen(rd()+4);

	TH2F *hIs = new TH2F("h","Isotropic angular distribution", nBin, min, max, nBin, min, max);

	for (int iRep = 0; iRep < 10; ++iRep) {

		h2 -> Reset();
		hIs -> Reset(); 

		int iEv = 0;

		//Generate the first histogram
		while(iEv < nEventi) {

			//Random theta and phi
			double theta = dis(thetaGen);
			double phi = dis(phiGen);

			//Value of F
			double function = F(theta, phi);

			//Random number between 0 and the maximum of the function
			double check = disMax(gen3);
		
			if(check < function) {

				h2 -> Fill(theta, phi);
				iEv += 1;
			}
		}	

 		//Generate the second histogram (isotropic decay)
		for(int jEv = 0; jEv < nEventi; ++jEv) {

			double thetaIs = dis(thetaIs_gen);
			double phiIs = dis(phiIs_gen);

			hIs->Fill(thetaIs, phiIs);
		}

		double chiResult = h2 -> Chi2Test(hIs);
		cout << "Test number: " << iRep << " -> p-value: " << chiResult << endl;
	}

	return 0;
}

Wile_E_Coyote · September 21, 2019, 5:31pm

Try:

h2->Chi2Test(hIs, "p")

and see the printed “Chi2” and “NDF” (note: “p_value = TMath::Prob(Chi2, NDF)”).

BTW. Try also your code with “nEventi = 2000” (so that the “Chi2” is of the same order as the “NDF”).

federica · September 21, 2019, 5:49pm

That gives me:

(base) federica@linux-zw39:~/Documents/Analisi Statistica/Esercizi> ./es3
Chi2 = 79841.965187, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79804.633839, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79313.182716, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79414.662140, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79102.339486, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79812.175452, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79799.729334, Prob = 0, NDF = 2499, igood = 0
Chi2 = 80281.372433, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79487.649367, Prob = 0, NDF = 2499, igood = 0
Chi2 = 79270.217622, Prob = 0, NDF = 2499, igood = 0

federica · September 21, 2019, 5:55pm

And if decrease the number of event I get a p-value which is close to zero, like 10^(-3) - 10^{-4), but higher than zero. Thanks!