Chi2Test questions

Maarten_Boonekamp · March 17, 2021, 12:31pm

Dear all,

Sorry for the probably very basic questions.
See the simple macro below :

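  TH1::SetDefaultSumw2();  // store the sum of squared weights for new histograms
  TH1D* h1 = new TH1D("h1","h1",50,35,45);
  TH1D* h2 = new TH1D("h2","h2",50,35,45);
  // fill both histograms with independent, unweighted Gaussian samples
  for(int i=0; i<10000; i++) {
    h1->Fill(gRandom->Gaus(40.,2.));
    h2->Fill(gRandom->Gaus(40.,2.));
  }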

Then

  cout << h1->Chi2Test(h2,"CHI2WW") << endl;
40.8678
  cout << h1->Chi2Test(h2,"CHI2UW") << endl;
41.7638

And

  cout << h2->Chi2Test(h1,"CHI2WW") << endl;
40.8678
  cout << h2->Chi2Test(h1,"CHI2UW") << endl;
40.463

My questions :

For this test with unweighted events, why do “UW” and “WW” return different numbers? I would expect that treating the events as weighted with w_i==1 would fall back to the unweighted case, by consistency.
Why is the comparison between h1 and h2 different from that between h2 and h1, for “UW”?

In general, why shouldn’t we just always use “WW” (or remove these UU/UW/WW options)? Again, when w_i=1, the calculation should match the unweighted case by consistency.

A simple calculation, summing over bins the squared difference between the bin contents divided by the sum of the squares of the bin errors (using GetBinContent and GetBinError), always returns the same answer as “WW”, both with weighted and unweighted histograms.

Sorry if I missed some obvious reasons, and thanks for your insights

Maarten

couet · March 17, 2021, 12:48pm

I would have thought you would find the answers to your questions in the Chi2Test documentation.
If not, I guess @moneta can help you. And we may end up completing the documentation if necessary.

Maarten_Boonekamp · March 17, 2021, 1:11pm

I wrote here after reading that page, and the accompanying paper.
In short, I would expect all the given chi2 expressions to coincide in the limit where the event weights go to 1. Apparently this isn’t the case, so I wonder.

Also, I don’t understand why the chi2 calculation doesn’t commute.

(to be clear it’s not about documentation, but to check whether all this is expected by the experts, and to get some justification if possible)

couet · March 17, 2021, 1:27pm

So I'll let @moneta help you.

moneta · March 17, 2021, 2:50pm

Hi Maarten,

I think a slightly different formula is used in the case of weighted histograms. I should check the original paper for the details, but I think the unweighted case uses Poisson statistics, while the weighted case assumes a normal distribution for the bin content. This might explain the small difference observed. It also explains why the “UW” case does not commute: option “UW” should be used when h1 is unweighted and h2 is weighted (in h1->Chi2Test(h2,"UW")).
In the other two cases, “UU” and “WW”, it commutes.
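
To make the asymmetry concrete, here is a minimal standalone sketch of the two statistics, written from the formulas as I understand them from the Chi2Test documentation. It is an assumption-laden illustration, not ROOT's implementation: the names `Hist`, `chi2WW` and `chi2UW` are hypothetical, and any special handling that ROOT applies beyond these formulas is left out. In the “UW” form the unweighted histogram enters through a Poisson-like term (variance equal to the expected content) and the weighted one through a normal term (variance equal to the sum of squared weights), so swapping the two changes the result even when all weights are 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: per-bin sum of weights (w) and sum of squared
// weights (s2). For unit weights, s2 equals w.
struct Hist {
  std::vector<double> w;
  std::vector<double> s2;
};

static double total(const std::vector<double>& v) {
  double s = 0.0;
  for (double x : v) s += x;
  return s;
}

// "WW"-style statistic: both histograms treated with a normal approximation.
double chi2WW(const Hist& a, const Hist& b) {
  const double W1 = total(a.w), W2 = total(b.w);
  double chi2 = 0.0;
  for (std::size_t i = 0; i < a.w.size(); ++i) {
    const double var = W1 * W1 * b.s2[i] + W2 * W2 * a.s2[i];
    if (var <= 0.0) continue;  // bin empty in both histograms
    const double d = W1 * b.w[i] - W2 * a.w[i];
    chi2 += d * d / var;
  }
  return chi2;
}

// "UW"-style statistic: n holds unweighted counts (Poisson-like term),
// h is the weighted histogram (normal term).
double chi2UW(const std::vector<double>& n, const Hist& h) {
  const double N = total(n), W = total(h.w);
  double chi2 = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    if (h.s2[i] <= 0.0) continue;
    // Estimated common bin probability under the null hypothesis.
    const double t = W * h.w[i] - N * h.s2[i];
    const double p =
        (t + std::sqrt(t * t + 4.0 * W * W * h.s2[i] * n[i])) / (2.0 * W * W);
    if (p <= 0.0) continue;
    const double dn = n[i] - N * p;
    const double dw = h.w[i] - W * p;
    chi2 += dn * dn / (N * p) + dw * dw / h.s2[i];
  }
  return chi2;
}
```

For two unit-weight histograms `a` and `b`, `chi2WW(a, b)` equals `chi2WW(b, a)`, while `chi2UW(a.w, b)` and `chi2UW(b.w, a)` generally differ slightly.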

Best regards

Lorenzo
