For this test with unweighted events, why do “UW” and “WW” return different numbers? I would expect that treating the events as weighted with w_i = 1 falls back to the unweighted case, by consistency.

Why is the comparison between h1 and h2 different from that between h2 and h1 for “UW”?

In general, why shouldn’t we just always use “WW” (or remove these UU/UW/WW options)? Again, when w_i=1, the calculation should match the unweighted case by consistency.

A simple calculation, summing over the bins the squared difference between the bin contents divided by the sum of the squared bin errors (using GetBinContent and GetBinError), always returns the same answer as “WW”, for both weighted and unweighted histograms.

Sorry if I missed some obvious reasons, and thanks for your insights.

I would have thought you would find the answers to your questions in the Chi2Test documentation.
If not, I guess @moneta can help you. And we may end up completing the documentation if necessary.

I wrote here after reading that page, and the accompanying paper.
In short, I would expect all the given chi2 expressions to coincide in the limit where the event weights go to 1. Apparently this isn’t the case, so I wonder.

Also, I don’t understand why the chi2 calculation doesn’t commute.

(To be clear, this is not about the documentation; I want to check whether all this is expected by the experts, and to get some justification if possible.)

I think that in the case of weighted histograms one uses a slightly different formula. I should check the original paper for the details, but I think the unweighted case uses Poisson statistics, while the weighted case assumes a normal distribution for the bin content. This might explain the small difference observed. It also explains why “UW” is not symmetric under exchanging the two histograms. Option “UW” should be used when h1 is unweighted and h2 is weighted (in h1->Chi2Test(h2,"UW")…
In the other two cases, UU and WW, the test is symmetric.