I am fitting a weighted combination of test histograms to a histogram of measured data, and I am using a chi-squared fit to determine the weights in the weighted combination. However, I am not sure what the number of degrees of freedom in this problem should be.
For example, let us say that I have a measured histogram “hMeas” and I define a test histogram “hTest” as “hTest = a*h1 + b*h2 + c*h3”, where a, b, and c are the scalar weights that I wish to recover and h1, h2, and h3 are the component histograms that represent contributions which could be present in hMeas. Furthermore, h1, h2, and h3 are all independent of each other. Finally, all of these histograms, including hMeas, have the same number of bins; for the sake of example, let’s say that they have 30 bins. The chi-squared fit compares the values in each bin between hMeas and hTest, so I have 30 calculated values that contribute to the overall chi-squared value.
In the example scenario I gave above, where I have 30 bins in the histograms and I am looking to recover 3 parameters by making hTest match hMeas as closely as possible, is it correct to say that I have 27 degrees of freedom (30 “measured bins” - 3 “fit parameters” = 27), or do I actually have 30 degrees of freedom because I am individually comparing all 30 histogram bins between hMeas and hTest?
It is like comparing a data histogram to a function. You will have 30 bins - 3 fit parameters = 27 degrees of freedom.
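Written out, the counting looks as follows. This is only a sketch: it assumes independent Gaussian uncertainties on the measured bins, and the symbols \sigma_i (per-bin uncertainty) and N_dof are introduced here for illustration; the thread does not say which form of chi-squared is actually used.

```latex
\chi^2(a, b, c) = \sum_{i=1}^{30}
  \frac{\left( hMeas_i - a\,h1_i - b\,h2_i - c\,h3_i \right)^2}{\sigma_i^2},
\qquad
N_{\mathrm{dof}} = 30 - 3 = 27
```

Minimizing over a, b, and c uses up three of the 30 independent terms, which is why the minimum value is compared to a chi-squared distribution with 27 degrees of freedom rather than 30.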
Thanks for replying Lorenzo. I’m not sure that I understand this entirely, however, and I’m wondering what you think about the little thought experiment that I’ve outlined below.
Let us say that I actually have 20 template histograms, h1 to h20 with associated weights c1 to c20, and I still have 30 bins in each template histogram and in my measured histogram hMeas. If I now define hTest as the weighted sum of h1 to h20 and then perform my chi-squared fit of hTest to hMeas, which compares hMeas to hTest bin-by-bin, I think that I should expect a chi-squared value of approximately 30. I say this because my understanding is that each of the 30 bins should contribute a chi-squared value of approximately 1. However, if I’m supposed to have 30-20=10 degrees of freedom, then it is very unlikely to get a chi-squared value of 30 (a call to TMath::Prob(30,10) gives a right-sided p-value of 0.0009). Furthermore, if I expect my chi-squared “test statistic” to actually be distributed according to a chi-squared distribution with 10 degrees of freedom, wouldn’t that imply that each of the histogram bins has to contribute an average of 10/30 ≈ 0.33 to the overall chi-squared value, lower than I should reasonably expect?
Any clarification that you could provide to this problem would be greatly appreciated!
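The p-value quoted above can be cross-checked without ROOT. The sketch below is not how TMath::Prob is implemented; it uses the closed-form series for the chi-squared upper tail, which only holds for an even number of degrees of freedom. The function name chi2_pvalue_even_ndf is made up for this example.

```python
import math


def chi2_pvalue_even_ndf(chi2, ndf):
    """Right-tail p-value P(X >= chi2) for a chi-squared variable.

    Uses the identity, valid only for even ndf:
        Q = exp(-x/2) * sum_{k=0}^{ndf/2 - 1} (x/2)^k / k!
    """
    if ndf <= 0 or ndf % 2 != 0:
        raise ValueError("closed form only holds for even, positive ndf")
    half = chi2 / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, ndf // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # chi2 = 30 compared against 10 and against 30 degrees of freedom
    print(chi2_pvalue_even_ndf(30.0, 10))
    print(chi2_pvalue_even_ndf(30.0, 30))
```

For chi-squared = 30 this gives about 0.0009 with 10 degrees of freedom, matching the number quoted from TMath::Prob, and roughly 0.47 with 30 degrees of freedom, so the two hypotheses about the degrees of freedom are easy to tell apart.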
If you have a combination of many histograms, where each weight is a free parameter, you have a lot of freedom in shaping your resulting prediction, hTest, so the chi-squared is expected to be smaller. In the limit where you have 30 coefficients (with linearly independent templates), you can make hTest exactly equal to hMeas, and your resulting chi-squared will be equal to zero.
In practice, however, the most reliable way to determine the actual distribution of your chi-squared is to generate pseudo-experiments, fit each of them, and look at the resulting chi-squared distribution.
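A pseudo-experiment study of this kind could look roughly like the sketch below. It is a toy under stated assumptions, not a recipe for a real analysis: the templates are random numbers in arbitrary units, every bin has the same known Gaussian uncertainty sigma, the true weights are all 1, and the fit is an unconstrained linear least-squares fit solved through the normal equations. The names solve_linear and mean_min_chi2 are made up for this example; in ROOT you would use your own fitting code instead.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented copy of A
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mean_min_chi2(n_bins, n_templates, n_toys, sigma=1.0, seed=1):
    """Average minimum chi-squared over n_toys pseudo-experiments."""
    rng = random.Random(seed)
    # Assumption: hypothetical templates with random bin contents.
    templates = [[rng.uniform(0.5, 1.5) for _ in range(n_bins)]
                 for _ in range(n_templates)]
    # Assumption: the truth is the sum of all templates (weights = 1).
    truth = [sum(t[i] for t in templates) for i in range(n_bins)]
    # Normal-equation matrix; a constant sigma cancels in the weights.
    A = [[sum(tp[i] * tq[i] for i in range(n_bins)) for tq in templates]
         for tp in templates]
    total = 0.0
    for _ in range(n_toys):
        data = [rng.gauss(mu, sigma) for mu in truth]  # one pseudo-experiment
        b = [sum(t[i] * data[i] for i in range(n_bins)) for t in templates]
        w = solve_linear(A, b)  # best-fit weights for this toy
        chi2 = 0.0
        for i in range(n_bins):
            pred = sum(w[p] * templates[p][i] for p in range(n_templates))
            chi2 += ((data[i] - pred) / sigma) ** 2
        total += chi2
    return total / n_toys


if __name__ == "__main__":
    for k in (3, 20, 30):
        print(k, "templates, bins - parameters =", 30 - k,
              ", mean chi2 =", round(mean_min_chi2(30, k, n_toys=200), 2))
```

Under these assumptions the average minimum chi-squared should come out close to the number of bins minus the number of fitted weights, and close to zero when there are as many weights as bins. With real histograms (Poisson fluctuations, low-statistics bins, constrained weights, templates with their own statistical uncertainties) the distribution can differ, which is the reason for checking it with toys in the first place.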