
Two-sample tests for weighted samples: Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling

Hello,

I propose a generalization of the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling homogeneity tests that can be applied to weighted samples. The only tests for weighted samples in ROOT are TH1::Chi2Test and TH1::KolmogorovTest. In general, testing the homogeneity of binned continuous data is problematic: you cannot accept the original null hypothesis about the unbinned distribution, and different binnings give different p-values. Both tests mentioned above also have their own flaws. I have therefore decided to contribute my code to ROOT. It is very similar to TMath::KolmogorovTest, which applies to two unweighted samples.
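To illustrate the basic idea, here is a minimal sketch of a two-sample Kolmogorov-Smirnov distance computed from weighted empirical distribution functions. It is not the attached homtests.C: the names `WeightedPoint`, `weightedKsDistance` and `effectiveSize` are hypothetical, the use of the Kish effective sample size is an assumption about how weights would enter the p-value, and the p-value computation itself is not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: one observation with its weight.
struct WeightedPoint {
    double x;  // observed value
    double w;  // non-negative weight
};

// Kish effective sample size (sum w)^2 / (sum w^2).
// Assumption: this would replace the plain sample size when a p-value is computed.
double effectiveSize(const std::vector<WeightedPoint>& s) {
    double sumW = 0.0, sumW2 = 0.0;
    for (const auto& p : s) {
        sumW += p.w;
        sumW2 += p.w * p.w;
    }
    return sumW2 > 0.0 ? sumW * sumW / sumW2 : 0.0;
}

// Supremum distance between the two weighted empirical distribution functions
// F(x) = (sum of w_i with x_i <= x) / (sum of all w_i).
// Convention of this sketch: returns 0 if either sample has zero total weight.
double weightedKsDistance(std::vector<WeightedPoint> a, std::vector<WeightedPoint> b) {
    auto byX = [](const WeightedPoint& l, const WeightedPoint& r) { return l.x < r.x; };
    std::sort(a.begin(), a.end(), byX);
    std::sort(b.begin(), b.end(), byX);

    double sumA = 0.0, sumB = 0.0;
    for (const auto& p : a) sumA += p.w;
    for (const auto& p : b) sumB += p.w;
    if (sumA <= 0.0 || sumB <= 0.0) return 0.0;

    double cumA = 0.0, cumB = 0.0, dMax = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() || j < b.size()) {
        // Next jump position: the smallest value not yet consumed in either sample.
        double x;
        if (j >= b.size() || (i < a.size() && a[i].x <= b[j].x)) {
            x = a[i].x;
        } else {
            x = b[j].x;
        }
        // Consume all points at or below x in both samples so ties are handled together.
        while (i < a.size() && a[i].x <= x) cumA += a[i++].w;
        while (j < b.size() && b[j].x <= x) cumB += b[j++].w;
        dMax = std::max(dMax, std::fabs(cumA / sumA - cumB / sumB));
    }
    return dMax;
}
```

With all weights equal to one, this distance reduces to the ordinary unweighted two-sample Kolmogorov-Smirnov statistic and the effective size reduces to the number of entries.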

Along with the code and an example of its use, I have attached my poster from the recent ACAT workshop, which gives the details and an analysis of the tests' performance.

The code was written for ROOT 6.04. It does not work with ROOT 5 because the numerical integration libraries it needs are not included in the older version.

Please let me know if anything is unclear and needs explaining, or if I should modify the code or add something that is missing. I hope that my code can help you when testing homogeneity.

Best regards,
Jakub

trusina_acat2019.pdf (576.5 KB)
homtests.C (5.4 KB)


Thank you, Jakub! @moneta could you review this, please, and discuss with Jakub (possibly outside this forum) if this is something useful for the community?

Hello Jakub,

Thank you very much for your interesting contribution. I think it would be good to include it in ROOT, probably in ROOT::Math::GoFTest, which contains the basic implementation of the GoF tests.
When applied to a histogram, the weights can be interpreted as the bin contents of the histogram.
Did you compare how your method performs in terms of the p-value distribution under the null hypothesis, and in terms of power when the two samples are drawn from two specific distributions?

For inclusion in ROOT, it would be nice if you could open a Pull Request against ROOT::Math::GoFTest. There are, however, some dependency issues that we need to clarify. The dependency on TF1 is easy to remove and is not necessary, since the integration algorithms can be called directly. There is also a dependency on the Bessel function, which we have implemented only in the MathMore library. If it is needed, we will have to move that code to MathMore.
If you wish, we could discuss this further by e-mail.

Best Regards

Lorenzo
