Two-sample tests for weighted samples: Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling

Hello,

I propose a generalization of the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling homogeneity tests that can be applied to weighted samples. The only tests for weighted samples in ROOT are TH1::Chi2Test and TH1::KolmogorovTest, both of which work on histograms. Testing the homogeneity of binned continuous data is generally not a good idea: you cannot draw a conclusion about the original null hypothesis on the unbinned distribution, and different binnings give different p-values. Both tests mentioned above also have their own flaws. Therefore, I decided to contribute my code to ROOT. It is very similar to TMath::KolmogorovTest, which can be applied to two unweighted samples.
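To make the idea of a weighted two-sample test concrete, here is a minimal sketch of the Kolmogorov-Smirnov distance between two weighted empirical distribution functions. It is not the code from homtests.C. The function name `WeightedKSDistance` is hypothetical, the weights are assumed to be non-negative with a positive sum in each sample, and the p-value calculation is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ROOT, not taken from homtests.C).
// Returns sup_t |F_x(t) - F_y(t)|, where F is the weighted empirical CDF:
// F(t) = (sum of weights of observations <= t) / (sum of all weights).
// Assumption: weights are non-negative and each sample has a positive weight sum.
double WeightedKSDistance(const std::vector<double>& x, const std::vector<double>& wx,
                          const std::vector<double>& y, const std::vector<double>& wy)
{
   assert(x.size() == wx.size());
   assert(y.size() == wy.size());

   // Pair each observation with its weight and sort by value.
   auto makeSorted = [](const std::vector<double>& v, const std::vector<double>& w) {
      std::vector<std::pair<double, double>> s;
      s.reserve(v.size());
      for (std::size_t i = 0; i < v.size(); ++i) s.emplace_back(v[i], w[i]);
      std::sort(s.begin(), s.end());
      return s;
   };
   const auto a = makeSorted(x, wx);
   const auto b = makeSorted(y, wy);

   double sumA = 0.0, sumB = 0.0;
   for (const auto& p : a) sumA += p.second;
   for (const auto& p : b) sumB += p.second;
   assert(sumA > 0.0 && sumB > 0.0);

   double cumA = 0.0, cumB = 0.0, dmax = 0.0;
   std::size_t i = 0, j = 0;
   while (i < a.size() || j < b.size()) {
      // Next point at which at least one of the two CDFs jumps.
      double t;
      if (j >= b.size() || (i < a.size() && a[i].first <= b[j].first)) t = a[i].first;
      else t = b[j].first;

      // Add the weights of all observations tied at t, in both samples.
      while (i < a.size() && a[i].first == t) cumA += a[i++].second;
      while (j < b.size() && b[j].first == t) cumB += b[j++].second;

      dmax = std::max(dmax, std::fabs(cumA / sumA - cumB / sumB));
   }
   return dmax;
}
```

With all weights equal to one, this reduces to the usual unweighted two-sample distance.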

Along with the code and an example of its use, I have attached my poster from the recent ACAT, where you can find the details and an analysis of the tests’ performance.

The code was written with ROOT 6.04. It does not work with ROOT 5 because the numerical integration libraries are not included in the old version.

Please tell me if I should explain anything that is unclear, modify the code, or add something that is missing. I hope that my code can help you when testing homogeneity.

Best regards,
Jakub

trusina_acat2019.pdf (576.5 KB)
homtests.C (5.4 KB)


Thank you, Jakub! @moneta could you review this, please, and discuss with Jakub (possibly outside this forum) if this is something useful for the community?

Hello Jakub,

Thank you very much for your interesting contribution. I think it would be good to include it in ROOT, probably in ROOT::Math::GoFTest, which contains the basic implementation of the GoF tests.
When applying it to a histogram, the weights can be interpreted as the bin contents of the histogram.
Did you compare how your method performs in terms of the p-value distribution under the null hypothesis, and in terms of power when the two samples are drawn from two specific distributions?
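As a rough illustration of the histogram interpretation, the sketch below turns a binned distribution into a weighted sample of plain arrays, using bin centres as values and bin contents as weights. The names `WeightedSample` and `HistogramToWeightedSample` are hypothetical and are not part of ROOT or of homtests.C. With a TH1, the same numbers could be read with TH1::GetBinCenter and TH1::GetBinContent. Note that the positions inside each bin are lost, so the result still depends on the binning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration only.
struct WeightedSample {
   std::vector<double> values;   // one value per non-empty bin (the bin centre)
   std::vector<double> weights;  // the corresponding bin content
};

// edges holds nbins + 1 bin boundaries, contents holds nbins bin contents.
WeightedSample HistogramToWeightedSample(const std::vector<double>& edges,
                                         const std::vector<double>& contents)
{
   assert(edges.size() == contents.size() + 1);

   WeightedSample s;
   for (std::size_t i = 0; i < contents.size(); ++i) {
      if (contents[i] == 0.0) continue;  // empty bins carry no weight
      s.values.push_back(0.5 * (edges[i] + edges[i + 1]));
      s.weights.push_back(contents[i]);
   }
   return s;
}
```

The two resulting arrays per histogram can then be passed to any test that takes values and weights as simple arrays.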

Then, for inclusion in ROOT, it would be nice if you could open a Pull Request on ROOT::Math::GoFTest. But there are some dependency issues that we need to clarify. The dependency on TF1 is easy to remove and is not necessary: the integration algorithms can be called directly. However, there is also a dependency on the Bessel function, which we have implemented only in the MathMore library. If this is needed, we need to move that code into MathMore.
If you wish, we could discuss this further by e-mail.

Best Regards

Lorenzo


Hello, is there any update about this test being included in ROOT?
Or how can I quickly modify it for comparison of 2 histograms?

Thank you

Hi,
Could you open a GitHub issue and tag it with the label Improvement, so that it is not forgotten?

I would like to look at the tests in more detail and run your code. Afterwards, if we decide to include this in ROOT, we can add a pull request.

Sorry for the delay, but I still think this can be an interesting and useful contribution.

Best regards

Lorenzo

Sorry, it is not my code. @Jakub_Trusina started this thread.
I would just like to use it for my statistical analysis and I came upon this forum.

Sorry, I missed that. It would be good if you tried using the code. The version posted here seems quite straightforward to use, since it requires only simple arrays as input.
It would be nice if you could then give us your feedback on the code.

Best regards

Lorenzo