Kolmogorov Test on two histograms with the "X" option


I am having some trouble understanding the output of the Kolmogorov test for TH1s when it is used with the pseudo-experiment option ("X").

I have written a simple macro that generates 1000 pairs of histograms, each filled with 5000 events drawn from a Gaussian distribution with mean 0 and sigma 0.5. I wanted to test how well the Kolmogorov test as implemented in TH1 can distinguish the two histograms of a pair once I start changing one of the two Gaussian shapes.

What I noticed is that the default test and the test using MC toys, i.e. hist1->KolmogorovTest(hist2,"X"), give very different outcomes for histogram pairs generated from identical PDFs.

I have attached two plots showing the distribution of the KS test result: the first uses the default method (ks_1.pdf), the second the toy method (ks_2.pdf). In neither case does the distribution look uniform, and the two even seem to tell me opposite things: in one the result is peaked at 0, in the other at 1.

Is this a problem with my binning, or am I just misunderstanding what the output of the "X" option means? My generated histograms have 400 bins between -50 and 50.
ks_1.pdf
ks_2.pdf


hist1->KolmogorovTest(hist2,"X") does not return the Kolmogorov probability; it returns the fraction of pseudo-experiments whose distance is larger than the one observed in the data.
So if the Kolmogorov probability of your two histograms is peaked at 1, it is normal that the majority of the pseudo-experiments have a very small distance.
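The pseudo-experiment idea can be sketched in standard C++ (no ROOT) as follows. This is a minimal illustration, not ROOT's implementation: it assumes, purely for illustration, that the toys are drawn from the normalised second histogram (the "parent") and compared with that parent, and the names cumulativeDistance and toyFraction are hypothetical. Read this way, a value near 1 means the observed distance is small compared with the toys, and a value near 0 means it is unusually large.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Largest gap between the normalised cumulative sums of two histograms.
double cumulativeDistance(const std::vector<double>& h1, const std::vector<double>& h2) {
    double n1 = 0.0, n2 = 0.0;
    for (double c : h1) n1 += c;
    for (double c : h2) n2 += c;
    double c1 = 0.0, c2 = 0.0, d = 0.0;
    for (std::size_t i = 0; i < h1.size() && i < h2.size(); ++i) {
        c1 += h1[i] / n1;
        c2 += h2[i] / n2;
        d = std::max(d, std::fabs(c1 - c2));
    }
    return d;
}

// Hypothetical toy procedure (an assumption, not ROOT's code): sample nToys
// pseudo-histograms of nEvents entries from the normalised parent histogram,
// compare each one with the parent, and return the fraction of toys whose
// distance is larger than the observed one.
double toyFraction(double observed, const std::vector<double>& parent,
                   int nEvents, int nToys, unsigned seed) {
    assert(nEvents > 0 && nToys > 0);
    std::mt19937 gen(seed);
    // Picks a bin index with probability proportional to the parent bin content.
    std::discrete_distribution<int> pickBin(parent.begin(), parent.end());
    int larger = 0;
    for (int t = 0; t < nToys; ++t) {
        std::vector<double> toy(parent.size(), 0.0);
        for (int e = 0; e < nEvents; ++e) toy[pickBin(gen)] += 1.0;
        if (cumulativeDistance(toy, parent) > observed) ++larger;
    }
    return static_cast<double>(larger) / nToys;
}
```

For example, toyFraction(d, h2, 5000, 1000, 1u) would give the fraction for one observed distance d, using 5000 entries per toy and 1000 toys; both counts are illustrative choices.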

That the probability is not uniform is expected: the Kolmogorov test is valid for unbinned data sets, not for binned data. If your bin size is very large compared to the structure of your distribution, you see exactly this effect. Using 400 bins between -2.5 and 2.5 (i.e. within 5 sigma) will probably work much better.
See NOTE 3 in the TH1::KolmogorovTest documentation: root.cern.ch/root/html/TH1.html# … ogorovTest
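For the numbers in the question (a Gaussian with σ = 0.5 and 400 bins), the bin widths of the current and the suggested binning compare as follows:

```latex
\Delta_{\text{current}} = \frac{50-(-50)}{400} = 0.25 = \frac{\sigma}{2},
\qquad
\Delta_{\text{suggested}} = \frac{2.5-(-2.5)}{400} = 0.0125 = \frac{\sigma}{40},
\qquad \sigma = 0.5
```

With the current binning the central ±3σ region (±1.5) is covered by only 3/0.25 = 12 bins, so the cumulative distributions are sampled very coarsely.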

Best Regards


The Kolmogorov-Smirnov test is defined for "event" (unbinned) data. If your bin contents are small, your histogram can be approximated as such, but it is up to you to make sure that this is the case.
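The following standalone C++ sketch (standard library only, not ROOT) illustrates the point for one pair of the Gaussian samples from the question. It compares the distance computed from cumulative bin sums, using the binning from the question (400 bins on [-50, 50]), with the unbinned two-sample distance of the same events, and converts both to a probability with the plain asymptotic Kolmogorov formula. Treating the binned distance as what TH1::KolmogorovTest computes without options is an assumption, and all function names are hypothetical. Because the cumulative bin sums are only evaluated at bin edges, the binned distance can never exceed the unbinned one (as long as no events fall outside the histogram range), so the binned probability can only be larger than or equal to the unbinned one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n events from a Gaussian with mean 0 and the given sigma.
std::vector<double> gaussianSample(std::size_t n, double sigma, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> gauss(0.0, sigma);
    std::vector<double> x(n);
    for (double& v : x) v = gauss(gen);
    return x;
}

// Hypothetical helper: fixed-width histogram on [lo, hi); events outside are dropped.
std::vector<double> fillHist(const std::vector<double>& x, int nbins, double lo, double hi) {
    assert(nbins > 0 && hi > lo);
    std::vector<double> h(nbins, 0.0);
    const double width = (hi - lo) / nbins;
    for (double v : x) {
        if (v < lo || v >= hi) continue;
        int b = static_cast<int>((v - lo) / width);
        b = std::min(b, nbins - 1);  // guard against rounding at the upper edge
        h[b] += 1.0;
    }
    return h;
}

// Binned distance: largest gap between the normalised cumulative bin sums,
// i.e. the empirical CDFs evaluated only at the bin edges.
double binnedDistance(const std::vector<double>& h1, const std::vector<double>& h2) {
    double n1 = 0.0, n2 = 0.0;
    for (double c : h1) n1 += c;
    for (double c : h2) n2 += c;
    double c1 = 0.0, c2 = 0.0, d = 0.0;
    for (std::size_t i = 0; i < h1.size() && i < h2.size(); ++i) {
        c1 += h1[i] / n1;
        c2 += h2[i] / n2;
        d = std::max(d, std::fabs(c1 - c2));
    }
    return d;
}

// Unbinned two-sample distance: largest gap between the empirical CDFs.
double unbinnedDistance(std::vector<double> a, std::vector<double> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double na = static_cast<double>(a.size()), nb = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] == x) ++i;  // step both CDFs past every event at x
        while (j < b.size() && b[j] == x) ++j;
        d = std::max(d, std::fabs(i / na - j / nb));
    }
    return d;
}

// Asymptotic Kolmogorov probability Q(z) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 z^2),
// with z = d * sqrt(n1 * n2 / (n1 + n2)).
double kolmogorovProb(double d, double n1, double n2) {
    const double z = d * std::sqrt(n1 * n2 / (n1 + n2));
    if (z < 0.2) return 1.0;  // the series converges slowly here and Q(z) is ~1 anyway
    double sum = 0.0;
    for (int k = 1; k <= 100; ++k) {
        const double term = std::exp(-2.0 * k * k * z * z);
        sum += (k % 2 == 1) ? term : -term;
    }
    return std::clamp(2.0 * sum, 0.0, 1.0);
}
```

Repeating the comparison with fillHist(a, 400, -2.5, 2.5) gives about 40 bins per sigma instead of 2, so the binned distance should typically come much closer to the unbinned one.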