Scale and number dependence of TKDE estimate

I am trying to use a kernel density estimation (KDE) method to estimate the distribution underlying a histogram non-parametrically. I have used the example https://root.cern.ch/root/html/tutorials/math/exampleTKDE.C.html and the primer https://arxiv.org/pdf/hep-ex/0011057v1.pdf to familiarize myself with the methods.

I am assuming “rho” in the example is the same as the smoothing parameter “h” in the primer, but I am not certain of that. Is that the case?

Further, when I use the modified example script:

exampleTKDE_mod.C (2.31 KB)

I get inconsistent results. I’ve modified the sample script to have the scale of the histogram and parameters of the distribution depend on “xmin” and “xmax” only, and to use a Landau distribution, because that’s more similar to the data that I will ultimately use this functionality on.
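
For reference, the core of the modification looks roughly like this (a simplified sketch, not the attached macro; the function name, Landau parameters, and number of bins are just placeholders):

```cpp
// Sketch of the modified macro: Landau data whose location and width
// depend only on the chosen [xmin, xmax] range.
#include "TKDE.h"
#include "TH1D.h"
#include "TRandom.h"
#include <vector>

void exampleTKDE_sketch(double xmin = 0., double xmax = 10., int n = 1000) {
   double range = xmax - xmin;
   double mpv   = xmin + 0.3 * range;   // most probable value scales with the range
   double sigma = 0.05 * range;         // width scales with the range

   std::vector<double> data(n);
   TH1D *h = new TH1D("h", "Landau sample;x;density", 100, xmin, xmax);
   for (int i = 0; i < n; ++i) {
      data[i] = gRandom->Landau(mpv, sigma);  // long tail, so some points fall outside the range
      h->Fill(data[i]);
   }
   h->Scale(1. / (n * h->GetBinWidth(1)));    // normalize the histogram to a density
   h->Draw();

   double rho = 1.0;                          // smoothing scale factor, as in the tutorial
   TKDE *kde = new TKDE(n, &data[0], xmin, xmax, "", rho);
   kde->Draw("SAME");
}
```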

The particular questions that come up are:

  1. Why does the estimate become worse when I change the scale to 10e-6 as opposed to 10? Does it have to do with selecting a value of “1” for rho? The primer seems to suggest that, at least for one optimization strategy, the optimal rho depends on the scale and on the number of data points; does that need to be made explicit before constructing the TKDE class?

  2. Even if I keep the scale of the calculation at 10, I recover 0 for the KDE if I use more than roughly 10000 points. Why does that occur for my modified script and not for the example script I originally referenced? UPDATE: Looking at the post “TKDE crash for more than 9999 data points”, I added the option “binning:unbinned” to my TKDE class, and it now seems to work better for the case of more than 10000 initial events (see the sketch below this list). What does this binning control? Why did the binning option not seem to matter for the original script with a bi-normal distribution, but it did for my Landau distribution?
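
For concreteness, the change mentioned in the UPDATE was just adding the binning flag to the option string, something like this (a sketch; the rest of the option string is what I understand the defaults to be):

```cpp
// Same constructor call as before, but with the binning mode set to unbinned.
TKDE *kde = new TKDE(n, &data[0], xmin, xmax,
                     "KernelType:Gaussian;Iteration:Adaptive;"
                     "Mirror:noMirror;Binning:Unbinned",
                     1.0 /* rho */);
```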

I am using ROOT v5.34/05

Hi,

See arxiv.org/pdf/hep-ex/0011057v1.pdf, equation 7, for the definition of rho. It should be close to 1.
Large variations will of course change the result a lot, in particular using a very small value.
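
For example, rho is just the last argument of the TKDE constructor, so you can compare the two cases directly (a quick sketch):

```cpp
// Sketch: rho close to 1 keeps the automatically computed bandwidth;
// a much smaller value shrinks the bandwidth and makes the estimate very spiky.
TKDE *kdeDefault = new TKDE(n, &data[0], xmin, xmax, "", 1.0);
TKDE *kdeSmall   = new TKDE(n, &data[0], xmin, xmax, "", 0.01);
kdeDefault->Draw();
kdeSmall->Draw("SAME");
```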

The binning controls the adaptive smoothing parameter h: in the binned case it is pre-computed for all the given points, so there is no need to pass over the data twice when evaluating the KDE.
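
The binning mode can also be changed after construction, for example (a sketch, assuming I remember the enum names correctly):

```cpp
// With binning, h is pre-computed for the given points;
// unbinned evaluates the adaptive bandwidth directly on the original data.
kde->SetBinning(TKDE::kUnbinned);           // what you set via "binning:unbinned"
// kde->SetBinning(TKDE::kRelaxedBinning);  // default: bin only when the sample is large
// kde->SetBinning(TKDE::kForcedBinning);   // always bin
```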

I see you are using a very old version of ROOT, 5.34.05, and looking at the old posts there were some bugs which have been fixed, I think after 5.34.23. Please try, if possible, to use either ROOT 6 or the latest 5.34 version (5.34.36).

Best Regards

Lorenzo