I am trying to use a KDE method to estimate a histogram distribution non-parametrically. I have used the example https://root.cern.ch/root/html/tutorials/math/exampleTKDE.C.html and the primer https://arxiv.org/pdf/hep-ex/0011057v1.pdf to familiarize myself with the methods.
I am assuming “rho” in the example is the same as the smoothing parameter “h” in the primer, but I am not certain of that. Is that the case?
Further, when I run my modified version of the example script,
exampleTKDE_mod.C (2.31 KB)
I get inconsistent results. I’ve modified the sample script so that the scale of the histogram and the parameters of the distribution depend only on “xmin” and “xmax”, and so that it uses a Landau distribution, because that is closer to the data I will ultimately apply this to.
The particular questions that come up are:
- Why does the estimate become worse when I change the scale to 10e-6 as opposed to 10? Does it have to do with selecting a value of “1” for rho? The primer seems to suggest that, in at least one optimization strategy, the optimum value for rho depends on the scale and on the number of data points. Does that dependence need to be made explicit before constructing the TKDE class?
- Even if I keep the scale of the calculation at 10, I recover 0 for the KDE if I use more than, say, 10000 points. Why does that occur for my modified script and not for the example script that I originally referenced?

UPDATE: Looking at the post “TKDE crash for more than 9999 data points”, I added the option “binning:unbinned” to my TKDE class, and it now seems to work better for the case of more than 10000 initial events. What does this binning option control? Why did it not seem to matter for the original script with a bi-normal distribution, while it did for my Landau distribution?
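To make the first question concrete, here is a minimal sketch, in plain C++ without ROOT, of the rule-of-thumb bandwidth as I read it in the primer for a Gaussian kernel: h = (4/3)^(1/5) · σ · n^(-1/5). The function names `sampleSigma` and `ruleOfThumbBandwidth` are my own, not part of TKDE, and whether TKDE’s “rho” corresponds to this h, or to something multiplying it, is exactly what I am unsure about.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unbiased sample standard deviation (n - 1 in the denominator).
// Hypothetical helper for illustration; not a ROOT or TKDE function.
double sampleSigma(const std::vector<double>& x) {
    assert(x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mean = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) mean += x[i];
    mean /= n;
    double sumSq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumSq += (x[i] - mean) * (x[i] - mean);
    }
    return std::sqrt(sumSq / (n - 1.0));
}

// Rule-of-thumb bandwidth for a Gaussian kernel, as I understand the primer:
//   h = (4/3)^(1/5) * sigma * n^(-1/5)
// Assumption: this is a fixed (non-adaptive) bandwidth; it says nothing
// about how TKDE uses rho internally.
double ruleOfThumbBandwidth(double sigma, std::size_t n) {
    assert(n > 0);
    return std::pow(4.0 / 3.0, 0.2) * sigma
         * std::pow(static_cast<double>(n), -0.2);
}
```

If this reading is right, h is proportional to the spread of the data, so rescaling every point by 1e-6 should rescale h by the same factor, and h shrinks only slowly with the sample size: 32 times more points halves it, since 32^(-1/5) = 1/2. A bandwidth that stays fixed while the scale changes would therefore oversmooth or undersmooth by many orders of magnitude.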
I am using ROOT v5.34/05