How the TMVA PDF makes the KDE estimation from histograms

mverissi · September 15, 2021, 4:45pm

Dear experts,

I know that kernel density estimation (KDE) works on unbinned data, but TMVA uses a histogram (TH1F). I would like to know how TMVA deals with the histogram: does it take the histogram and resample unbinned data from it (as in the piece of code below), or does it use another approach to make the KDE estimate?

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sts

n = 100000

# generate some random multimodal histogram data
samples = np.concatenate([np.random.normal(np.random.randint(-8, 8), size=n)*np.random.uniform(.4, 2) for i in range(4)])
h,e = np.histogram(samples, bins=100, density=True)
x = np.linspace(e.min(), e.max())

# plot the histogram
plt.figure(figsize=(8,6))
plt.bar(e[:-1], h, width=np.diff(e), ec='k', align='edge', label='histogram')

# plot the real KDE
kde = sts.gaussian_kde(samples)
plt.plot(x, kde.pdf(x), c='C1', lw=8, label='KDE')

# resample the histogram and find the KDE.
resamples = np.random.choice((e[:-1] + e[1:])/2, size=n*5, p=h/h.sum())
rkde = sts.gaussian_kde(resamples)

# plot the KDE
plt.plot(x, rkde.pdf(x), '--', c='C3', lw=4, label='resampled KDE')
plt.title('n = %d' % n)
plt.legend()
plt.show()

Thanks!

bellenot · September 16, 2021, 6:09am

Welcome to the ROOT Forum! Maybe @moneta can help you with this

moneta · September 28, 2021, 3:28pm

Hi,
Sorry for the late reply. I think TMVA goes the other way round: it uses the unbinned data set to build a histogram via kernel estimation. Basically, it computes the kernel density estimate only on a grid (the bin centres) and then uses an interpolation (I think linear) for the other points.
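To make the idea concrete, here is a minimal NumPy sketch of that approach: evaluate a Gaussian KDE from the unbinned sample only at the bin centres, then interpolate linearly in between. This is not TMVA's actual code. The helper names (gaussian_kde_on_grid, pdf), the fixed Silverman-style bandwidth, the toy two-peak sample and the choice of 100 bins are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian KDE at each point of `grid`.

    Hypothetical helper for illustration, not a TMVA function.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = (grid[i] - samples[j]) / bandwidth
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * bandwidth)


# Toy unbinned data set: two Gaussian peaks (assumed, for illustration only).
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 2000),
                          rng.normal(3.0, 1.0, 2000)])

# Rule-of-thumb bandwidth (Silverman-style); an assumption, TMVA may differ.
bandwidth = 1.06 * samples.std() * samples.size ** (-1 / 5)

# The "histogram": 100 bins over the data range, KDE stored at the bin centres.
edges = np.linspace(samples.min(), samples.max(), 101)
centres = 0.5 * (edges[:-1] + edges[1:])
kde_at_centres = gaussian_kde_on_grid(samples, centres, bandwidth)


def pdf(x):
    """Linear interpolation between the KDE values stored at the bin centres."""
    return np.interp(x, centres, kde_at_centres)


# Compare the interpolated PDF with the KDE evaluated directly on a fine grid.
x = np.linspace(centres[0], centres[-1], 1000)
exact = gaussian_kde_on_grid(samples, x, bandwidth)
max_abs_diff = np.max(np.abs(pdf(x) - exact))
print("largest |interpolated - exact| =", max_abs_diff)
```

The expensive sum over all events is done once per bin centre. After that, every evaluation of pdf(x) is just a lookup plus a linear interpolation, and no resampling of the histogram is needed.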

Cheers

Lorenzo