Hello all,

I am working on a data analysis project and was looking at a correlation matrix between many different variables. Some variables are completely flat, filled with only a single value, and I noticed they give strange correlations. A 2D histogram of two flat variables has only a single filled bin, and the correlation factor returned for such histograms fluctuates wildly.

I found that the GetCorrelationFactor method relies on the GetStdDev method, which I think should return zero for a 1-bin histogram, but it does not. So I wrote a script to investigate this further, and I am confused by what I found. Maybe there is an easy answer; if not, I have attached my file and working script.
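To illustrate why noise in GetStdDev turns into noise in the correlation factor, here is a minimal standalone C++ sketch (no ROOT needed). It assumes the correlation factor is the covariance divided by the two standard deviations, computed from the same seven sums GetStats() returns for a TH2. `FillStats` and `SafeCorrelation` are hypothetical helpers, not ROOT functions. The flatness guard and its tolerance `relTol = 1e-6` are assumptions to tune: the guard returns 0 when either standard deviation is tiny compared with the RMS of the values, which is one way to get the 0-or-1 behavior I am hoping for.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper mimicking how a filled TH2 accumulates its sums in
// the layout {sumw, sumw2, sumwx, sumwx2, sumwy, sumwy2, sumwxy}.
void FillStats(double s[7], double x, double y, double w = 1.0) {
    s[0] += w;      s[1] += w * w;
    s[2] += w * x;  s[3] += w * x * x;
    s[4] += w * y;  s[5] += w * y * y;
    s[6] += w * x * y;
}

// Hypothetical helper (not ROOT's implementation): correlation factor from
// the stats array, with an assumed relative tolerance for "flat" variables.
double SafeCorrelation(const double s[7], double relTol = 1e-6) {
    assert(s[0] > 0.0);                          // needs a non-empty histogram
    const double mx = s[2] / s[0];               // mean of x
    const double my = s[4] / s[0];               // mean of y
    const double rmsx = std::sqrt(s[3] / s[0]);  // sqrt(E[x^2])
    const double rmsy = std::sqrt(s[5] / s[0]);  // sqrt(E[y^2])
    // One-pass formula sigma^2 = E[v^2] - E[v]^2, as reproduced in this post.
    const double sx = std::sqrt(std::fabs(s[3] / s[0] - mx * mx));
    const double sy = std::sqrt(std::fabs(s[5] / s[0] - my * my));
    // Guard: a spread this small relative to the RMS is treated as zero.
    if (sx <= relTol * rmsx || sy <= relTol * rmsy) return 0.0;
    const double cov = s[6] / s[0] - mx * my;    // E[xy] - E[x]E[y]
    return cov / (sx * sy);
}
```

For example, filling with `FillStats(s, x, 2*x + 1)` for x = 0..9 gives a correlation of 1 up to rounding, while 4032 identical entries give exactly 0 because the guard catches the rounding-level spread.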

I have attached the script; for convenience, its output with my comments is below.

Any insight is appreciated! It would be nice if the correlation factor returned either 1 or 0 instead of fluctuating.

-Julian

Processing testScript.C…

The correlation factor returns about 0.09. For other 1-bin histograms, it can even return values greater than 1.0:

CF = 0.089672

GetStdDev returns a nonzero value, which also fluctuates between different 1-bin histograms:

GetStdDev(1) = 0.000643407

GetStdDev(2) = 0.000306698 <--

Use the GetStats() function to recalculate GetStdDev:

stats1[0] (sumw) = 4032

stats1[1] (sumw2) = 4032

stats1[2] (sumwx) = 7.93209e+06

stats1[3] (sumwx2) = 1.56047e+10

stats1[4] (sumwy) = 6.2331e+06

stats1[5] (sumwy2) = 9.63578e+09

stats1[6] (sumwxy) = 1.22623e+10

mean_x = 1967.29

mean_y = 1545.91

stats[4] behaves as it should: divided by sumw, it returns the mean of y:

stats[4]/stats[0] = 1545.91

This should return zero, but doesn't:

stats[5]/stats[0] - mean_y*mean_y = -9.40636e-08
Successfully reproduced GetStdDev() using GetStats() and GetMean():
sqrt(abs(stats[5]/stats[0] - mean_y*mean_y)) = 0.000306698 <--
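A quick order-of-magnitude check on these numbers, assuming the sums are accumulated in double precision (machine epsilon about 2.2×10⁻¹⁶):

```latex
\begin{aligned}
\frac{\text{stats}[5]}{\text{stats}[0]} - \bar{y}^{2} &= -9.40636\times10^{-8},
  \qquad \bar{y}^{2} \approx 1545.91^{2} \approx 2.39\times10^{6},\\
\frac{9.40636\times10^{-8}}{2.39\times10^{6}} &\approx 3.9\times10^{-14} \approx 180\,\varepsilon_{\text{double}},\\
\sigma_y = \sqrt{9.40636\times10^{-8}} &\approx 3.06698\times10^{-4}.
\end{aligned}
```

So the residue is only about 180 times the relative precision of a double, small enough to be rounding error in the subtraction, and the square root turns a variance of order 10⁻⁷ into a standard deviation of order 10⁻⁴.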

Why doesn't stats[5] work? Recalculate it with a loop over the bins (overkill for a single bin!):

I realize that GetStats uses GetBinCenter, not GetMean, and the two differ:

binx = 21

biny = 20

x = 1967.2787499999999454

y = 1545.9212499999998727

w = 4032

MeanX = 1967.2850341796875

MeanY = 1545.9066162109375

The value of mystats[5] differs from the GetStats() value:

mystats[5] = 9635965965.1646976471

My version of GetStdDev returns zero: …

sqrt(abs(stats[5]/stats[0] - myy*myy)) = 0

Calculated from GetStats() and the mean y value, this matches GetStdDev():

sqrt(abs(stats[5]/stats[0] - meany*meany)) = 0.00030669786441408965899
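A minimal standalone C++ sketch contrasting the one-pass formula reproduced above, σ = sqrt(|Σy²/N − ȳ²|), with a two-pass version that subtracts the mean before squaring. Assumptions: unit weights, and the input is the MeanY value printed above (1545.9066162109375, which is exactly representable as a float) repeated 4032 times. The two-pass approach is a different technique shown only for comparison; I am not claiming ROOT uses it. For this input the two-pass result is exactly 0, while the one-pass result can come out as a small nonzero number because Σy² is rounded as it is accumulated.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One-pass formula with unit weights, as reproduced from GetStats():
// sigma = sqrt(|sum(y^2)/N - mean^2|).
double NaiveStdDev(const std::vector<double>& y) {
    assert(!y.empty());
    double sumw = 0.0, sumwy = 0.0, sumwy2 = 0.0;
    for (double v : y) { sumw += 1.0; sumwy += v; sumwy2 += v * v; }
    const double mean = sumwy / sumw;
    return std::sqrt(std::fabs(sumwy2 / sumw - mean * mean));
}

// Two-pass alternative: compute the mean first, then average the squared
// deviations, so identical values contribute exactly zero.
double TwoPassStdDev(const std::vector<double>& y) {
    assert(!y.empty());
    double sum = 0.0;
    for (double v : y) sum += v;
    const double mean = sum / y.size();
    double ss = 0.0;
    for (double v : y) ss += (v - mean) * (v - mean);
    return std::sqrt(ss / y.size());
}
```

The trade-off is that the two-pass version needs the individual values (or a second loop over them), while the one-pass sums are what a histogram can store cheaply as it is filled.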

testScript.C (4.01 KB)

testfile.root (3.94 KB)