Covariance matrix decomposition

Antonio83 · May 12, 2021, 4:32pm

Dear experts,

I have a 2D histogram (10 bins on both axes) and the corresponding covariance matrix for all bins. From these I am trying to compute varied histograms, i.e. histograms in which the bin contents vary according to the covariance matrix.

I wanted to do this using the Cholesky decomposition, but since the covariance matrix can be positive semidefinite, I was trying to use the SVD instead (if I understand correctly, this would be like a PCA, in which only the variables with non-zero eigenvalues contribute).
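A minimal sketch of the eigenvalue route described above, under stated assumptions: the covariance matrix is symmetric, eigenvalues below a relative cut are treated as numerical noise and dropped, and fluctuations are Gaussian. The names `jacobiEigen`, `sqrtFactor`, `sampleFluctuation` and the cut `relTol` are hypothetical helpers written in plain C++ for illustration. In ROOT, `TMatrixDSymEigen` should provide the eigenvalues and eigenvectors directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cyclic Jacobi iteration for a symmetric matrix: on return a ~ V * diag(eigval) * V^T,
// with the eigenvectors stored as the columns of eigvec.
void jacobiEigen(Matrix a, std::vector<double>& eigval, Matrix& eigvec, int sweeps = 50) {
  const std::size_t n = a.size();
  eigvec.assign(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) eigvec[i][i] = 1.0;
  for (int sweep = 0; sweep < sweeps; ++sweep) {
    for (std::size_t p = 0; p + 1 < n; ++p) {
      for (std::size_t q = p + 1; q < n; ++q) {
        if (a[p][q] == 0.0) continue;
        // Rotation (c, s) chosen so that the (p, q) element becomes zero.
        const double tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
        const double t = (tau >= 0.0 ? 1.0 : -1.0) / (std::abs(tau) + std::hypot(1.0, tau));
        const double c = 1.0 / std::hypot(1.0, t);
        const double s = t * c;
        for (std::size_t k = 0; k < n; ++k) {  // a <- a * J and eigvec <- eigvec * J
          const double akp = a[k][p], akq = a[k][q];
          a[k][p] = c * akp - s * akq;
          a[k][q] = s * akp + c * akq;
          const double vkp = eigvec[k][p], vkq = eigvec[k][q];
          eigvec[k][p] = c * vkp - s * vkq;
          eigvec[k][q] = s * vkp + c * vkq;
        }
        for (std::size_t k = 0; k < n; ++k) {  // a <- J^T * a
          const double apk = a[p][k], aqk = a[q][k];
          a[p][k] = c * apk - s * aqk;
          a[q][k] = s * apk + c * aqk;
        }
      }
    }
  }
  eigval.resize(n);
  for (std::size_t i = 0; i < n; ++i) eigval[i] = a[i][i];
}

// Returns A with A * A^T ~ cov, keeping only eigenvalues above relTol * (largest eigenvalue).
// Tiny or negative eigenvalues are assumed to be numerical noise and give zero columns.
Matrix sqrtFactor(const Matrix& cov, double relTol = 1e-12) {
  const std::size_t n = cov.size();
  for (const auto& row : cov) assert(row.size() == n);
  std::vector<double> eigval;
  Matrix eigvec;
  jacobiEigen(cov, eigval, eigvec);
  const double maxEig = eigval.empty() ? 0.0 : *std::max_element(eigval.begin(), eigval.end());
  Matrix A(n, std::vector<double>(n, 0.0));
  for (std::size_t k = 0; k < n; ++k) {
    if (eigval[k] <= relTol * maxEig) continue;  // dropped mode
    const double scale = std::sqrt(eigval[k]);
    for (std::size_t i = 0; i < n; ++i) A[i][k] = scale * eigvec[i][k];
  }
  return A;
}

// One random fluctuation delta = A * z, with independent z_k ~ N(0, 1).
std::vector<double> sampleFluctuation(const Matrix& A, std::mt19937& rng) {
  std::normal_distribution<double> gauss(0.0, 1.0);
  std::vector<double> delta(A.size(), 0.0);
  for (std::size_t k = 0; k < A.size(); ++k) {
    const double z = gauss(rng);
    for (std::size_t i = 0; i < A.size(); ++i) delta[i] += A[i][k] * z;
  }
  return delta;
}
```

A varied histogram would then be the nominal bin contents plus one `sampleFluctuation` vector, with the 2D bins flattened to a single index in the same order as the covariance matrix. Unlike Cholesky, this does not require the matrix to be strictly positive definite.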

The problem I am facing is that when I try to compute the eigenvalues of the covariance matrix, some of the values are very small but still not zero (some values are < 0, and there are also some off-diagonal elements). For instance:

   |       90    |       91    |       92    |       93    |       94    |
----------------------------------------------------------------------
        ...........  ..........    ...........   ..........   ..........
  89 |          0           0           0           0           0 
  90 | -1.477e-18   1.812e-18           0           0           0 
  91 | -2.297e-18  -4.337e-19           0           0           0 
  92 |          0           0  -4.337e-19   2.297e-18           0 
  93 |          0           0           0   2.202e-18           0 
  94 |          0           0           0           0  -3.586e-19 
  95 |          0           0           0           0   -1.43e-18 
  96 |          0           0           0           0           0

I guess this is related to the numerical precision of the computation, and it breaks the machinery: when decomposing with SVD, Cov = U * S * V^T, U and V are not equal.

Are there any suggestions on how to proceed? Is there something I did not take into account? Maybe what I am trying to do cannot work given the large number of variables?

I attach an example of the covariance matrix I am using.

Thanks,
Antonio

cov_matrix_example.root (59.6 KB)

ROOT Version: 6.22/06

Eddy_Offermann · May 12, 2021, 5:37pm

Hi Antonio,

Your matrix has a very large condition number of 10^18 !
Looking at the matrix where diagonal and off-diagonal numbers are nearly identical (implying a nearly 100% correlation between all the bins), I wonder if the covariance was calculated correctly.

Anyhow, please see below a small script that was used to calculate the
condition number and an example to perform a decomposition and check whether it succeeded.

-Eddy

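{
  // Load the covariance matrix stored under the key "cov".
  TFile f("cov_matrix_example.root");
  TMatrixD cov = *(TMatrixD *)f.Get("cov");

  // The SVD gives access to the condition number of the matrix.
  TDecompSVD svd(cov);
  std::cout << "condition number: " << svd.Condition() << std::endl;

  // Print the singular values.
  TVectorD sig = svd.GetSig();
  sig.Print();

  // Try a Cholesky decomposition; Decompose() returns false if it fails.
  TDecompChol chol(cov);
  if (!chol.Decompose())
  {
    std::cout << "Cholesky decomposition failed" << std::endl;
    exit(0);
  }
}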

Eddy_Offermann · May 12, 2021, 5:43pm

I have no idea what produces your covariance data, but given the results you might want to consider studying the “difference” instead of the value. In finance one would not model the prices but rather the price moves.

Antonio83 · May 13, 2021, 8:38am

Hi Eddy,

thanks for your feedback.

> Looking at the matrix where diagonal and off-diagonal numbers are nearly identical (implying a nearly 100% correlation between all the bins), I wonder if the covariance was calculated correctly.

Right! I did not notice that at first. So basically, if the covariance matrix is correct, this would mean that I can just shift all the bin contents up or down within their errors to produce varied histograms.

To get the covariance matrix, I computed N variations of the histogram (by changing some parameters of a model) and then: cov_ij = (1/N) Sum_k (h_mu_i - h_var_k_i) * (h_mu_j - h_var_k_j), where k runs over the N variations.
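The covariance estimate described above can be sketched as follows, assuming the sum runs over the N variations and that each histogram has been flattened into a vector of bin contents. The function name `covarianceFromVariations` and its arguments are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// cov_ij = (1/N) * sum over variations k of (nominal_i - var_k_i) * (nominal_j - var_k_j)
Matrix covarianceFromVariations(const std::vector<double>& nominal,
                                const std::vector<std::vector<double>>& variations) {
  const std::size_t nBins = nominal.size();
  assert(!variations.empty());
  const double norm = 1.0 / static_cast<double>(variations.size());
  Matrix cov(nBins, std::vector<double>(nBins, 0.0));
  for (const auto& var : variations) {
    assert(var.size() == nBins);
    for (std::size_t i = 0; i < nBins; ++i) {
      for (std::size_t j = 0; j < nBins; ++j) {
        cov[i][j] += norm * (nominal[i] - var[i]) * (nominal[j] - var[j]);
      }
    }
  }
  return cov;
}
```

If every variation moves all bins in proportion, as in a nominal of {1, 2} with variations {2, 4} and {0, 0}, the result has rank 1, which is the fully correlated situation discussed in this thread.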

> I have no idea what produces your covariance data, but given the results you might want to consider studying the “difference” instead of the value. In finance one would not model the prices but rather the price moves.

I don't think I understood what you mean. Could you give me an example?

Thanks,
Antonio

Eddy_Offermann · May 13, 2021, 12:20pm

Hi Antonio,

If I understand correctly, you have a 2-dim histogram with entry h_var(i, j)
for bin (i, j). These contents depend on some hidden variables x_k, k = 0, ..., n-1.

You now vary these variables and calculate the correlations between the
contents at bin (i, j) for the histograms produced for all these variations.

You observe that you get small variations in the bin contents and, not surprisingly, end up with a correlation matrix that has basically all entries set to 1.

My suggestion is not to calculate the covariance matrix between the cell contents h_var_i and h_var_j but rather between

h_var_(i+1)-h_var_i and h_var_(j+1)-h_var_j.

Since I do not know what exactly your model is and what the bins represent, I cannot tell if this is meaningful. In finance the bins would be steps in time.
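A rough sketch of the suggestion above, assuming the bins are flattened into a single index so that "i+1" means the next bin in that ordering, and assuming the same 1/N normalisation around the nominal histogram. Both are assumptions; for a 2D histogram the choice of neighbour is not unique. The names `adjacentDifferences` and `differenceCovariance` are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// d_i = h_(i+1) - h_i for a flattened histogram h (one entry fewer than h).
std::vector<double> adjacentDifferences(const std::vector<double>& h) {
  std::vector<double> d;
  for (std::size_t i = 0; i + 1 < h.size(); ++i) d.push_back(h[i + 1] - h[i]);
  return d;
}

// Covariance of the bin-to-bin differences instead of the bin contents themselves.
Matrix differenceCovariance(const std::vector<double>& nominal,
                            const std::vector<std::vector<double>>& variations) {
  assert(!variations.empty());
  const std::vector<double> d0 = adjacentDifferences(nominal);
  const std::size_t n = d0.size();
  const double norm = 1.0 / static_cast<double>(variations.size());
  Matrix cov(n, std::vector<double>(n, 0.0));
  for (const auto& var : variations) {
    assert(var.size() == nominal.size());
    const std::vector<double> d = adjacentDifferences(var);
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        cov[i][j] += norm * (d0[i] - d[i]) * (d0[j] - d[j]);
      }
    }
  }
  return cov;
}
```

A shift that is common to all bins cancels in the differences, so this matrix only describes changes in the shape of the histogram.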

-Eddy

Antonio83 · May 13, 2021, 5:24pm

Hi Eddy,

> If I understand correctly, you have a 2-dim histogram with entry h_var(i, j)
> for bin (i, j). These contents depend on some hidden variables x_k, k = 0, ..., n-1.
>
> You now vary these variables and calculate the correlations between the
> contents at bin (i, j) for the histograms produced for all these variations.

yes, right. This is what I am doing.

> You observe that you get small variations in the bin contents and, not surprisingly, end up with a correlation matrix that has basically all entries set to 1.

so, the bin contents are weights, with values around 1. According to the covariance matrix, the standard deviation is ~10% for the bin contents.
I attach the correlation matrix in a new file.
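For reference, a correlation matrix can be derived from a covariance matrix with corr_ij = cov_ij / sqrt(cov_ii * cov_jj). The sketch below assumes non-negative variances on the diagonal and sets entries involving a zero-variance bin to 0, which is a convention chosen here for illustration. The function name `correlationFromCovariance` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// corr_ij = cov_ij / (sigma_i * sigma_j), with sigma_i = sqrt(cov_ii).
Matrix correlationFromCovariance(const Matrix& cov) {
  const std::size_t n = cov.size();
  std::vector<double> sigma(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    assert(cov[i].size() == n);
    assert(cov[i][i] >= 0.0);
    sigma[i] = std::sqrt(cov[i][i]);
  }
  Matrix corr(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      const double denom = sigma[i] * sigma[j];
      if (denom > 0.0) corr[i][j] = cov[i][j] / denom;  // otherwise left at 0
    }
  }
  return corr;
}
```

When every entry of the result is close to 1, the covariance matrix is nearly singular, which would be consistent with the very large condition number reported earlier in this thread.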

> My suggestion is not to calculate the covariance matrix between the cell contents h_var_i and h_var_j but rather between
>
> h_var_(i+1)-h_var_i and h_var_(j+1)-h_var_j.
>
> Since I do not know what exactly your model is and what the bins represent, I cannot tell if this is meaningful. In finance the bins would be steps in time.

In my case the bin contents represent reweighting factors for some Monte Carlo simulation, so I am interested in checking how the variation of the weights within their errors impacts the simulation.

I don’t know if your suggestion could be applied here: I need to think about it.
Maybe, given the high correlations, I can just shift all the weights in the same direction. So what I was trying to do, taking the correlations into account for the variations, does not make much sense in this particular case.
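The coherent shift mentioned above could look like the sketch below, assuming all bins are close to +100% correlated, so that a single number of standard deviations `nSigma` moves every weight by nSigma * sqrt(cov_ii). The function name `shiftCoherently` and the parameter `nSigma` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Shift every weight by the same number of standard deviations:
// w_i' = w_i + nSigma * sqrt(cov_ii). Only valid if all correlations are close to +1.
std::vector<double> shiftCoherently(const std::vector<double>& weights,
                                    const Matrix& cov, double nSigma) {
  assert(cov.size() == weights.size());
  std::vector<double> shifted(weights.size(), 0.0);
  for (std::size_t i = 0; i < weights.size(); ++i) {
    assert(cov[i].size() == weights.size());
    assert(cov[i][i] >= 0.0);
    shifted[i] = weights[i] + nSigma * std::sqrt(cov[i][i]);
  }
  return shifted;
}
```

Calling it with `nSigma = +1` and `nSigma = -1` gives the "up" and "down" sets of weights. Drawing `nSigma` from a unit Gaussian would give random but fully correlated variations.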

Thanks,
Antonio

cor_matrix_example.root (58.4 KB)
