Invert Nearly Singular Matrix

I’m encountering some covariance matrices with small determinants that I need to invert. Root either gives me the “Singular matrix” error or an inverse matrix that is incorrect. (Mathematica can find a good approximate inverse.)

Here’s an example of my problem in Python. As you will see the determinant is of the order 10^-8, small but not zero.

Thanks for any ideas you have to fix or work around this problem.

import sys, os, math, ROOT

matrix = ROOT.TMatrixD( 5, 5)
matrix[0][0] = 1.698670
matrix[0][1] = 1.581590
matrix[0][2] = 0.114995
matrix[0][3] = -0.153762
matrix[0][4] = 0.490576
matrix[1][0] = 1.581590
matrix[1][1] = 6.717900
matrix[1][2] = -0.456325
matrix[1][3] = 1.501070
matrix[1][4] = 1.160400
matrix[2][0] = 0.114995
matrix[2][1] = -0.456325
matrix[2][2] = 0.554749
matrix[2][3] = -0.253858
matrix[2][4] = -0.209215
matrix[3][0] = -0.153762
matrix[3][1] = 1.501070
matrix[3][2] = -0.253858
matrix[3][3] = 0.615728
matrix[3][4] = 0.215676
matrix[4][0] = 0.490576
matrix[4][1] = 1.160400
matrix[4][2] = -0.209215
matrix[4][3] = 0.215676
matrix[4][4] = 0.296860

print "\n-------------------------------\nDeterminant = %E\n-------------------------------\n" % matrix.Determinant()
matrix.Print()


matInv = ROOT.TMatrixD( matrix )
matInv.Invert().Print()

matInvFast = ROOT.TMatrixD( matrix )
matInvFast.Invert().Print()

svd = ROOT.TDecompSVD( matrix )
svd.Decompose()
svdInv = svd.Invert()
svdInv.Print()

Hi CreationOperator,

I took you code and made a C++ macro out of it:

{
  TMatrixD matrix(5,5);
  matrix[0][0] = 1.698670;
  matrix[0][1] = 1.581590;
  matrix[0][2] = 0.114995;
  matrix[0][3] = -0.153762;
  matrix[0][4] = 0.490576;
  matrix[1][0] = 1.581590;
  matrix[1][1] = 6.717900;
  matrix[1][2] = -0.456325;
  matrix[1][3] = 1.501070;
  matrix[1][4] = 1.160400;
  matrix[2][0] = 0.114995;
  matrix[2][1] = -0.456325;
  matrix[2][2] = 0.554749;
  matrix[2][3] = -0.253858;
  matrix[2][4] = -0.209215;
  matrix[3][0] = -0.153762;
  matrix[3][1] = 1.501070;
  matrix[3][2] = -0.253858;
  matrix[3][3] = 0.615728;
  matrix[3][4] = 0.215676;
  matrix[4][0] = 0.490576;
  matrix[4][1] = 1.160400;
  matrix[4][2] = -0.209215;
  matrix[4][3] = 0.215676;
  matrix[4][4] = 0.296860;

  std::cout << "\n-------------------------------\n Determinant = " << matrix.Determinant() << "\n-------------------------------\n";
  matrix.Print(",matrix");

  TMatrixD matInv(matrix);
  matInv.Invert().Print();
  matInv.Print("matInv");

  TDecompSVD svd(matrix);
  svd.Decompose();
  TMatrixD svdInv = svd.Invert();
  svdInv.Print("svdInv");

  TMatrixD unit = matrix*svdInv;
  unit.Print("matrix*svdInv");
}

The result (ROOT 5.10/00) is :

-------------------------------
 Determinant = -2.64163e-08
-------------------------------

5x5 matrix ,matrix is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |      1.699       1.582       0.115     -0.1538      0.4906 
   1 |      1.582       6.718     -0.4563       1.501        1.16 
   2 |      0.115     -0.4563      0.5547     -0.2539     -0.2092 
   3 |    -0.1538       1.501     -0.2539      0.6157      0.2157 
   4 |     0.4906        1.16     -0.2092      0.2157      0.2969 


5x5 matrix  is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |  -1.13e+06  -1.253e+05   1.186e+06  -8.128e+05   3.783e+06 
   1 | -1.253e+05   -1.39e+04   1.316e+05  -9.017e+04   4.197e+05 
   2 |  1.186e+06   1.316e+05  -1.245e+06   8.532e+05  -3.971e+06 
   3 | -8.128e+05  -9.017e+04   8.532e+05  -5.848e+05   2.722e+06 
   4 |  3.783e+06   4.197e+05  -3.971e+06   2.722e+06  -1.267e+07 


5x5 matrix matInv is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |  -1.13e+06  -1.253e+05   1.186e+06  -8.128e+05   3.783e+06 
   1 | -1.253e+05   -1.39e+04   1.316e+05  -9.017e+04   4.197e+05 
   2 |  1.186e+06   1.316e+05  -1.245e+06   8.532e+05  -3.971e+06 
   3 | -8.128e+05  -9.017e+04   8.532e+05  -5.848e+05   2.722e+06 
   4 |  3.783e+06   4.197e+05  -3.971e+06   2.722e+06  -1.267e+07 


5x5 matrix svdInv is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |  -1.13e+06  -1.253e+05   1.186e+06  -8.128e+05   3.783e+06 
   1 | -1.253e+05   -1.39e+04   1.316e+05  -9.017e+04   4.197e+05 
   2 |  1.186e+06   1.316e+05  -1.245e+06   8.532e+05  -3.971e+06 
   3 | -8.128e+05  -9.017e+04   8.532e+05  -5.848e+05   2.722e+06 
   4 |  3.783e+06   4.197e+05  -3.971e+06   2.722e+06  -1.267e+07 


5x5 matrix matrix*svdInv is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |          1   2.857e-11  -6.803e-10    2.45e-10  -4.678e-10 
   1 | -6.082e-10           1   2.899e-10  -2.457e-10   1.767e-09 
   2 |  1.307e-10  -2.983e-12           1   1.962e-11  -2.483e-10 
   3 |  1.288e-11  -4.531e-12   6.767e-11           1   2.885e-10 
   4 | -6.119e-11   7.499e-12  -1.296e-10   9.492e-11           1 

That looks fine . What problem do you see in your python version ?

Eddy

I think that version worked because of the truncated values in the matrix. This version is more along the lines of what my C++ application is doing when it runs into problems. This version requires the attacted .root file.

[code]import os, sys, math, ROOT

def GetCovarianceMatrix( data ):

means = [] 
n = data.GetNrows()
d = data.GetNcols()
for i in range( n ):
	for j in range ( d ):
		try:
			means[j] += data[i][j]/n
		except IndexError:
			means.append(data[i][j]/n)

covMat = ROOT.TMatrixD( d, d )		
		
for i in range(n):
	for j in range(d):
		for k in range(d):
			covMat[j][k] += (means[j] - data[i][j])*(means[k] - data[i][k])/(n - 1.0)

return covMat

def GetTMatrixFromTree( tree ):
vars = [“InvMass”, “mMet”, “m1metDphi”, “m1m2Dphi” , “MuCentrali”]
#vars = [“InvMass”, “m1m2Dphi”, “mMet”]
mat = ROOT.TMatrixD(tree.GetEntries(), len(vars) )
print “Resized matrix . . .”
for i in range( 0, tree.GetEntries() ):
tree.GetEntry(i)
for j in range (len(vars)):
mat[i ][j] = tree.GetLeaf( vars[j] ).GetValue()

return mat

Get ROOT tree

rootFile = ROOT.TFile(“c:/temp/tree.root”)
cutTree = rootFile.Get(“dimu”)
print “%d events” % ( cutTree.GetEntries() )

#Get covariance matrix
treeMat = GetTMatrixFromTree( cutTree)
treeMat.Print()
A = GetCovarianceMatrix( treeMat )
A.Print(“Cov Matrix”)

AInv = ROOT.TMatrixD( A )
AInv.Invert()
AInv.Print()

unity = ROOT.TMatrixD( A )
unity *= AInv
unity.Print(“UNITY”)[/code]
tree.root (14.1 KB)

Using your tree.root file, I have created a script imatrix.C
reproducing Eddy’s style. I have run this script under linux and
windows with identical results (within machine precision).
I also ran imatrix.py (your script) on the two systems and I
obtain substantially different results.
I do not know if this is due to the different algorithms
in your respective code (no time to check).
See the results in the small tar file in attachement.

Rene
imatrix.tar.gz (2.36 KB)

Hi Rene,

I’ve modified the C++ code to calculate the covariance matrix. It then attempts to invert the covariance matrix and gets the same results as the python script.

void GetCovarianceMatrix( const TMatrixD &data, TMatrixD &covMat ){
   
   Int_t n = data.GetNrows();
   Int_t d = data.GetNcols();
   Double_t *means = new Double_t[d];

   for(Int_t i = 0; i < n; ++i){
	   for( Int_t j =0; j < n; ++j){
			means[j] += data[i][j]/(Double_t)n;
	   }
   }

   covMat.ResizeTo( d, d );     
   for(Int_t i = 0; i < n; ++i){
	   for( Int_t j =0; j < d; ++j){
		   for( Int_t k=0; k < d; k++){
			covMat[j][k] += (means[j] - data[i][j])*(means[k] - data[i][k])/(n - 1.0);
		   }
	   }
   }         


	delete []means;
}
         

void imatrix() {

   TFile *f = new TFile("tree.root");
   TTree *T = (TTree*)f->Get("dimu");
   const Int_t nl = 5;
   char *leaves[nl] = {"InvMass", "mMet", "m1metDphi", "m1m2Dphi" , "MuCentrali"}; 
   TLeaf *leaf[nl];
   for (Int_t i=0;i<nl;i++) leaf[i] = T->GetLeaf(leaves[i]);
   
   Int_t N = (Int_t)T->GetEntries();
   TMatrixD matrix(N,nl);
   for (Int_t entry=0;entry<N;entry++) {
      for (Int_t i=0;i<nl;i++) {
         leaf[i]->GetBranch()->GetEntry(entry);
         matrix[entry][i] = leaf[i]->GetValue();
      }
   }
   matrix.Print();


   TMatrixD covMatrix;
   

   GetCovarianceMatrix(matrix, covMatrix);
   covMatrix.Print();

   TMatrixD matInv(covMatrix);
   matInv.Invert().Print();
   matInv.Print("matInv");

   //TDecompSVD svd(covMatrix);
   //svd.Decompose();
   //TMatrixD svdInv = svd.Invert();
   //svdInv.Print("svdInv");

   TMatrixD unit = covMatrix*matInv;
   unit.Print("covMatrix*matInv");
}
 

Thanks,
Dennis

I ran your imatrix code (C++ version of your last submission)
with your root file and saw that the
covariance file coming from your GetCovarianceMatrix is :

5x5 matrix covMatrix is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |      1.699       1.582       0.115         nan     -0.1538 
   1 |      1.582       6.718     -0.4563         nan       1.501 
   2 |      0.115     -0.4563      0.5547         nan     -0.2539 
   3 |        nan         nan         nan         nan         nan 
   4 |    -0.1538       1.501     -0.2539         nan      0.6157 

Inverting this matrix will of course not work .

The reason is that you forget to initialize the array “means” to zero !
You also forget to do this with the covMat matrix .

Eddy

OK - the C++ version I hastily translated from python this morning has problems. Can you initialize the values and try again or run this attached version? (I created the original python script in order to isolate the problem and make it easier for others to reproduce. Sorry this backfired.)

It’s not my real problem though. The real problem is that ROOT is not properly inverting the matrix covMatrix. (You already showed that it can invert the low precision version of covMatrix when the values in the matrix are converted strings rather than calculated double values. )

I’ve actually tried to isolate a problem which is occurring in data analysis. Some of the barely contributing backgrounds end up with small data samples and covariance matrices that won’t invert. This is one example.

Here are my results when I run both the C and the Python versions of the code.

root [0]
Processing imatrix.C...

5x5 matrix Data Matrix is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |      11.03       5.951      0.5151       1.085       1.971
   1 |      14.71       9.344      0.7236       2.165       1.658
   2 |      12.97       7.364       2.264       1.005       1.231
   3 |      12.88       11.82       1.116       1.919       3.248
   4 |      12.83       5.577       1.854       1.096       1.528


5x5 matrix covMatrix is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |      1.699       1.582       0.115      0.4906     -0.1538
   1 |      1.582       6.718     -0.4563        1.16       1.501
   2 |      0.115     -0.4563      0.5547     -0.2092     -0.2539
   3 |     0.4906        1.16     -0.2092      0.2969      0.2157
   4 |    -0.1538       1.501     -0.2539      0.2157      0.6157


5x5 matrix matInv is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 | 1.145e+015   1.27e+014 -1.202e+015 -3.834e+015  8.237e+014
   1 |  1.27e+014  1.408e+013 -1.333e+014 -4.252e+014  9.136e+013
   2 |-1.202e+015 -1.333e+014  1.261e+015  4.024e+015 -8.646e+014
   3 |-3.834e+015 -4.252e+014  4.024e+015  1.284e+016 -2.758e+015
   4 | 8.237e+014  9.136e+013 -8.646e+014 -2.758e+015  5.926e+014


5x5 matrix covMatrix*matInv is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |     0.6563   -0.003906      -0.375        -0.5      0.1719
   1 |         -1       1.016       -0.25           1        0.25
   2 |      -0.25    -0.01563       1.094         0.5    -0.09375
   3 |    -0.1875   -0.007813     -0.1563       1.125     0.01563
   4 |     -0.125   -0.007813      0.0625        0.25           1

root [1]

These results are the same as I get from the python script.
imatrix.C (1.64 KB)

I modified your script so that everything is initialized properly and I can
run now your script too :

void GetCovarianceMatrix( const TMatrixD &data, TMatrixD &covMat ){

   Int_t n = data.GetNrows();
   Int_t d = data.GetNcols();
   Double_t *means = new Double_t[d];
   memset(means,0,d*sizeof(Double_t));

   for(Int_t i = 0; i < n; ++i){
      for( Int_t j =0; j < n; ++j){
         means[j] += data[i][j]/(Double_t)n;
      }
   }

   covMat.ResizeTo( d, d );
   covMat = 0;
   for(Int_t i = 0; i < n; ++i){
      for( Int_t j =0; j < d; ++j){
         for( Int_t k=0; k < d; k++){
         covMat[j][k] += (means[j] - data[i][j])*(means[k] - data[i][k])/(n - 1.0);
         }
      }
   }


   delete []means;
}


void imatrix() {

   TFile *f = new TFile("tree.root");
   TTree *T = (TTree*)f->Get("dimu");
   const Int_t nl = 5;
   char *leaves[nl] = {"InvMass", "mMet", "m1metDphi", "m1m2Dphi" , "MuCentrali"};
   TLeaf *leaf[nl];
   for (Int_t i=0;i<nl;i++) leaf[i] = T->GetLeaf(leaves[i]);

   Int_t N = (Int_t)T->GetEntries();
   TMatrixD matrix(N,nl);
   for (Int_t entry=0;entry<N;entry++) {
      for (Int_t i=0;i<nl;i++) {
         leaf[i]->GetBranch()->GetEntry(entry);
         matrix[entry][i] = leaf[i]->GetValue();
      }
   }
   matrix.Print("matrix");


   TMatrixD covMatrix;

   GetCovarianceMatrix(matrix, covMatrix);
   covMatrix.Print("covMatrix");
   
   TMatrixD matInv(covMatrix); 
   matInv.Invert().Print("matInv1");
   matInv.Print("matInv2");
   
   TDecompSVD svd(covMatrix);
   svd.Decompose();  
   TMatrixD svdInv = svd.Invert();
   svdInv.Print("svdInv");  
   cout << "condition: " << svd.Condition() << endl;
   
   TMatrixD unit = covMatrix*svdInv;
   unit.Print("covMatrix*svdInv");
}

I have activated again the Singular Value decomposition which is the
most precise inversion in the ROOT package . The result of
covMatrix*svdInv is

5x5 matrix covMatrix*svdInv is as follows

     |        0  |        1  |        2  |        3  |        4  |
------------------------------------------------------------------
   0 |     0.9278   -0.008011     0.07582      0.2419    -0.05197 
   1 |  -0.008011      0.9991     0.00841     0.02683   -0.005764 
   2 |    0.07582     0.00841      0.9204     -0.2539     0.05455 
   3 |     0.2419     0.02683     -0.2539      0.1901       0.174 
   4 |   -0.05197   -0.005764     0.05455       0.174      0.9626 

So is this what one could expect ? Yes, it is .
The relevant quantity to see what the precision of an inversion will
be is the Condtion number (see User’s Guide) . (It is not the
determinant . Just take for example a unit matrix and scale the
diagonal with an arbitrary small number … )

The condition number of your matrix is : 1.56019e+17

The rule of thumb is that for double precision, a condition number
of 10^n will result in an accuracy of 15-n digits in the inverted matrix .

Mathematica is probably using some iterative scheme to calculate
the inversion . Our current package has not this option ,however, the
building blocks to set this up are there …

Eddy