Doubt about how to calculate the covariance matrix from a txt file

The goal of the program is to read a txt file containing noise data, calculate the covariance matrix, and save it to an output file.

The program is:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <TMatrixD.h>
#include <TPrincipal.h>

// Function to read the file and store the values in a matrix
std::vector<std::vector<double>> lerArquivo(const char* filename) {
    std::ifstream file(filename);
    std::vector<std::vector<double>> matriz;

    if (file.is_open()) {
        std::string linha;
        while (std::getline(file, linha)) {
            std::vector<double> valores;
            std::istringstream iss(linha);

            double valor;
            while (iss >> valor) {
                valores.push_back(valor);
            }

            // Skip blank lines so that no empty row reaches TPrincipal::AddRow
            if (!valores.empty()) {
                matriz.push_back(valores);
            }
        }

        file.close();
    } else {
        std::cout << "Error opening the file." << std::endl;
    }

    return matriz;
}

// Function to calculate the covariance matrix
TMatrixD calcularMatrizCovariancia(const std::vector<std::vector<double>>& dados) {
    // Nothing to do if the file could not be read or was empty
    if (dados.empty()) {
        return TMatrixD();
    }

    const int nLinhas = dados.size();
    const int nColunas = dados[0].size();

    TPrincipal principal(nColunas, "D");
    for (int i = 0; i < nLinhas; i++) {
        const double* linha = &dados[i][0];
        principal.AddRow(linha);
    }

    principal.MakePrincipals();
    TMatrixD covMatrix = *(principal.GetCovarianceMatrix());

    return covMatrix;
}

// Function to save the covariance matrix to a text file
void salvarMatrizCovariancia(const TMatrixD& covMatrix, const char* filename) {
    std::ofstream file(filename);

    if (file.is_open()) {
        for (int i = 0; i < covMatrix.GetNrows(); i++) {
            for (int j = 0; j < covMatrix.GetNcols(); j++) {
                file << covMatrix(i, j) << " ";
            }
            file << std::endl;
        }

        file.close();
    } else {
        std::cout << "Error opening the file to save the covariance matrix." << std::endl;
    }
}

int main() {
    const char* filename = "RuidoOcupacao_100.txt";

    // Read the file and store the values in a matrix
    std::vector<std::vector<double>> dados = lerArquivo(filename);

    // Calculate the covariance matrix of the data
    TMatrixD covMatrix = calcularMatrizCovariancia(dados);

    // Save the covariance matrix to a text file
    const char* outputFilename = "matriz_covariancia.txt";
    salvarMatrizCovariancia(covMatrix, outputFilename);

    std::cout << "Covariance matrix saved to: " << outputFilename << std::endl;

    return 0;
}

However, the covariance matrix I obtain differs by orders of magnitude from the result I was expecting. I tried changing the option between “D” and “ND”, but the result is the same. Does anybody have an idea how I should use GetCovarianceMatrix() in this case?

Welcome to the ROOT forum,

Maybe @moneta has an idea about it.

Did you check, through TPrincipal::GetRow, that the input data is what you expect?

In this case, my input is a txt file with 10000 lines. Each line has seven variables that represent the noise, so the covariance matrix is 7 × 7.
In the TPrincipal constructor I set the number of variables to 7 (nColunas), and I use the for loop to add the 10000 rows.

Do you know if my logic is correct?

Do I need to use the TPrincipal class, or can I calculate the covariance matrix in a more direct way?

That all sounds good, but an easy check would be to print a few rows to confirm that everything works as expected.
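
A minimal sketch of such a check, in plain C++, applied to the vector returned by lerArquivo before anything is passed to TPrincipal. The helper names firstRaggedRow, printFirstRows and requireRectangular are hypothetical and not part of ROOT. The idea is that a row with fewer values than the first row would make &dados[i][0] point at too few numbers for AddRow.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper: index of the first row whose length differs from
// that of row 0, or -1 if all rows have the same length (or there are none).
long firstRaggedRow(const std::vector<std::vector<double>>& rows) {
    for (std::size_t i = 1; i < rows.size(); ++i) {
        if (rows[i].size() != rows[0].size()) return static_cast<long>(i);
    }
    return -1;
}

// Hypothetical helper: print the first n rows so that they can be compared
// by eye with the first lines of the text file.
void printFirstRows(const std::vector<std::vector<double>>& rows,
                    std::size_t n, std::ostream& out) {
    for (std::size_t i = 0; i < rows.size() && i < n; ++i) {
        out << "row " << i << " (" << rows[i].size() << " values):";
        for (double v : rows[i]) out << ' ' << v;
        out << '\n';
    }
}

// Abort early (in debug builds) instead of handing a bad pointer to AddRow.
void requireRectangular(const std::vector<std::vector<double>>& rows) {
    assert(!rows.empty() && !rows[0].empty());
    assert(firstRaggedRow(rows) == -1);
}
```

Calling requireRectangular(dados) and printFirstRows(dados, 5, std::cout) right after lerArquivo would show whether every row really holds seven values.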

I could give you a few lines of TMatrix code to do it but it will not beat the nice numerical way TPrincipal does it.

Would it be possible to attach the text file so that one can run your code?

Thanks for helping me with the code. I am sending the noise data for occupancy 100 that I use to calculate the covariance matrix.

(Attachment RuidoOcupacao_100.txt is missing)

The attachment is missing. If it is just 100 lines, you can also copy and paste them into the post.

Could you send me your email address? I can’t add the file to this forum because it is too big, and I can’t post the drive link either.

edmondoffermann@yahoo.com

Did you receive the file that I sent you by email?

Here it is
RuidoOcupacao_100.txt.zip (2.5 MB)

The code to calculate the covariance matrix of your data using the TMatrix package is:

// Covariance calculation using the TMatrix package
#include <vector>
#include <TMatrixD.h>
#include <TMatrixDSym.h>
#include <TMatrixDUtils.h>
#include <TVectorD.h>

TMatrixDSym CovMat(const std::vector<std::vector<double>>& dados) {
    const int nRow = dados.size();
    const int nCol = dados[0].size();

    // Copy the data row by row into an nRow x nCol matrix
    TMatrixD x(nRow,nCol);
    TVectorD vrow;
    for (int i = 0; i < nRow; i++)
    {
      vrow.Use(nCol,&dados[i][0]);
      TMatrixDRow(x,i) = vrow;
    }

    // Subtract the column mean from every column
    TMatrixD xdiff(nRow,nCol);
    for (int j = 0; j < nCol; j++)
    {
      TVectorD col_j = TMatrixDColumn(x,j);
      col_j -= col_j.Sum()/nRow;
      TMatrixDColumn(xdiff,j) = col_j;
    }

    // Covariance = xdiff^T * xdiff / nRow
    const double scale = 1./nRow;
    TMatrixDSym covMatrix = scale*TMatrixDSym(TMatrixDSym::kAtA,xdiff);

    return covMatrix;
}
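
In formula form, and reading it directly off the function above, CovMat computes the covariance normalized by the number of rows N (nRow), not by N − 1. The symbols x_{ij} (row i, column j of the data) and \bar{x}_j (mean of column j) are notation introduced here for illustration:

```latex
C_{jk} = \frac{1}{N} \sum_{i=1}^{N} \left( x_{ij} - \bar{x}_j \right) \left( x_{ik} - \bar{x}_k \right),
\qquad
\bar{x}_j = \frac{1}{N} \sum_{i=1}^{N} x_{ij}
```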

This code computes the covariance by brute force, adding up the squares and cross products of the differences from the column averages. So if your row entries vary by orders of magnitude, the smaller entries get “lost”. TPrincipal::AddRow seems to do it in a numerically correct way; the price to be paid, however, is that the matrix is scaled by the number of entries (rows). The result gives the correct correlation matrix and principal values, but the matrix is not the “true” covariance matrix.

@moneta, this should either be considered a bug in GetCovarianceMatrix() or be documented properly.
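
To illustrate what a numerically careful accumulation can look like, here is a minimal sketch of a single-pass (Welford-style) covariance update in plain C++. It is only an illustration under stated assumptions, not TPrincipal's actual implementation. The class name OnlineCovariance and its methods are hypothetical. Like CovMat, it normalizes by the number of rows N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ROOT): accumulates column means and
// co-moments one row at a time, updating the running mean as it goes.
class OnlineCovariance {
public:
    explicit OnlineCovariance(std::size_t nVars)
        : n_(0), mean_(nVars, 0.0),
          m2_(nVars, std::vector<double>(nVars, 0.0)) {}

    // Add one observation (one line of the input file).
    void addRow(const std::vector<double>& x) {
        assert(x.size() == mean_.size());
        ++n_;
        std::vector<double> delta(x.size());
        for (std::size_t j = 0; j < x.size(); ++j) {
            delta[j] = x[j] - mean_[j];  // deviation from the old mean
            mean_[j] += delta[j] / static_cast<double>(n_);
        }
        for (std::size_t j = 0; j < x.size(); ++j) {
            for (std::size_t k = 0; k < x.size(); ++k) {
                // old deviation of j times new deviation of k
                m2_[j][k] += delta[j] * (x[k] - mean_[k]);
            }
        }
    }

    // Covariance normalized by N, the number of rows added so far.
    std::vector<std::vector<double>> covariance() const {
        std::vector<std::vector<double>> c = m2_;
        if (n_ == 0) return c;
        for (auto& row : c) {
            for (double& v : row) v /= static_cast<double>(n_);
        }
        return c;
    }

private:
    std::size_t n_;
    std::vector<double> mean_;
    std::vector<std::vector<double>> m2_;
};
```

Feeding it each row of dados in turn and calling covariance() at the end should agree with CovMat up to rounding, which gives an independent cross-check of the expected order of magnitude.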

Now the program is working well. Thanks a lot for your help!