TPrincipal fails for a very large number of entries

My ROOT version is 6.30/04. I am running macOS 14.4.1 with Xcode 15.3 on an M1 MacBook Pro.

I found that if you provide the TPrincipal class with too many entries, it fails. Here is my test code.

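#include <iomanip>
#include <iostream>

#include "TMatrixD.h"
#include "TPrincipal.h"
#include "TRandom.h"

using namespace std;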

void test_tprinc(Long_t num)
{
  
    TRandom r;
    TPrincipal covar(3);
    Double_t triplet[3];
    Long_t i;
    
    for (i=0; i<num; i++) {
        
        triplet[0]=r.Gaus(0.,1.);
        triplet[1]=4*triplet[0]+r.Gaus(0.,1.);
        triplet[2]=-3*triplet[1]+r.Gaus(0.,1.);
        covar.AddRow(triplet);
        
    }
    
    TMatrixD Mcovar = TMatrixD(3,3);
    Mcovar = * covar.GetCovarianceMatrix();
 
    cout << "Covariance Matrix" << endl;
    cout << setw(12) << Mcovar(0,0) << setw(12) << Mcovar(0,1) << setw(12) << Mcovar(0,2) << endl;
    cout << setw(12) << Mcovar(1,0) << setw(12) << Mcovar(1,1) << setw(12) << Mcovar(1,2) << endl;
    cout << setw(12) << Mcovar(2,0) << setw(12) << Mcovar(2,1) << setw(12) << Mcovar(2,2) << endl;
}

And here is the output for various values of the argument “num” which determines the number of entries.

root [0] .L tprinc_test.cpp 
root [1] test_tprinc(100)
Covariance Matrix
    0.925195           0           0
      3.6724      15.327           0
    -10.9756    -45.7341     137.481
root [2] test_tprinc(100000)
Covariance Matrix
    0.998668           0           0
     3.99162     16.9557           0
    -11.9761    -50.8703     153.622
root [3] test_tprinc(100000000)
Covariance Matrix
     0.99999           0           0
     3.99987      16.999           0
    -11.9997    -50.9972     153.993
root [4] test_tprinc(10000000000)
Error in <TVectorT<double>::Allocate>: nrows=-1382938632

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libHist.so] TPrincipal::AddRow(double const*) (no debug info)
[<unknown binary>] (no debug info)
[<unknown binary>] (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] cling::IncrementalExecutor::executeWrapper(llvm::StringRef, cling::Value*) const (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] cling::Interpreter::EvaluateInternal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] cling::Interpreter::process(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, cling::Value*, cling::Transaction**, bool) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCling.so] TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libRint.so] TRint::ProcessLineNr(char const*, char const*, int*) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libRint.so] TRint::HandleTermInput() (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCore.so] TUnixSystem::CheckDescriptors() (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCore.so] TMacOSXSystem::DispatchOneEvent(bool) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCore.so] TSystem::InnerLoop() (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCore.so] TSystem::Run() (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libCore.so] TApplication::Run(bool) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/lib/root/libRint.so] TRint::Run(bool) (no debug info)
[/opt/homebrew/Cellar/root/6.30.04/bin/root.exe] main (no debug info)
[/usr/lib/dyld] start (no debug info)
Root > 

Evidently the number of rows, or a size derived from it, is stored in a 32-bit integer: the negative nrows in the error message looks like a signed 32-bit overflow.
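
To illustrate how a negative size like the one in the error message can arise, here is a minimal stand-alone sketch. It is not TPrincipal's actual code: `fitsInInt32` and `wrapToInt32` are hypothetical helpers, and the assumption is that an element count (rows times variables) ends up in a signed 32-bit integer.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: does a 64-bit count fit in a signed 32-bit integer?
bool fitsInInt32(std::int64_t count)
{
    return count >= std::numeric_limits<std::int32_t>::min() &&
           count <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: keep only the low 32 bits of a 64-bit count and
// reinterpret them as a signed value, as a narrowing store effectively does.
std::int32_t wrapToInt32(std::int64_t count)
{
    // Conversion to an unsigned type is well defined (modulo 2^32).
    const std::uint32_t low = static_cast<std::uint32_t>(count);
    std::int32_t wrapped;
    std::memcpy(&wrapped, &low, sizeof wrapped);
    return wrapped;
}
```

For 10000000000 rows of 3 variables the element count is 30000000000, which does not fit, and `wrapToInt32(30000000000LL)` returns -64771072. The exact value printed by ROOT depends on how the class grows its internal buffer, so it need not match this number.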

[I would also remark that the class fills only the lower triangle of the covariance matrix. This is not a “bug” (the matrix is symmetric), but it is probably a “user interface flaw”: a user who is unaware of this detail will get wrong results when reading the upper triangle.]
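
For readers who need the full symmetric matrix, the lower triangle can be mirrored by hand. A minimal sketch on a plain 3x3 array rather than a TMatrixD; `Matrix3` and `mirrorLowerTriangle` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstddef>

using Matrix3 = std::array<std::array<double, 3>, 3>;

// Copy every lower-triangle element (row > col) to its mirror position
// (col, row), so that the returned matrix is symmetric.
Matrix3 mirrorLowerTriangle(Matrix3 m)
{
    for (std::size_t row = 0; row < m.size(); ++row) {
        for (std::size_t col = row + 1; col < m.size(); ++col) {
            m[row][col] = m[col][row];
        }
    }
    return m;
}
```

The same double loop should carry over to a TMatrixD by replacing `m[row][col]` with `m(row, col)`, which is the element access used in the test code above.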

Hi Stephan,

Thanks for the post. I am sorry you are experiencing this behaviour, but would you perhaps clarify why this is a limitation for you?
In the meantime, I am adding @moneta to the loop.

Best,
Danilo

Dear Danilo

Thank you. I was using TPrincipal to understand correlations in a very large database, and the calculation crashed in the middle because of this issue. Of course, from a statistical point of view, I can use a sample of the population to determine the correlations, and that is what I have done. I am merely reporting the issue to the forum. I can understand if the developers are busy with other more pressing issues and decline to treat this one, since there exists a valid alternative workaround.

yours,
Steve

Hi Steve,

Thanks for the clarification. Glad to hear you are not blocked. We’ll look into this.

Cheers,
Danilo

It’s more than that. The class relies on TVectorT, which is based on 32-bit integers. So even if one changes the number of rows to 64-bit, the problem will still be there. So this looks like another flavor of Overcome 1GB size limit for IO buffers · Issue #6734 · root-project/root · GitHub

I opened a pull request: [hist] TPrincipal bounds check and format by ferdymercury · Pull Request #15110 · root-project/root · GitHub

Hi @ferhue, could you clarify the statement that TVectorT is based on 32-bit integers?

Do you mean that the size of a vector is limited to the maximum value of a
signed 4-byte integer, 2,147,483,647?

In linear algebra, a maximum vector size also implies a maximum matrix size,
which is now ~ 4.6e18 elements (2,147,483,647 squared) …

-Eddy

Yes, exactly. The maximum size of a vector is the largest value an Int_t can hold.

See, for example, the constructor in ROOT: TVectorT< Element > Class Template Reference

Indeed, for a square TMatrixT the maximum number of rows is the square root of that limit.

If you use a higher value, then Error Overflow is printed. See line 493 of TMatrixT.cxx
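
The arithmetic behind that square-root limit can be checked with a short sketch. It assumes, as stated above, that the total number of matrix elements must fit in a signed 32-bit integer; `maxSquareDimension` is a hypothetical helper, not a ROOT function.

```cpp
#include <cstdint>
#include <limits>

// Largest n for which an n x n matrix still has an element count
// that fits in a signed 32-bit integer.
std::int32_t maxSquareDimension()
{
    const std::int64_t limit = std::numeric_limits<std::int32_t>::max();
    std::int64_t n = 0;
    // The products are computed in 64 bits, so they cannot overflow here.
    while ((n + 1) * (n + 1) <= limit) {
        ++n;
    }
    return static_cast<std::int32_t>(n);
}
```

The result is 46340: 46340 squared is 2,147,395,600, which is below 2,147,483,647, while 46341 squared is 2,147,488,281, which is above it.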

The symptoms are clear, but what course of action should be taken? Any class using Int_t as the size of the object will at some point overflow. A linear algebra class has in addition the issue that matrices will overflow before vectors do.

In this specific case, the issue could be resolved by not storing the data in TPrincipal in a vector. When analyzing so much data in this class, my first worry (long before reaching the Int_t limit) would be the rounding errors that accumulate in the summations, which Kahan summation is designed to reduce.
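
As an illustration of the Kahan summation mentioned here, the following stand-alone sketch compares a naive running sum with a compensated one. It is independent of TPrincipal; `naiveSum` and `kahanSum` are hypothetical helpers, and the code should be compiled without aggressive floating-point options such as -ffast-math, which may optimize the compensation away.

```cpp
#include <cstddef>

// Plain running sum: every addition can lose low-order bits of `value`.
double naiveSum(double value, std::size_t count)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += value;
    }
    return sum;
}

// Kahan (compensated) summation: `compensation` carries the rounding
// error of the previous addition and feeds it back into the next one.
double kahanSum(double value, std::size_t count)
{
    double sum = 0.0;
    double compensation = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double corrected = value - compensation;
        const double updated = sum + corrected;
        // (updated - sum) is what was actually added; subtracting the
        // intended addend leaves the rounding error of this step.
        compensation = (updated - sum) - corrected;
        sum = updated;
    }
    return sum;
}
```

Adding 0.1 ten million times shows the effect: the compensated sum stays very close to 1000000, while the naive sum drifts away from it.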

Yes, it’s worrisome.

I’ve worked a bit on better warnings for when this kind of silent overflow happens, see TBuffer* classes should abort in case the 1GB limit is being hit · Issue #14770 · root-project/root · GitHub

To me the only solution is a big overhaul: find and replace Int with Long, and copy the result to the new ROOT7 namespace or something like that. Some tests would be advisable, too.
The problem also appears often in I/O.

Yes, I agree, rewriting the class to not rely on TVector could also work, as well as using Kahan compensation.

Agreed. But if the consensus is that the solution is to use 8-byte integers, maybe just change Int_t to long?

I am confused, what do you mean here?

I have prepared a PR where AddRow raises an error when the maximum size is reached and gives a more meaningful error message: [hist] TPrincipal bounds check and format by ferdymercury · Pull Request #15110 · root-project/root · GitHub

Other than that, I do not see how casting a 4-byte int to an 8-byte int could help here.
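
The kind of bounds check described for AddRow can be sketched as follows. This is not the code from the pull request: `canAddRow` is a hypothetical free function, and it assumes that all rows live in one flat buffer whose element count must fit in a signed 32-bit integer.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// True if one more row of `nVariables` values keeps the total element
// count within the signed 32-bit limit.
bool canAddRow(std::int64_t currentRows, std::int32_t nVariables)
{
    if (currentRows < 0 || nVariables <= 0) {
        throw std::invalid_argument("canAddRow: invalid arguments");
    }
    const std::int64_t limit = std::numeric_limits<std::int32_t>::max();
    // (currentRows + 1) * nVariables <= limit, written as a division
    // so that the check itself cannot overflow.
    return currentRows < limit / nVariables;
}
```

With 3 variables the check starts to fail once 715827882 rows are stored, since a further row would bring the element count to 2147483649. A caller would report an error at that point instead of letting the size wrap around.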

I meant to change in

core/foundation/inc/RtypesCore.h

 #else
 typedef int            Int_t;       //Signed integer 4 bytes (int)
 typedef unsigned int   UInt_t;      //Unsigned integer 4 bytes (unsigned int)
 #endif

to

 #else
 typedef long            Int_t;       //Signed integer, 8 bytes where long is 64-bit (long)
 typedef unsigned long   UInt_t;      //Unsigned integer, 8 bytes where long is 64-bit (unsigned long)
 #endif

This would however break backward compatibility of stored TFiles, TTrees, …
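
One further caveat with this proposal is that the width of `long` is platform dependent: it is 64 bits on Linux and macOS, but 32 bits on 64-bit Windows. A small sketch of the difference, using standard C++ only; `WideIndex_t` is a hypothetical alias, not a ROOT type.

```cpp
#include <climits>
#include <cstdint>

// Width of the platform's `long` in bits: 64 on LP64 systems
// (Linux, macOS), 32 on LLP64 systems (64-bit Windows).
constexpr int kLongBits = static_cast<int>(sizeof(long) * CHAR_BIT);

// A fixed-width alias has the same size on every platform.
using WideIndex_t = std::int64_t;
static_assert(sizeof(WideIndex_t) * CHAR_BIT == 64,
              "WideIndex_t must be 64 bits wide");

// True if `long` alone would be wide enough to serve as a 64-bit index.
constexpr bool longIsWideEnough()
{
    return kLongBits >= 64;
}
```

On platforms where `longIsWideEnough()` is false, redefining Int_t as `long` would not lift the limit at all, which is an argument for a fixed-width type if the sizes are ever widened.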

Probably my suggestion will break something but the streaming issue could be taken care of in

TObject::Streamer

with code like

void xxxxx::Streamer(TBuffer &R__b)
{
   if (R__b.IsReading()) {
      UInt_t R__s, R__c;
      Version_t R__v = R__b.ReadVersion(&R__s, &R__c);
      if (R__v > .....) {
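
The idea behind such a version check can be sketched without ROOT. The example below is an assumption-laden illustration, not TBuffer code: `ByteReader` and `readSize` are hypothetical, and the layout (a 2-byte version followed by a big-endian size field that is 4 bytes wide up to version 1 and 8 bytes wide from version 2 on) is invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical byte reader standing in for an input buffer.
class ByteReader {
public:
    explicit ByteReader(std::vector<std::uint8_t> bytes) : fBytes(std::move(bytes)) {}

    // Read `width` bytes as a big-endian unsigned integer.
    std::uint64_t ReadUnsigned(std::size_t width)
    {
        if (width > fBytes.size() - fPos) {
            throw std::out_of_range("ByteReader: truncated input");
        }
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < width; ++i) {
            value = (value << 8) | fBytes[fPos++];
        }
        return value;
    }

private:
    std::vector<std::uint8_t> fBytes;
    std::size_t fPos = 0;
};

// Old data (version 1) stores the size in 4 bytes, newer data in 8 bytes;
// both are returned as a 64-bit value.
std::int64_t readSize(ByteReader &in)
{
    const std::uint64_t version = in.ReadUnsigned(2);
    const std::size_t width = (version >= 2) ? 8 : 4;
    return static_cast<std::int64_t>(in.ReadUnsigned(width));
}
```

A reader written this way still accepts files written with the narrow size field, which is the backward-compatibility concern raised above.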
