Pyroot / (cling?) problem iterating over empty vector

Hello,

I get crashes when iterating with PyROOT over empty std::vector<T>
in some cases, where T is a class of my own. The code below (hopefully)
reproduces and illustrates the problem.

As far as PyROOT is concerned, I have traced this back to a call to
vector::data() in Pythonize.cxx. Below I suggest a fix/workaround
that simply checks size()>0 before calling data() to
prevent the crash.

I am not sure if calling data() on an empty vector is supposed to be
allowed. Maybe it is cling/gcc or some combination of them that is
not compliant here? (It is actually hard for me to understand exactly
which things come from cling and which come from gcc via the .so
file.)
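
A note on the question above: as far as I can tell from the C++11 standard, calling data() on an empty vector is allowed; the returned pointer just must not be dereferenced, and it may or may not be null. The size check I have in mind can be sketched in plain C++ as below. The helper name safe_data, the self-check function and the stripped-down AA struct (no TObject) are assumptions made for illustration; none of them is part of PyROOT.

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for the user class (no TObject, no ClassDef).
struct AA {
    int x;
    int y;
};

// Hypothetical helper: only call data() when the vector holds elements,
// otherwise report "no buffer" as a null pointer.
template <typename T>
const T* safe_data(const std::vector<T>& v) {
    return v.empty() ? nullptr : v.data();
}

// Small self-check of the guard.
inline void check_safe_data() {
    std::vector<AA> v;
    assert(safe_data(v) == nullptr);   // empty: data() is never called
    v.resize(1);
    assert(safe_data(v) == v.data());  // non-empty: same pointer as data()
}
```

The guard costs one extra size query per iteration start, which seems acceptable for a workaround that avoids touching data() on an empty vector at all.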

Cheers,
Aart.

python script reproducing the problem

import ROOT

open("aa.hh", 'w').write("""

#include "TObject.h"
#include <vector>
#include <iostream>
using std::cout;
using std::endl;

struct AA : public TObject
{
  int x;
  int y;
  ClassDef( AA, 1 );
};

inline std::vector<AA> getv() 
{
  std::vector<AA> r;
  // cout << r.data() << endl; // crashes!

  for( auto x : r ) { cout << &x << endl; } // fine (c++ iteration does not call data(), I guess)
  return r;
}

// BB is identical to AA, but there is no mention of vector<BB> anywhere
struct BB : public TObject
{
  int x;
  int y;
  ClassDef( BB, 1 );
};
""" )

open("aa.cc", 'w').write( """
#include "aa.hh"
""" )
    
ROOT.gROOT.ProcessLine(".L aa.cc+");

vaa = ROOT.vector( ROOT.AA )() # difference between AA and BB: there is no mention...
vbb = ROOT.vector( ROOT.BB )() # ...of vector<BB> on the c++ side.

for x in vbb : print x # fine
print vbb.data()       # also fine (null ptr)

for x in vaa : print x # crashes.
print vaa.data()       # also crashes (as does calling data on c++ side)

# The following prevents the crash
# (presumably some pointer is not initialized until the vector contains something)
vaa.resize(1)
vaa.resize(0)
print vaa.data()

# other observations
# - commenting out ClassDef does not matter
# - I use aclic in this example, but also observed with rootcling/g++ compiled libraries.

script output on my machine

[heijboer@cca009 tests]$ python -i bug.py
Info in <TUnixSystem::ACLiC>: creating shared library /sps/km3net/users/heijboer/aanet/tests/./aa_cc.so
<ROOT.BB object at 0x(nil)>

Thread 2 (Thread 0x7f350f7d5700 (LWP 61742)):
#0  0x00007f352914fafb in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x00007f352914fb8f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f352914fc2b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00007f352946f735 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#4  0x00007f352943b296 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#5  0x00007f350f7d8016 in time_sleep () from /usr/lib64/python2.7/lib-dynload/timemodule.so
#6  0x00007f3529442cf0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7  0x00007f352944503d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8  0x00007f35293cea6d in function_call () from /lib64/libpython2.7.so.1.0
#9  0x00007f35293a9a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#10 0x00007f352943d6fd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007f35294426bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x00007f35294426bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#13 0x00007f352944503d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#14 0x00007f35293ce978 in function_call () from /lib64/libpython2.7.so.1.0
#15 0x00007f35293a9a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x00007f35293b8a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#17 0x00007f35293a9a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#18 0x00007f352943b8f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#19 0x00007f3529473822 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#20 0x00007f3529149e25 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f352876abad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f3529911740 (LWP 61546)):
#0  0x00007f35287311c9 in waitpid () from /lib64/libc.so.6
#1  0x00007f35286aee52 in do_system () from /lib64/libc.so.6
#2  0x00007f35286af201 in system () from /lib64/libc.so.6
#3  0x00007f352631a0ef in TUnixSystem::StackTrace (this=0x1f24d80) at /pbs/throng/ccin2p3/support/gadrat/centos-7-x86_64/root/root_v6-10-02/core/unix/src/TUnixSystem.cxx:2412
#4  0x00007f3520d7f005 in cling::MultiplexInterpreterCallbacks::PrintStackTrace() () from /pbs/software/centos-7-x86_64/root/6.10.02/lib/libCling.so
#5  0x00007f3520d7e5e8 in cling_runtime_internal_throwIfInvalidPointer () from /pbs/software/centos-7-x86_64/root/6.10.02/lib/libCling.so
#6  0x00007f352992c07d in ?? ()
#7  0x00007ffdbb46cf50 in ?? ()
#8  0x00007f352992c128 in ?? ()
#9  0x00007ffdbb46d540 in ?? ()
#10 0x0000000000000000 in ?? ()
terminate called after throwing an instance of 'cling::InvalidDerefException'
  what():  Trying to dereference null pointer or trying to call routine taking non-null arguments
Aborted
[heijboer@cca009 tests]$ 

suggested workaround in bindings/pyroot/src/Pythonize.cxx

   static PyObject* vector_iter( PyObject* v ) {

      vectoriterobject* vi = PyObject_GC_New( vectoriterobject, &VectorIter_Type );
      if ( ! vi ) return NULL;

      Py_INCREF( v );
      vi->vi_vector = v;

      PyObject* pyvalue_type = PyObject_GetAttrString( (PyObject*)Py_TYPE(v), "value_type" );
      PyObject* pyvalue_size = PyObject_GetAttrString( (PyObject*)Py_TYPE(v), "value_size" );
      
      PyObject* pysize = CallPyObjMethod( v, "size" );        // added
      long size = PyLong_AsLong( pysize );                    // added

      if ( size > 0 && pyvalue_type && pyvalue_size ) {       // added size> 0 &&
	
         PyObject* pydata = CallPyObjMethod( v, "data" );

ROOT Version: 6.10.02
Platform: x86 / centos7
Compiler: gcc version 4.8.5


Thank you @aart for reporting, I’m having a look,

Cheers,
Enric

Do you need those structs to inherit from TObject?

The code you pasted works for me if you remove that inheritance (still not sure why).

Enric

Dear Enric,

Thanks for looking into this.
For me, the crash still happens even when not inheriting from
TObject (and yes: some of my real-life classes should inherit from
TObject).

Cheers,
aart

Hi @aart

Let’s try to narrow this down. This could be related to the ROOT version you are using and fixed in a newer version. If I run with current ROOT master the following code:

import ROOT

ROOT.gInterpreter.Declare("""
#include "TObject.h"
#include <vector>
#include <iostream>
using std::cout;
using std::endl;

struct AA  : public TObject
{
  int x;
  int y;
  ClassDef( AA, 1 );
};

inline std::vector<AA> getv()
{
  std::vector<AA> r;
  // cout << r.data() << endl; // crashes!

  for( auto x : r ) { cout << &x << endl; } // fine (c++ iteration does not call data(), I guess)
  return r;
}

// BB is identical to AA, but there is no mention of vector<BB> anywhere
struct BB : public TObject
{
  int x;
  int y;
  ClassDef( BB, 1 );
};
""")

vaa = ROOT.vector( ROOT.AA )()
vbb = ROOT.vector( ROOT.BB )()

for x in vbb : print x
for x in vaa : print x

Can you confirm that the code above fails in your setup? With ROOT master it works, even when inheriting from TObject.

Enric

Hi Enric,

For me too, it does not crash when using Declare, but it does when
using ACLiC or even just #including the file.
So I somehow doubt it is related to the version (6.10; I also tried
with 6.12 at some point).

Cheers,
aart

#s = the text of the c++ code
ROOT.gInterpreter.Declare( s )              # fine
#ROOT.gROOT.ProcessLine('#include "s.hh"'); # crash
#ROOT.gROOT.ProcessLine(".L s.cc+");        # crash 

Hi @aart

OK, here I am with two more versions. I tried to recreate your crash both when including the file and when using ACLiC, and I could not reproduce either of them.

  • With include, this is what I run:
import ROOT

open("aa.hh", 'w').write("""
#include "TObject.h"
#include <vector>
#include <iostream>
using std::cout;
using std::endl;

struct AA  : public TObject
{
  int x;
  int y;
  ClassDef( AA, 1 );
};

inline std::vector<AA> getv()
{
  std::vector<AA> r;
  // cout << r.data() << endl; // crashes!

  for( auto x : r ) { cout << &x << endl; } // fine (c++ iteration does not call data(), I guess)
  return r;
}

// BB is identical to AA, but there is no mention of vector<BB> anywhere
struct BB : public TObject
{
  int x;
  int y;
  ClassDef( BB, 1 );
};
""")

ROOT.gROOT.ProcessLine('#include "aa.hh"')

vaa = ROOT.vector( ROOT.AA )()
vbb = ROOT.vector( ROOT.BB )()

for x in vbb : print x
for x in vaa : print x
  • With ACLIC, this is what I run:
import ROOT

open("aa.hh", 'w').write("""
#include "TObject.h"
#include <vector>
#include <iostream>
using std::cout;
using std::endl;

struct AA  : public TObject
{
  int x;
  int y;
  ClassDef( AA, 1 );
};

inline std::vector<AA> getv()
{
  std::vector<AA> r;
  // cout << r.data() << endl; // crashes!

  for( auto x : r ) { cout << &x << endl; } // fine (c++ iteration does not call data(), I guess)
  return r;
}

// BB is identical to AA, but there is no mention of vector<BB> anywhere
struct BB : public TObject
{
  int x;
  int y;
  ClassDef( BB, 1 );
};
""")

open("aa.cc", 'w').write("""
#include "aa.hh"
""")

ROOT.gROOT.ProcessLine(".L aa.cc+");

vaa = ROOT.vector( ROOT.AA )()
vbb = ROOT.vector( ROOT.BB )()

for x in vbb : print x
for x in vaa : print x

Can you confirm that exactly these two scripts both crash in your case?

Cheers,
Enric

Hi Enric,

confirmed with my 6.10.2/gcc 4.8 installation (both crash). I also tried earlier
on 6.12.06 with the same result. I have another machine with 6.14.02/gcc 7.3
and there I have no crashes.

Have you reproduced the problem at all? If not, then of course I cannot exclude
it is somehow related to my setup. On the other hand: if this is a matter of dereferencing
an uninitialized pointer (which it looks like to me), the pointer may be null or not
depending on basically random circumstances.

cheers,
aart

Hi @aart

Indeed I cannot reproduce your cling::InvalidDerefException when iterating over those empty vectors, I have tried centos7, slc6 and macos (6.14 and master).

Actually I believe the code of the pythonization of the vector iteration in Pythonize.cxx should cover the case in which v.data() returns a nullptr:

         PyObject* pydata = CallPyObjMethod( v, "data" );
         if ( !pydata || Utility::GetBuffer( pydata, '*', 1, vi->vi_data, kFALSE ) == 0 )
            vi->vi_data = nullptr;

So I am not sure where the crash comes from. Would it be an option for you to use a newer version of ROOT, given that the crash no longer seems to happen there?

Cheers,
Enric

Hi Enric,

To be clear: the problem is not that vector::data() returns a null pointer; the
problem is that, inside the call, a null pointer is apparently dereferenced.
(In cases where vector::data() does not crash, it typically does return a null pointer,
which is then fine for Pythonize.cxx.)

I doubt this is related to the ROOT version (as I said, I reproduced it with 6.12), but maybe
it is related to my environment in some other way. My best guess is some strange
inconsistency between cling and gcc, but I really don’t know enough to make much
progress debugging that (I do not even really understand whether it is gcc
or cling/clang that is compiling my vector(BB)).

Anyway: I’ll try to get some colleagues to reproduce this. If that sheds some light, I’ll
post it. If not, then apparently it’s really something with my setup, and it does
not make much sense for you to try to debug it if you cannot reproduce it. I’ll let
you know. Thanks so far.

Cheers,
aart

Hi @aart

Ok, please let me know if you have any news about that.

If you use ACLIC, the compiler that was used to build ROOT will compile the definition of your structs and getv function and cling will dynamically instantiate the vectors.

One thing to investigate could be if this happens also in pure C++ code, without the intervention of PyROOT.
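
Such a pure C++ check might look like the sketch below, compiled with the regular compiler rather than run through cling. The function names and the self-check are assumptions made for illustration; the struct mirrors a plain user-defined type. The first function iterates with a range-based for loop, which only goes through begin()/end(), while the second one goes through data() the way a buffer-based iterator would.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain user-defined type, no ROOT dependency.
struct A {
    double x, y, z;
};

// Range-based for only uses begin()/end(); data() is never called here.
inline double sum_x_by_iteration(const std::vector<A>& v) {
    double s = 0.0;
    for (const auto& a : v) s += a.x;
    return s;
}

// Buffer-style access: data() is called even when the vector is empty,
// but the pointer is only dereferenced for indices below size().
inline double sum_x_by_data(const std::vector<A>& v) {
    const A* p = v.data();
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += p[i].x;
    return s;
}

// Both access paths should agree for empty and non-empty vectors.
inline void check_empty_vector() {
    std::vector<A> v;
    assert(sum_x_by_iteration(v) == 0.0);
    assert(sum_x_by_data(v) == 0.0);
    v.push_back(A{1.0, 2.0, 3.0});
    assert(sum_x_by_iteration(v) == sum_x_by_data(v));
}
```

If this passes when built with the system compiler but the same data() call fails at the cling prompt, that would point at the interpreter rather than at PyROOT.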

Cheers,
Enric

It’s not a PyROOT problem (except in the sense that in PyROOT, iteration
calls the data() method and C++ iteration doesn’t). In fact, I even get it
when just defining the struct in cling.

[heijboer@cca010 ~]$ root -l
root [0] struct A { double x,y,z;}
root [1] vector<A> x;
root [2] for ( auto y : x ) cout << &y; // fine in c++
root [3] x.data()                       // not fine
#0  0x00007f2e5f81217c in waitpid () from /lib64/libc.so.6
#1  0x00007f2e5f78fe52 in do_system () from /lib64/libc.so.6
#2  0x00007f2e608240ef in TUnixSystem::StackTrace (this=0xadd8e0) at /pbs/throng/ccin2p3/support/gadrat/centos-7-x86_64/root/root_v6-10-02/core/unix/src/TUnixSystem.cxx:2412
#3  0x00007f2e5c267005 in cling::MultiplexInterpreterCallbacks::PrintStackTrace() () from /pbs/software/centos-7-x86_64/root/6.10.02/lib/libCling.so
#4  0x00007f2e5c2665e8 in cling_runtime_internal_throwIfInvalidPointer () from /pbs/software/centos-7-x86_64/root/6.10.02/lib/libCling.so
#5  0x00007f2e60e7918d in ?? ()
#6  0x00007ffdd2c269b0 in ?? ()
#7  0x00007f2e60e770c8 in ?? ()
#8  0x00007ffdd2c26a68 in ?? ()
#9  0x0000000000000000 in ?? ()
terminate called after throwing an instance of 'cling::InvalidDerefException'
  what():  Trying to dereference null pointer or trying to call routine taking non-null arguments

#gcc 4.8.5
#also tried with root 6.13.02 on same system


Hi again Enric,

Can you tell me which gcc you have tried? Did you try 4.8.5?

(What I really mean is which version of the libstdc++ headers was used by cling
when jitting vector, but I assume those are always the same as the
ones gcc uses…)

thanks,
aart

Aart.

Try the latest ROOT release.

OK, after downloading the 6.14.06 binary there is indeed no issue. Sorry for being stubborn.

Can you offer any insight in what was the problem with earlier
versions / what changed to make this work?

Just curious, since I thought I had understood it: in stl_vector.h the
call to data() was basically doing {return & * (_M_start);}, i.e. dereferencing
a nullptr and immediately taking its address (which cling does not allow):

root [0] gROOT->GetVersion()
(const char *) "6.14/06"
root [1] int* a=0;
root [2] int* b= &*a; // crashes ( as it probably should, but gcc and clang accept this)
<stack trace>
Error in <HandleInterpreterException>: Trying to dereference null pointer or trying to call routine taking non-null arguments.

Hi @aart

Perhaps @Axel knows better what changed in 6.14/06 to solve your issue?

*nullptr is undefined behavior, and cling tells you “don’t do that”.

6.14/06 doesn’t check STL anymore.

Thanks @axel.

Everything makes sense then. Summary, in case it helps somebody out there:

The header file for std::vector (stl_vector.h) in gcc 4.8.5 defines a vector<T>::data() function
that dereferences a null pointer when the vector is empty. PyROOT calls this data() method when
iterating over the vector. In ROOT versions earlier than 6.14/06, cling also checked such dereferences
inside STL code and therefore threw an exception on systems where the headers come from gcc 4.8.
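
To make the mechanism concrete, here is a toy model of it in plain C++. Everything in it is an assumption for illustration: ToyVector is not the libstdc++ implementation, and throw_if_null is only a stand-in for the kind of null check cling applies before a dereference, not cling's real API. The point is that a data() written as "dereference, then take the address" trips such a check on an empty container, while one that hands back the stored pointer does not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for an interpreter-inserted null-pointer check.
template <typename T>
T* throw_if_null(T* p) {
    if (!p) throw std::runtime_error("Trying to dereference null pointer");
    return p;
}

// Toy container holding only begin/end pointers; empty means both are null.
template <typename T>
struct ToyVector {
    T* start = nullptr;
    T* finish = nullptr;

    std::size_t size() const { return static_cast<std::size_t>(finish - start); }

    // "Dereference, then take the address": the check fires before the
    // dereference whenever start is null.
    T* data_via_deref() const { return std::addressof(*throw_if_null(start)); }

    // Returns the stored pointer without ever dereferencing it.
    T* data_direct() const { return start; }
};

// Self-check: the empty case throws only on the dereferencing variant.
inline void check_toy_vector() {
    ToyVector<int> empty;
    assert(empty.size() == 0);
    assert(empty.data_direct() == nullptr);
    bool threw = false;
    try {
        empty.data_via_deref();
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);

    int buf[2] = {1, 2};
    ToyVector<int> filled;
    filled.start = buf;
    filled.finish = buf + 2;
    assert(filled.data_via_deref() == buf);
}
```

With a non-empty container both variants return the same pointer, which matches the observation that resize(1) followed by resize(0) made the problem go away.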

It’s not actually a crash, btw: cling just aborts execution (exception) and prints stacktrace and expression to blame. Everything is under control :wink:

Cheers, Axel
