Crash during I/O - can't figure it out

I’ve got a bug and I can’t figure out what it is. I have an object graph that I write out to a file and then read back. On the read-back I get a crash in the destructor of one of my objects, and I don’t understand why a destructor is being called during the read-back at all… :)

I can open the file in ROOT (after building the required objects), and when I try to read it back in ROOT I get a crash there too. Attached is the gDebug=5 output of what happened.

Here is the crash:

[code] FlowBase_cpp.dll!ROOT::delete_FlowBase(void * p) Line 140 + 0x19 bytes C++

libCore.dll!TClass::Destructor(void * obj, bool dtorOnly) Line 3948 + 0x11 bytes C++
libRIO.dll!TBufferFile::ReadFastArray(void * * start, const TClass * cl, int n, bool isPreAlloc, TMemberStreamer * streamer, const TClass * onFileClass) Line 1524 C++
libRIO.dll!TStreamerInfo::ReadBuffer(TBuffer & b, const TVirtualCollectionProxy & arr, int first, int narr, int eoffset, int arrayMode) Line 1001 + 0x47 bytes C++
libRIO.dll!TStreamerInfo::ReadBufferSTL(TBuffer & b, TVirtualCollectionProxy * cont, int nc, int first, int eoffset) Line 1813 + 0x1e bytes C++
libRIO.dll!TStreamerInfo::ReadBuffer<char * *>(TBuffer & b, char * * const & arr, int first, int narr, int eoffset, int arrayMode) Line 1205 C++
libRIO.dll!TBufferFile::ReadClassBuffer(const TClass * cl, void * pointer, const TClass * onFileClass) Line 3464 C++
FlowSequential_cpp.dll!FlowSequential::Streamer(TBuffer & R__b) Line 176 + 0x1e bytes C++
libRIO.dll!TKey::ReadObj() Line 717 + 0x19 bytes C++
ROOTFileIO_cpp.dll!ReadWriteInputList(TList * list) Line 291 + 0x12 bytes C++
ROOTFileIO_cpp.dll!ROOTTFile::RunAJob(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & inputDS, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & outputDS, FlowBase * flow) Line 236 + 0x9 bytes C++
ROOTFileIO_cpp.dll!ROOTFileIOBase::RunAllJobs(FlowBase * flow) Line 78 + 0x48 bytes C++
runDriver_cpp.dll!runDriver(bool compileOnly, char * jobFileXMLName, char * runOnly) Line 127 + 0xc bytes C++
runDriver_cpp.dll!G__runDriver_cpp_ACLiC_dict__0_927(G__value * result7, const char * funcname, G__param * libp, int hash) Line 68 + 0x5e bytes C++
libCint.dll!Cint::G__ExceptionWrapper(int (G__value *, const char *, G__param *, int)* funcp, G__value * result7, char * funcname, G__param * libp, int hash) Line 393 + 0x15 bytes C++
libCint.dll!G__execute_call(G__value * result7, G__param * libp, G__ifunc_table_internal * ifunc, int ifn) Line 2390 + 0x19 bytes C++
libCint.dll!G__call_cppfunc(G__value * result7, G__param * libp, G__ifunc_table_internal * ifunc, int ifn) Line 2594 + 0x15 bytes C++
libCint.dll!G__interpret_func(G__value * result7, const char * funcname, G__param * libp, int hash, G__ifunc_table_internal * p_ifunc, int funcmatch, int memfunc_flag) Line 5254 + 0x15 bytes C++
libCint.dll!G__getfunction(const char * item, int * known3, int memfunc_flag) Line 2631 + 0x3a bytes C++
libCint.dll!G__getitem(const char * item) Line 1914 + 0x16 bytes C++
libCint.dll!G__getexpr(const char * expression) Line 1484 + 0x36 bytes C++
libCint.dll!G__exec_function() Line 644 + 0x12 bytes C++
libCint.dll!G__exec_statement() Line 7122 + 0x1c bytes C++
libCint.dll!G__interpret_func(G__value * result7, const char * funcname, G__param * libp, int hash, G__ifunc_table_internal * p_ifunc, int funcmatch, int memfunc_flag) Line 6127 + 0x13 bytes C++
libCint.dll!G__getfunction(const char * item, int * known3, int memfunc_flag) Line 2631 + 0x3a bytes C++
libCint.dll!G__getitem(const char * item) Line 1914 + 0x16 bytes C++
libCint.dll!G__getexpr(const char * expression) Line 1484 + 0x36 bytes C++
libCint.dll!G__calc_internal(const char * exprwithspace) Line 1068 + 0x10 bytes C++
libCint.dll!G__process_cmd() Line 2311 + 0x15 bytes C++
libCore.dll!TCint::ProcessLine(const char * line, TInterpreter::EErrorCode * error) Line 510 + 0x1f bytes C++
libCore.dll!TCint::ProcessLineSynch(const char * line, TInterpreter::EErrorCode * error) Line 589 + 0x1a bytes C++
libCore.dll!TApplication::ExecuteFile(const char * file, int * error, bool keep) Line 1007 + 0x34 bytes C++
libCore.dll!TApplication::ProcessFile(const char * file, int * error, bool keep) Line 883 + 0x12 bytes C++
libCore.dll!TApplication::ProcessLine(const char * line, bool sync, int * err) Line 856 + 0x2e bytes C++
libRint.dll!TRint::Run(bool retrn) Line 401 + 0x1f bytes C++
root.exe!main(int argc, char * * argv) Line 29 + 0x14 bytes C++
root.exe!__tmainCRTStartup() Line 555 + 0x19 bytes C
root.exe!mainCRTStartup() Line 371 C
kernel32.dll!76093677()
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
ntdll.dll!77319d42()
ntdll.dll!77319d15()
[/code]

I am pretty sure that the problem is in the definition of this containing object:

[code]#ifndef FlowSequential_h
#define FlowSequential_h
///
/// We run a sequential flow, having several guys under us
///

#include "FlowBase.h"

#include <string>
#include <vector>

class FlowSequential :
	public FlowBase
{
public:
	FlowSequential(void);
	~FlowSequential(void);

	void InitialConfigure(TXMLNode *configNode);

	/// We send this to everyone we know!
	void Process(BasicPlotMaker *vars);

	/// We have to create sub-folders, etc., for bookings
	virtual void DoBookResults(TList *outputList, const std::string &storageDirPath);

	struct FlowInfo
	{
		FlowBase *_flow;
		std::string _name;
	};

private:
	std::vector<FlowInfo> _flows;

	ClassDef(FlowSequential, 1);

};

#endif
[/code]

I think it has to do with _flows, but when the object is created that vector should be empty.

Anything obvious I’ve done wrong here? Also attached are the objects that are written out (though a few include files are missing).
InputListFlat.root (4.58 KB)
iocrash.txt (19.8 KB)

Thanks to Rene’s post, I have valgrind working now. Unfortunately, it doesn’t seem to have caught this problem (it did find another one). When I run with it (with the standard ROOT suppressions turned on), the first error it reports comes just before the crash:

==5493== Conditional jump or move depends on uninitialised value(s)
==5493==    at 0x97F0795: TBufferFile::ReadFastArray(void**, TClass const*, int, bool, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1504)
==5493==    by 0x9905652: int TStreamerInfo::ReadBuffer<TVirtualCollectionProxy>(TBuffer&, TVirtualCollectionProxy const&, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1001)
==5493==    by 0x984EE07: TStreamerInfo::ReadBufferSTL(TBuffer&, TVirtualCollectionProxy*, int, int, int) (TStreamerInfoReadBuffer.cxx:1813)
==5493==    by 0x9917E83: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1203)
==5493==    by 0x97ECFE5: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3463)
==5493==    by 0xEDF3BF0: FlowSequential::Streamer(TBuffer&) (FlowSequential_cpp_ACLiC_dict.cxx:176)
==5493==    by 0x982B9C6: TKey::ReadObj() (TKey.cxx:717)
==5493==    by 0xFC8B6B4: ReadWriteInputList(TList*) (ROOTFileIO.cpp:287)
==5493==    by 0xFC8BAC7: ROOTTFile::RunAJob(std::string const&, std::string const&, FlowBase*) (ROOTFileIO.cpp:236)
==5493==    by 0xFC86B61: ROOTFileIOBase::RunAllJobs(FlowBase*) (ROOTFileIO.cpp:78)
==5493==    by 0xFE990C7: runDriver(bool, char*, char*) (runDriver.cpp:127)
==5493==    by 0xFE9926D: G__runDriver_cpp_ACLiC_dict__0_2056(G__value*, char const*, G__param*, int) (runDriver_cpp_ACLiC_dict.cxx:72)
==5493==    by 0x578FAAE: Cint::G__ExceptionWrapper(int (*)(G__value*, char const*, G__param*, int), G__value*, char*, G__param*, int) (Api.cxx:393)
==5493==    by 0x588BEBE: G__execute_call (newlink.cxx:2390)
==5493==    by 0x588CC7F: G__call_cppfunc (newlink.cxx:2594)
==5493==    by 0x584323E: G__interpret_func (ifunc.cxx:5254)
==5493==    by 0x5830863: G__getfunction (func.cxx:2631)
==5493==    by 0x57F54A6: G__getitem (expr.cxx:1914)
==5493==    by 0x580BBC4: G__getexpr (expr.cxx:1484)
==5493==    by 0x58A13C7: G__exec_function(G__FastAllocString&, int*, int*, int*, G__value*) (parse.cxx:644)

And that is exactly where the crash is about to happen. :) So something else is going wrong here. Since I don’t get any errors on the write, this looks like a more subtle bug. I’ll try to make a complete tarball so it can easily be re-run, and attach it.

OK, here is a tarball to reproduce the problem. After untarring, cd into the untarred directory and type:

The script (see ROOTFileIO.cpp, last method) writes an object graph out to a file and then reads it back. During the read-back you get a crash. The output around the crash looks something like this:

[code]Info in TUnixSystem::ACLiC: creating shared library /home/gwatts/testarea/15.6.12.1/testNtup/tagtest/PlotFramework/runDriver_cpp.so
Flow base was created!!
FlowSequential was created
We see this many flows; 3
Flow base was created!!
Creating a GlobalCacheReset
Flow base was created!!
Creating a Make jets collectin
Done with the make jets collection creation
Flow base was created!!
Creating a jet kinematic plots
Done with setup
Flow base was created!!
FlowSequential was created
We see this many flows; 3
Flow base was created!!
Creating a GlobalCacheReset
Flow base was created!!
Creating a Make jets collectin
Done with the make jets collection creation
Flow base was created!!
Creating a jet kinematic plots
Done with setup
Now doing the readback!
Flow base was created!!
FlowSequential was created
Flow base was created!!
Creating a GlobalCacheReset

*** Break *** segmentation violation

===========================================================
There was a crash (kSigSegmentationViolation).
This is the entire stack trace of all threads:

#0 0x00000037ba699d75 in waitpid () from /lib64/libc.so.6
#1 0x00000037ba63c331 in do_system () from /lib64/libc.so.6
#2 0x00002b725ed6b789 in TUnixSystem::Exec (this=0xb7ad010,
shellcmd=0xc2aa1e8 "/data2/gwatts/root/etc/gdb-backtrace.sh 8304 1>&2")
at core/unix/src/TUnixSystem.cxx:1982
#3 0x00002b725ed6aa12 in TUnixSystem::StackTrace (this=0xb7ad010)
at core/unix/src/TUnixSystem.cxx:2192
[/code]

There is also a valgrindTest.sh file in there… Now that I’ve built the debug versions on both Linux and Windows, I can see that the crash is identical on both platforms. I’m positive the bug is somehow in the way FlowSequential writes out its vector member, but I don’t see what is wrong with it: if I leave the vector empty, everything works, and it only fails once I add something to it.

Sorry I’m stuck on this, but I’ve reached the limit of my ROOT I/O know-how. :( Thanks in advance.

Cheers, Gordon.

And I should have mentioned: I’m using the debug build of ROOT from the head of SVN as of late last night. I’ve seen this crash in 5.26 as well, but it has been a while since I tested that.

Hi,

am I not seeing it or did you forget to upload the tarball?

Cheers, Axel.

Weird. I was sure I attached it. OK, trying again. Oh. The maximum attachment size is 2 MB… I missed that error before. OK, I need to see if I can remove the 4 MB input file and put it somewhere else, or make the code crash without it…

Ok, here it is. Same instructions as before…

Sorry about that!!
tagtest.tar.gz (132 KB)

Hi Gordon,

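FlowInfo needs a constructor that initializes _flow to 0. When reading, ROOT I/O deletes whatever a pointer currently points to if it is != NULL, because it is about to replace it with the object from the file and would otherwise leak memory. Your _flow is never initialized, so it holds garbage, and ROOT ends up deleting a random address.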

Cheers, Axel.
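A minimal sketch of the fix and of the failure mode, in plain C++ without ROOT. `FlowBase` here is a stand-in for the thread's class, and `ReplacePointer` is a hypothetical helper that only mimics the delete-before-replace step described in the reply above; it is not ROOT's actual reading code. Its `if (slot != 0)` test is the kind of branch valgrind reports as "Conditional jump or move depends on uninitialised value(s)" when the pointer was never set.

```cpp
#include <cassert>  // for the usage checks
#include <string>

// Stand-in for the thread's FlowBase (assumption: a polymorphic base class).
class FlowBase
{
public:
    virtual ~FlowBase() {}
};

// The fix: a default constructor that sets _flow to 0. Without it,
// `new FlowInfo` (default-initialization) leaves _flow holding garbage.
struct FlowInfo
{
    FlowInfo() : _flow(0) {}
    FlowBase *_flow;
    std::string _name;
};

// Hypothetical helper, NOT ROOT's code: whatever the pointer currently
// refers to is deleted first so it does not leak, then the object that
// was "read from the file" takes its place.
void ReplacePointer(FlowBase *&slot, FlowBase *fromFile)
{
    if (slot != 0)      // with an uninitialized slot, this branches on garbage
        delete slot;    // ...and this deletes a random address
    slot = fromFile;
}
```

With the constructor in place, `new FlowInfo` yields `_flow == 0`, so the first `ReplacePointer` call deletes nothing and later calls free the previous object. Without it, only value-initialization such as `new FlowInfo()` would zero the pointer, which is easy to get wrong.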

Argh! You are right! I knew this was a dumb error on my part, thanks.

How did you track this down? I guess this is what valgrind was pointing out to me when it said something about a decision being made on an uninitialized variable.

Hi,

Philippe saw it in the backtrace. It’s also a known user problem: you’re not the only one forgetting to initialize pointers before handing them out to other code :)

Cheers, Axel.

So… CINT and rootcint mean that ROOT knows about all the pointers in an object. And CINT can do a placement new (if I remember correctly). One could thus fill the memory with a pattern of some sort (0xecececec or something like that) and then do a placement new into it. After construction, any pointer that is not marked transient and still holds the pattern was never initialized, and you could print out a huge error message.

Clearly this couldn’t run normally, as the memory fill would slow things down, but after a crash the user could turn it on temporarily with "gDetectDumbMistakes=1" or something like that. I would bet there are other detectable errors (or warnings) that are too expensive to check during normal running, but perhaps not afterwards… Or a tool that looks at all objects in a shared library when it gets loaded… etc.
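The fill-pattern idea can be sketched in plain C++ for a single, hand-picked member; a real tool would need ROOT's dictionary to enumerate the pointer members automatically. `PointerLeftUninitialized`, `kFillByte` and `FlowInfoFixed` are hypothetical names. The bytes are accessed through a `volatile` pointer so the compiler does not drop the fill, but inspecting the bytes of a never-assigned member relies on implementation behavior rather than anything the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

class FlowBase;  // only pointed to, never dereferenced here

// The original layout from the thread: nothing ever sets _flow.
struct FlowInfo
{
    FlowBase *_flow;
    std::string _name;
};

// The same struct with the fix from this thread.
struct FlowInfoFixed
{
    FlowInfoFixed() : _flow(0) {}
    FlowBase *_flow;
    std::string _name;
};

const unsigned char kFillByte = 0xEC;

// Fill raw storage with a pattern, default-construct a T in it with
// placement new, and report whether the given pointer member still holds
// the pattern, i.e. whether the constructor never assigned it.
template <typename T, typename P>
bool PointerLeftUninitialized(P *T::*member)
{
    alignas(T) unsigned char storage[sizeof(T)];
    volatile unsigned char *raw = storage;  // volatile: keep fill and check
    for (std::size_t i = 0; i < sizeof(T); ++i)
        raw[i] = kFillByte;

    T *obj = ::new (static_cast<void *>(storage)) T;  // default-initialization
    const std::size_t offset = static_cast<std::size_t>(
        reinterpret_cast<unsigned char *>(&(obj->*member)) - storage);

    bool untouched = true;
    for (std::size_t i = 0; i < sizeof(P *); ++i)
        if (raw[offset + i] != kFillByte)
            untouched = false;

    obj->~T();
    return untouched;
}
```

Used as `assert(PointerLeftUninitialized(&FlowInfo::_flow))`, it flags the original struct, while the same call on `&FlowInfoFixed::_flow` returns false because the constructor overwrote the pattern with 0.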

Gordon,

No need to reinvent the wheel. Use the well known valgrind for this type of problem.

Rene