ROOT core dump while filling a tree

I have a simulation which is filling a number of simple trees, each with two branches: one is the simulation time and the other is the attribute being logged.

After around 8.3 million samples, I get an exception when the call to fill the tree is made. I’m not sure if it is a memory issue or something else. I’m running on Red Hat Enterprise Linux 4 with 16 GB of memory.

Some additional information: I’m guessing this occurs due to the first call to AutoSave. The particular tree being used has 12 bytes per entry: 8 bytes for the time value, which is saved as a double (we’re using picosecond resolution), and 4 bytes for the integer value of the attribute. 8.3+ million samples is ~100 MB, which is the default setting of AutoSave.
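The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the raw payload size: it assumes the 100 MB threshold quoted in the post, ignores compression and per-basket overhead, and the names `kBytesPerEntry` and `entriesUntilThreshold` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Raw payload per entry as described in the post:
// an 8-byte double (time) plus a 4-byte integer (attribute value).
const std::uint64_t kBytesPerEntry = sizeof(double) + sizeof(std::uint32_t);

// Number of complete entries that fit below a byte threshold.
inline std::uint64_t entriesUntilThreshold(std::uint64_t thresholdBytes,
                                           std::uint64_t bytesPerEntry) {
    assert(bytesPerEntry > 0);  // guard against division by zero
    return thresholdBytes / bytesPerEntry;
}
```

With a threshold of 100,000,000 bytes and 12 bytes per entry this gives 8,333,333 entries, which matches the "8.3 million samples" at which the crash appears.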

How is AutoSave different from a call to Write? Is there something different in the dictionary generation for AutoSave as opposed to Write?

Any help is appreciated.

 *** Break *** segmentation violation
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Attaching to program: /proc/17985/exe, process 17985
[Thread debugging using libthread_db enabled]
[New Thread -156477760 (LWP 17985)]
0xffffe405 in ?? ()
#1  0x00000001 in ?? ()
#2  0x0031cff4 in ?? () from /lib/tls/libc.so.6
#3  0x0022b2e9 in do_system () from /lib/tls/libc.so.6
#4  0xf74824bb in TUnixSystem::Exec ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#5  0xf7488153 in TUnixSystem::StackTrace ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#6  0xf7484b4e in TUnixSystem::DispatchSignals ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#7  0xf7484bdc in SigHandler ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#8  0xf7483e59 in sighandler ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#9  0xffffe600 in ?? ()
#10 0x0000000b in ?? ()
#11 0xf7a731cf in G__search_tagname ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#12 0xf79ff553 in G__parse_parameter_link ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#13 0xf7a05ca7 in G__memfunc_setup_imp ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#14 0xf7a065ea in G__memfunc_setup ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#15 0xf708d37e in G__setup_memfuncTTree ()
   from /nfs/links/sandbox/tools/root/lib/libTree.so
#16 0xf7a0598b in G__incsetup_memfunc ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#17 0xf79ce8ad in G__get_methodhandle_noerror ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#18 0xf79ceda5 in G__get_methodhandle ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#19 0xf798c2f5 in Cint::G__ClassInfo::GetMethod ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#20 0xf79892e9 in Cint::G__CallFunc::SetFuncProto ()
   from /nfs/links/sandbox/tools/root/lib/libCint.so
#21 0xf7477243 in TCint::CallFunc_SetFuncProto ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#22 0xf744383e in TClass::CalculateStreamerOffset ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#23 0xf74439f0 in TClass::CallShowMembers ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#24 0xf744759c in TClass::BuildRealData ()
   from /nfs/links/sandbox/tools/root/lib/libCore.so
#25 0xf714562a in TBufferFile::WriteClassBuffer ()
   from /nfs/links/sandbox/tools/root/lib/libRIO.so
#26 0xf7020079 in TTree::Streamer ()
   from /nfs/links/sandbox/tools/root/lib/libTree.so
#27 0xf7173044 in TKey::TKey ()
   from /nfs/links/sandbox/tools/root/lib/libRIO.so
#28 0xf715a8ed in TFile::CreateKey ()
   from /nfs/links/sandbox/tools/root/lib/libRIO.so
#29 0xf714f71d in TDirectoryFile::WriteTObject ()
   from /nfs/links/sandbox/tools/root/lib/libRIO.so
#30 0xf7015314 in TTree::AutoSave ()
   from /nfs/links/sandbox/tools/root/lib/libTree.so
#31 0xf701a39c in TTree::Fill ()
   from /nfs/links/sandbox/tools/root/lib/libTree.so
#32 0x0823209e in rootAttribute<unsigned int>::written (this=0x8fb4484)
    at root_attribute.h:277
#33 0x08231dd3 in rootAttribute<unsigned int>::setValue (this=0x8fb4484, 
    newval=8) at root_attribute.h:209
#34 0x08234518 in rootAttribute<unsigned int>::operator++ (this=0x8fb4484)
    at root_attribute.h:189
#35 0x0822f6df in rootAttribute<unsigned int>::operator++ (this=0x8fb4484)
    at root_attribute.h:193
#36 0x082cc186 in CLancerSch::ulp_thread (this=0x8fb3d38, nID=5)
    at /home/tdoherty/projects/slm/src/lancer/lancer_sch.cpp:491
#37 0x082cbc2a in CLancerSch::ulp5_thread (this=0x8fb3d38)
    at /home/tdoherty/projects/slm/src/lancer/lancer_sch.cpp:468
#38 0x08366ee4 in sc_core::sc_thread_cor_fn ()
#39 0x0837b560 in qt_null ()

best regards,
Terry

Could you provide the shortest RUNNING setup reproducing this problem?

Rene

[quote]How is AutoSave different from a call to Write? Is there something different in the dictionary generation for AutoSave as opposed to Write?[/quote]
The difference is in the way the key is handled/replaced. There is no difference in the dictionary handling.

Cheers,
Philippe.

I will attempt to create a test case. It will take some time to create a test case because I will need to separate the ROOT portion of the simulation from another class library which depends on some commercial software licenses that are tied to specific license servers.

For now I am working around the issue by increasing the AutoSave value to 500MB.

Best regards,
Terry

I have a reduced test case which reproduces the autosave segmentation fault. Unfortunately, the reduced test case relies on a freely available 3rd party library (SystemC) so you will need to build that library to reproduce the problem. But, it is relatively small and easy to build. It is probably easiest for me to email it to you.

It is likely that the issue is an interaction between the ROOT library and the SystemC library, so if debugging this is outside your purview, I understand. I currently have a workaround which works for me.

My system information:

Linux 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

I’m using ROOT 5.24/00

Best regards,
Terry
root_autosave_segf.tar.gz (510 KB)

Hi,

Can you send me the SystemC library?

Thanks,
Philippe.

Hi Terry,

I finally got a chance to look at your problem (my apologies for the delay).
I built your software on a 32-bit machine (SLC4, gcc 3.4.3, with the trunk of ROOT) and could not reproduce your crash.

However running with valgrind, I found many problems like:

valgrind --log-file=pkt.vallog ./pkt_switch > pkt.log 2>&1 &
less pkt.vallog
.....
==5521== Invalid write of size 4
==5521==    at 0x80C6F21: sc_core::sc_event::trigger() (sc_event.cpp:264)
==5521==    by 0x80DD394: sc_core::sc_simcontext::crunch(bool) (sc_simcontext.cpp:596)
==5521==    by 0x80DBA4A: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:856)
==5521==    by 0x80DC76F: sc_core::sc_start(sc_core::sc_time const&) (sc_simcontext.cpp:1319)
==5521==    by 0x80DC7D4: sc_core::sc_start() (sc_simcontext.cpp:1325)
==5521==    by 0x80B5171: sc_main (main.cpp:144)
==5521==    by 0x80CA64F: sc_elab_and_sim (sc_main_main.cpp:99)
==5521==    by 0x80CA4B9: main (sc_main.cpp:52)
==5521==  Address 0x68CE91C is 12 bytes inside a block of size 64 alloc'd
==5521==    at 0x4004405: malloc (vg_replace_malloc.c:149)
==5521==    by 0x4302B4A: operator new(unsigned) (new_op.cc:48)
==5521==    by 0x80E6687: sc_core::sc_signal<bool>::posedge_event() const (sc_signal.h:327)
==5521==    by 0x80BFEF0: sc_core::sc_event_finder_t<sc_core::sc_signal_in_if<bool> >::find_event(sc_core::sc_interface*) const (sc_event_finder.h:170)
==5521==    by 0x80E9E66: sc_core::sc_port_base::complete_binding() (sc_port.cpp:523)
==5521==    by 0x80EA5BB: sc_core::sc_port_registry::complete_binding() (sc_port.cpp:700)
==5521==    by 0x80EA5D6: sc_core::sc_port_registry::elaboration_done() (sc_port.cpp:710)
==5521==    by 0x80DB4A8: sc_core::sc_simcontext::elaborate() (sc_simcontext.cpp:658)
==5521==    by 0x80DB821: sc_core::sc_simcontext::initialize(bool) (sc_simcontext.cpp:801)
==5521==    by 0x80DB864: sc_core::sc_simcontext::simulate(sc_core::sc_time const&) (sc_simcontext.cpp:810)
==5521==    by 0x80DC76F: sc_core::sc_start(sc_core::sc_time const&) (sc_simcontext.cpp:1319)
==5521==    by 0x80B5CF7: sc_core::sc_start(double, sc_core::sc_time_unit) (sc_simcontext.h:608)
......

Cheers,
Philippe.

Hi,

Also note that some of the meta-data setup is not yet completely thread safe, and we recommend that you ensure it is completed before starting any parallel operation. You can enforce this by storing one of your TTree objects in a TFile (possibly a dummy one) before starting the threads.

Cheers,
Philippe.

Sorry for the even later reply. I haven’t been checking.

The SystemC library does not work well with memory leak checkers. Unfortunately, the implementers of the library decided that many things which are created statically at the beginning of a simulation and are prohibited from being created once simulation begins did not need to be properly cleaned up with calls to delete, etc. and instead they chose to rely on the OS for cleaning up the memory. They have taken a lot of flak for that decision, but they have not changed it. This of course makes it difficult to check for “real” memory leaks.

I’ve been avoiding creating and writing the trees to the file at the beginning of simulation in order to avoid filling the ROOT file with thousands of empty trees since I only record data if I enable tracing for an attribute of interest. I could remove empty trees at the end of simulation.

Thank you very much for your efforts.

Best regards,
Terry