Malloc() in signal handler

We experience a deadlock with the following backtrace:

#0  0x00007ffff2567eec in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffff24e505d in _L_lock_14730 () from /lib64/libc.so.6
#2  0x00007ffff24e2163 in malloc () from /lib64/libc.so.6
#3  0x00007ffff2fb81bd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4  0x00007ffff653c7e9 in TStorage::ObjectAlloc(unsigned long) () from /usr/lib64/root/libCore.so.6.10
#5  0x00007ffff3a5ffd0 in TPosixThreadFactory::CreateThreadImp() () from /usr/lib64/root/libThread.so.6.10
#6  0x00007ffff3a5a63c in TThread::Init() () from /usr/lib64/root/libThread.so.6.10
#7  0x00007ffff3a5ab55 in TThread::SelfId() () from /usr/lib64/root/libThread.so.6.10
#8  0x00007ffff3805275 in THttpServer::ProcessRequests() () from /usr/lib64/root/libRHTTP.so.6.10
#9  0x00007ffff65612b2 in TTimer::Notify() () from /usr/lib64/root/libCore.so.6.10
#10 0x00007ffff6561201 in TTimer::CheckTimer(TTime const&) () from /usr/lib64/root/libCore.so.6.10
#11 0x00007ffff660638b in TUnixSystem::DispatchTimers(bool) () from /usr/lib64/root/libCore.so.6.10
#12 0x00007ffff6606517 in TUnixSystem::DispatchSignals(ESignals) () from /usr/lib64/root/libCore.so.6.10
#13 <signal handler called>
#14 0x00007ffff24df273 in _int_malloc () from /lib64/libc.so.6
#15 0x00007ffff24e210c in malloc () from /lib64/libc.so.6
#16 0x00007ffff2fb81bd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#17 0x00007ffff76c7d75 in Ph2_HwInterface::Data::privateSet (this=0x2ac7a20, pBoard=0x2950570, pData=std::vector of length 28000, capacity 28000 = {...}, pNevents=<optimized out>, pType=D19C)
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/Utils/Data.cc:130
#18 0x00007ffff76c884e in operator() (this=0x2aca428) at /usr/include/c++/4.8.2/functional:2471
#19 operator() (this=0x2aca420) at /usr/include/c++/4.8.2/future:1235
#20 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/4.8.2/functional:2057
#21 0x00007ffff76c86ee in operator() (this=<optimized out>) at /usr/include/c++/4.8.2/functional:2471
#22 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) (this=0x29d5da8, __f=..., 
    __set=@0x7fffffffc000: false) at /usr/include/c++/4.8.2/future:471
#23 0x00007ffff2831e20 in pthread_once () from /lib64/libpthread.so.0
#24 0x00007ffff76c907a in __gthread_once (__func=<optimized out>, __once=0x29d5e14) at /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/gthr-default.h:699
#25 call_once<void (std::__future_base::_State_base::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>&, bool&), std::__future_base::_State_base* const, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> >, std::reference_wrapper<bool> > (__f=<optimized out>, __once=...)
    at /usr/include/c++/4.8.2/mutex:786
#26 _M_set_result (__ignore_failure=true, __res=..., this=0x29d5da8) at /usr/include/c++/4.8.2/future:358
#27 std::__future_base::_Deferred_state<std::_Bind_simple<std::_Mem_fn<void (Ph2_HwInterface::Data::*)(Ph2_HwDescription::BeBoard const*, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, BoardType)> (Ph2_HwInterface::Data*, Ph2_HwDescription::BeBoard const*, std::vector<unsigned int, std::allocator<unsigned int> >, unsigned int, BoardType)>, void>::_M_run_deferred() (this=0x29d5da8)
    at /usr/include/c++/4.8.2/future:1465
#28 0x00007ffff7239c88 in wait (this=0x29d5da8) at /usr/include/c++/4.8.2/future:325
#29 _M_get_result (this=0x2ac7aa8) at /usr/include/c++/4.8.2/future:596
#30 get (this=0x2ac7aa8) at /usr/include/c++/4.8.2/future:761
#31 GetEvents (pBoard=0x2950570, this=0x2ac7a20) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/../System/../HWInterface/../Utils/Data.h:150
#32 GetEvents (this=0x7fffffffc200, pBoard=0x2950570) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/../System/SystemController.h:223
#33 SignalScanFit::ScanSignal (this=this@entry=0x7fffffffce70, pSignalScanLength=pSignalScanLength@entry=30) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/SignalScanFit.cc:126
#34 0x0000000000411727 in main (argc=4, argv=<optimized out>) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/src/commission.cc:144

Utils/Data.cc, line 130 (frame #17 above):

                    if (pType == BoardType::D19C)
                        fEventList.push_back ( new D19cCbc3Event ( pBoard, fNCbc, lvec ) );

#0  0x00007ffff2567eec in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffff24e505d in _L_lock_14730 () from /lib64/libc.so.6
#2  0x00007ffff24e2163 in malloc () from /lib64/libc.so.6
#3  0x00007ffff2fb81bd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4  0x00007ffff6588334 in TList::MakeIterator(bool) const () from /usr/lib64/root/libCore.so.6.10
#5  0x00007ffff38051e3 in THttpServer::ProcessRequests() () from /usr/lib64/root/libRHTTP.so.6.10
#6  0x00007ffff65612b2 in TTimer::Notify() () from /usr/lib64/root/libCore.so.6.10
#7  0x00007ffff6561201 in TTimer::CheckTimer(TTime const&) () from /usr/lib64/root/libCore.so.6.10
#8  0x00007ffff660638b in TUnixSystem::DispatchTimers(bool) () from /usr/lib64/root/libCore.so.6.10
#9  0x00007ffff6606517 in TUnixSystem::DispatchSignals(ESignals) () from /usr/lib64/root/libCore.so.6.10
#10 <signal handler called>
#11 0x00007ffff24dd61a in malloc_consolidate () from /lib64/libc.so.6
#12 0x00007ffff24de4fe in _int_free () from /lib64/libc.so.6
#13 0x00007ffff3509228 in deallocate (this=<optimized out>, __p=<optimized out>) at /usr/include/c++/4.8.2/ext/new_allocator.h:110
#14 _M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_vector.h:174
#15 ~_Vector_base (this=0x2a98788, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_vector.h:160
#16 ~vector (this=0x2a98788, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_vector.h:416
#17 ~_ValVector_ (this=0x2a98730, __in_chrg=<optimized out>) at include/uhal/ValMem.hpp:127
#18 checked_delete<uhal::_ValVector_<unsigned int> > (x=0x2a98730) at /home/xtaldaq/signalScanDev/uhal_2_4_2/cactuscore/extern/boost/RPMBUILD/SOURCES/include/boost/checked_delete.hpp:34
#19 boost::detail::sp_counted_impl_p<uhal::_ValVector_<unsigned int> >::dispose (this=<optimized out>)
    at /home/xtaldaq/signalScanDev/uhal_2_4_2/cactuscore/extern/boost/RPMBUILD/SOURCES/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
#20 0x00007ffff358fee6 in release (this=0x24f9920) at /home/xtaldaq/signalScanDev/uhal_2_4_2/cactuscore/extern/boost/RPMBUILD/SOURCES/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#21 ~shared_count (this=0x2a1d638, __in_chrg=<optimized out>) at /home/xtaldaq/signalScanDev/uhal_2_4_2/cactuscore/extern/boost/RPMBUILD/SOURCES/include/boost/smart_ptr/detail/shared_count.hpp:371
#22 ~shared_ptr (this=0x2a1d630, __in_chrg=<optimized out>) at /home/xtaldaq/signalScanDev/uhal_2_4_2/cactuscore/extern/boost/RPMBUILD/SOURCES/include/boost/smart_ptr/shared_ptr.hpp:328
#23 ~ValVector (this=0x2a1d630, __in_chrg=<optimized out>) at include/uhal/ValMem.hpp:289
#24 _Destroy<uhal::ValVector<unsigned int> > (__pointer=0x2a1d630) at /usr/include/c++/4.8.2/bits/stl_construct.h:93
#25 __destroy<uhal::ValVector<unsigned int>*> (__last=<optimized out>, __first=0x2a1d630) at /usr/include/c++/4.8.2/bits/stl_construct.h:103
#26 _Destroy<uhal::ValVector<unsigned int>*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_construct.h:126
#27 _Destroy<uhal::ValVector<unsigned int>*, uhal::ValVector<unsigned int> > (__last=0x2a1d640, __first=0x2a1d630) at /usr/include/c++/4.8.2/bits/stl_construct.h:151
#28 std::deque<uhal::ValVector<unsigned int>, std::allocator<uhal::ValVector<unsigned int> > >::_M_destroy_data_aux (this=this@entry=0x2a1c8e8, __first=..., __last=...)
    at /usr/include/c++/4.8.2/bits/deque.tcc:817
#29 0x00007ffff358ee3c in _M_destroy_data (__last=..., __first=..., this=0x2a1c8e8) at /usr/include/c++/4.8.2/bits/stl_deque.h:1853
#30 _M_erase_at_end (__pos=..., this=0x2a1c8e8) at /usr/include/c++/4.8.2/bits/stl_deque.h:1870
#31 clear (this=0x2a1c8e8) at /usr/include/c++/4.8.2/bits/stl_deque.h:1617
#32 uhal::Buffers::clear (this=0x2a1c7d0) at src/common/Buffers.cpp:154
#33 0x00007ffff3515b4f in uhal::ClientInterface::updateCurrentBuffers (this=this@entry=0x29a8a00) at src/common/ClientInterface.cpp:442
#34 0x00007ffff3515cd1 in uhal::ClientInterface::checkBufferSpace (this=0x29a8a00, aRequestedSendSize=@0x7fffffffbd10: 8, aRequestedReplySize=@0x7fffffffbd20: 8, aAvailableSendSize=@0x7fffffffbd30: 4294967294, 
    aAvailableReplySize=@0x7fffffffbd40: 43682992) at src/common/ClientInterface.cpp:338
#35 0x00007ffff350d5c8 in uhal::IPbusCore::implementRead (this=0x29a8a00, aAddr=@0x2973f9c: 1073827840, aMask=@0x2973fa0: 65535) at src/common/ProtocolIPbusCore.cpp:392
#36 0x00007ffff3515063 in uhal::ClientInterface::read (this=0x29a8a00, aAddr=@0x2973f9c: 1073827840, aMask=@0x2973fa0: 65535) at src/common/ClientInterface.cpp:577
#37 0x00007ffff3587298 in uhal::Node::read (this=0x2973f80) at src/common/Node.cpp:594
#38 0x00007ffff798ef13 in Ph2_HwInterface::RegManager::ReadReg (this=this@entry=0x295cbd0, pRegNode="fc7_daq_cnfg.readout_block.packet_nbr") at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/HWInterface/RegManager.cc:223
#39 0x00007ffff795ae6b in Ph2_HwInterface::D19cFWInterface::ReadData (this=0x295cbd0, pBoard=0x2950300, pBreakTrigger=false, pData=std::vector of length 0, capacity 0, pWait=<optimized out>)
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/HWInterface/D19cFWInterface.cc:517
#40 0x00007ffff747fc87 in Ph2_System::SystemController::ReadData (this=this@entry=0x7fffffffc200, pBoard=pBoard@entry=0x2950300, pData=std::vector of length 0, capacity 0, pWait=pWait@entry=true)
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/System/SystemController.cc:311
#41 0x00007ffff747fd0d in Ph2_System::SystemController::ReadData (this=this@entry=0x7fffffffc200, pBoard=pBoard@entry=0x2950300, pWait=pWait@entry=true)
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/System/SystemController.cc:299
#42 0x00007ffff7239c50 in SignalScanFit::ScanSignal (this=this@entry=0x7fffffffce70, pSignalScanLength=pSignalScanLength@entry=30) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/SignalScanFit.cc:117
#43 0x0000000000411727 in main (argc=4, argv=<optimized out>) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/src/commission.cc:144


#0  0x00007ffff2567eec in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffff24e505d in _L_lock_14730 () from /lib64/libc.so.6
#2  0x00007ffff24e2163 in malloc () from /lib64/libc.so.6
#3  0x00007ffff2fb81bd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4  0x00007ffff6588334 in TList::MakeIterator(bool) const () from /usr/lib64/root/libCore.so.6.10
#5  0x00007ffff38051e3 in THttpServer::ProcessRequests() () from /usr/lib64/root/libRHTTP.so.6.10
#6  0x00007ffff65612b2 in TTimer::Notify() () from /usr/lib64/root/libCore.so.6.10
#7  0x00007ffff6561201 in TTimer::CheckTimer(TTime const&) () from /usr/lib64/root/libCore.so.6.10
#8  0x00007ffff660638b in TUnixSystem::DispatchTimers(bool) () from /usr/lib64/root/libCore.so.6.10
#9  0x00007ffff6606517 in TUnixSystem::DispatchSignals(ESignals) () from /usr/lib64/root/libCore.so.6.10
#10 <signal handler called>
#11 0x00007ffff24df2a4 in _int_malloc () from /lib64/libc.so.6
#12 0x00007ffff24e210c in malloc () from /lib64/libc.so.6
#13 0x00007ffff2fb81bd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#14 0x00007ffff76c1efb in allocate (this=0x7fffffffbd50, __n=11) at /usr/include/c++/4.8.2/ext/new_allocator.h:104
#15 _M_allocate (this=0x7fffffffbd50, __n=11) at /usr/include/c++/4.8.2/bits/stl_vector.h:168
#16 _M_range_initialize<__gnu_cxx::__normal_iterator<unsigned int const*, std::vector<unsigned int> > > (__last=..., __first=..., this=0x7fffffffbd50) at /usr/include/c++/4.8.2/bits/stl_vector.h:1201
#17 _M_initialize_dispatch<__gnu_cxx::__normal_iterator<unsigned int const*, std::vector<unsigned int> > > (__last=..., __first=..., this=0x7fffffffbd50) at /usr/include/c++/4.8.2/bits/stl_vector.h:1177
#18 vector<__gnu_cxx::__normal_iterator<unsigned int const*, std::vector<unsigned int> >, void> (__a=..., __last=..., __first=..., this=0x7fffffffbd50) at /usr/include/c++/4.8.2/bits/stl_vector.h:395
#19 Ph2_HwInterface::D19cCbc3Event::SetEvent (this=this@entry=0x2a7e3c0, pBoard=pBoard@entry=0x29505f0, pNbCbc=pNbCbc@entry=2, list=std::vector of length 28, capacity 32 = {...})
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/Utils/D19cCbc3Event.cc:128
#20 0x00007ffff76c214a in Ph2_HwInterface::D19cCbc3Event::D19cCbc3Event (this=0x2a7e3c0, pBoard=0x29505f0, pNbCbc=2, list=std::vector of length 28, capacity 32 = {...})
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/Utils/D19cCbc3Event.cc:22
#21 0x00007ffff76c7d8e in Ph2_HwInterface::Data::privateSet (this=0x29d5b90, pBoard=0x29505f0, pData=std::vector of length 28000, capacity 28000 = {...}, pNevents=<optimized out>, pType=D19C)
    at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/Utils/Data.cc:130
#22 0x00007ffff76c884e in operator() (this=0x2a43088) at /usr/include/c++/4.8.2/functional:2471
#23 operator() (this=0x2a43080) at /usr/include/c++/4.8.2/future:1235
#24 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/4.8.2/functional:2057
#25 0x00007ffff76c86ee in operator() (this=<optimized out>) at /usr/include/c++/4.8.2/functional:2471
#26 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) (this=0x2a7e018, __f=..., 
    __set=@0x7fffffffc000: false) at /usr/include/c++/4.8.2/future:471
#27 0x00007ffff2831e20 in pthread_once () from /lib64/libpthread.so.0
#28 0x00007ffff76c907a in __gthread_once (__func=<optimized out>, __once=0x2a7e084) at /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/gthr-default.h:699
#29 call_once<void (std::__future_base::_State_base::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()>&, bool&), std::__future_base::_State_base* const, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> >, std::reference_wrapper<bool> > (__f=<optimized out>, __once=...)
    at /usr/include/c++/4.8.2/mutex:786
#30 _M_set_result (__ignore_failure=true, __res=..., this=0x2a7e018) at /usr/include/c++/4.8.2/future:358
#31 std::__future_base::_Deferred_state<std::_Bind_simple<std::_Mem_fn<void (Ph2_HwInterface::Data::*)(Ph2_HwDescription::BeBoard const*, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, BoardType)> (Ph2_HwInterface::Data*, Ph2_HwDescription::BeBoard const*, std::vector<unsigned int, std::allocator<unsigned int> >, unsigned int, BoardType)>, void>::_M_run_deferred() (this=0x2a7e018)
    at /usr/include/c++/4.8.2/future:1465
#32 0x00007ffff7239c88 in wait (this=0x2a7e018) at /usr/include/c++/4.8.2/future:325
#33 _M_get_result (this=0x29d5c18) at /usr/include/c++/4.8.2/future:596
#34 get (this=0x29d5c18) at /usr/include/c++/4.8.2/future:761
#35 GetEvents (pBoard=0x29505f0, this=0x29d5b90) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/../System/../HWInterface/../Utils/Data.h:150
#36 GetEvents (this=0x7fffffffc200, pBoard=0x29505f0) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/../System/SystemController.h:223
#37 SignalScanFit::ScanSignal (this=this@entry=0x7fffffffce70, pSignalScanLength=pSignalScanLength@entry=30) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/tools/SignalScanFit.cc:126
#38 0x0000000000411727 in main (argc=4, argv=<optimized out>) at /afs/cern.ch/user/n/ndeelen/Ph2_ACF/src/commission.cc:144


Utils/D19cCbc3Event.cc, line 128 (frame #19 above):

                        std::vector<uint32_t> cCbcData (std::next (std::begin (list), begin), std::next (std::begin (list), end) );

In the first backtrace, frame #15 calls malloc(). While it is running, a signal arrives (frame #13), and the signal handler calls malloc() again (frame #2), which deadlocks. malloc() is not an async-signal-safe function. I believe the problem is the following:

A timer fires at a regular interval. When it fires during a malloc(), we end up with the deadlock outlined above: the timer handler also wants to allocate memory, so it waits for the interrupted malloc() to release its lock, but that malloc() cannot continue until the signal handler returns.
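To make the mechanism concrete, here is a toy model of it. It is not glibc's implementation: glibc protects its heap with a non-recursive internal lock (frame #0, `__lll_lock_wait_private`, is presumably the handler waiting on that lock), whereas the hypothetical `ToyArena` below only *tries* to take its lock, so the re-entry can be observed instead of hanging forever.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for an allocator's internal, non-recursive lock.
// Hypothetical: the real lock blocks; this one only reports failure.
class ToyArena {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true);
    }
    void unlock() { locked_.store(false); }

private:
    std::atomic<bool> locked_{false};
};

// Models a signal handler that allocates. With a blocking lock, a failed
// attempt here would wait forever: the owner is the very code the handler
// interrupted, and it cannot run again until the handler returns.
bool handler_can_allocate(ToyArena& arena) {
    if (!arena.try_lock()) {
        return false;  // this is where the real malloc() deadlocks
    }
    arena.unlock();
    return true;
}

// Models malloc() being interrupted while it holds the lock.
// Assumes the arena starts unlocked. Returns whether the "handler" could allocate.
bool interrupted_malloc(ToyArena& arena) {
    arena.try_lock();                               // outer malloc() takes the lock
    bool handler_ok = handler_can_allocate(arena);  // signal arrives mid-allocation
    arena.unlock();                                 // outer malloc() finishes
    return handler_ok;
}
```

Outside an allocation the "handler" succeeds; inside one it cannot, which is exactly the window the timer signal keeps hitting.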

Maybe @linev can help you.

Hi,

Did you change the timer configuration in THttpServer?
By default, it is synchronous and executed in gSystem->ProcessEvents().
I see no reason how it could interfere with malloc().
Or are you using your own threads?

Regards, Sergey

Hi Sergey

Thank you for your reply. Yes, the timer is asynchronous; otherwise we don't see any problems. The deadlock happens more often the shorter the timeout (the signal handler is called more often) and the slower the computer (memory allocation takes longer). Both increase the probability that the asynchronous signal handler is invoked during a malloc().
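The two effects can be put into a rough back-of-the-envelope model. This is an assumption, not a measurement: each timer signal lands at a random instant, a fraction f of wall-clock time is spent inside malloc(), ticks are independent, and any hit deadlocks. Over a run of length T with timer period dt, the chance of at least one hit is then 1 - (1 - f)^(T/dt).

```cpp
#include <cassert>
#include <cmath>

// Rough model (assumption): each timer signal lands at a random instant and a
// fraction `malloc_fraction` of wall-clock time is spent inside malloc().
// Returns the probability that at least one signal hits an allocation.
double deadlock_probability(double malloc_fraction, double period_ms, double run_time_ms) {
    double ticks = run_time_ms / period_ms;  // number of signals delivered
    return 1.0 - std::pow(1.0 - malloc_fraction, ticks);
}
```

With a hypothetical f = 0.1% and a 10 s run, a 100 ms timer gives about a 10% chance; shortening the period to 10 ms raises it to about 63%, and a slower machine with f = 0.5% at 100 ms gives about 39%.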

The problem is that when the timer expires, THttpServer::ProcessRequests() is called (frame #8 of the first backtrace).

This calls TThread::SelfId(), which calls TThread::Init() and finally TPosixThreadFactory::CreateThreadImp(), which calls operator new, which calls malloc() (frames #7 down to #2).
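Whether a given call ends up in malloc() is often not obvious from the source: here TThread::SelfId() allocates, and in the third backtrace the vector range constructor at D19cCbc3Event.cc:128 does too. One way to check in a test build is to replace the global `operator new` with a counting version. The sketch below is a hypothetical debugging aid, not part of ROOT, and it only sees allocations made through `operator new`, not direct malloc() calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iterator>
#include <new>
#include <vector>

// Counts every call to the replaced global operator new.
static std::atomic<long> g_new_calls{0};

void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many times `f` called the global operator new.
template <class F>
long count_allocations(F&& f) {
    long before = g_new_calls.load();
    f();
    return g_new_calls.load() - before;
}

// Kept at namespace scope so the allocation escapes and is not optimized away.
std::vector<std::uint32_t> g_kept;
```

Wrapping a suspicious call in `count_allocations` and seeing a non-zero result means that call must never run inside a signal handler.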

Since malloc() is not async-signal-safe, either the timer callback must not call THttpServer::ProcessRequests(), or the timer must not be asynchronous. Since you want to keep the functionality of ProcessRequests(), I think you should not allow an asynchronous timer here.
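The first option usually takes the form of a handler that only records that the signal arrived, while the real work (anything that allocates, such as ProcessRequests()) runs later from the main thread. A minimal generic sketch, not ROOT code; `on_timer_signal`, `poll_timer` and `install_timer_handler` are hypothetical names, and `process` stands in for the deferred work.

```cpp
#include <cassert>
#include <csignal>

// Set by the handler, consumed by the main loop. volatile std::sig_atomic_t
// is the classic type a signal handler may safely write.
static volatile std::sig_atomic_t g_timer_expired = 0;

// The handler only records the event: no malloc(), no new, no locks.
extern "C" void on_timer_signal(int) { g_timer_expired = 1; }

// Called regularly from the main thread, outside any signal handler, where
// allocating is safe. Returns true if the deferred work ran.
template <class F>
bool poll_timer(F&& process) {
    if (!g_timer_expired) {
        return false;
    }
    g_timer_expired = 0;
    process();
    return true;
}

void install_timer_handler(int signum) {
    std::signal(signum, on_timer_signal);
}
```

The handler becomes trivially safe; the price is that the work is delayed until the main loop next calls `poll_timer`.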

That is my understanding of the situation. Please let me know if you think my logic is flawed.

> Yes, the timer is asynchronous; otherwise we don't see any problems.

But why did you configure it this way?

An asynchronous timer uses system signals, which may arrive in the middle of any operation, including memory allocation.
Therefore asynchronous mode can only be used in single-threaded applications, and even then I am not absolutely sure about all boundary conditions. I do not recommend using it this way.
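When signals cannot be avoided, the generic POSIX way to keep one from landing inside a particular section is to block it there: a signal that arrives while blocked stays pending and is delivered once it is unblocked. This is a general technique, not something ROOT provides as far as this thread shows; `SignalBlocker` and `count_signal` are hypothetical names. `pthread_sigmask` only affects the calling thread, so in a multi-threaded program a process-directed signal may still be delivered to another thread.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <signal.h>

// RAII guard: blocks `signum` in the calling thread for the guard's lifetime.
// A signal arriving meanwhile stays pending and is delivered on unblock.
class SignalBlocker {
public:
    explicit SignalBlocker(int signum) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        pthread_sigmask(SIG_BLOCK, &set, &old_);
    }
    ~SignalBlocker() { pthread_sigmask(SIG_SETMASK, &old_, nullptr); }
    SignalBlocker(const SignalBlocker&) = delete;
    SignalBlocker& operator=(const SignalBlocker&) = delete;

private:
    sigset_t old_;  // mask to restore
};

// Example handler: only counts deliveries.
static volatile std::sig_atomic_t g_handled = 0;
extern "C" void count_signal(int) { g_handled = g_handled + 1; }
```

Wrapping every allocating section this way quickly becomes impractical, which is one reason to prefer a synchronous timer.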

You should configure a synchronous timer and call gSystem->ProcessEvents() regularly in your application.
Or you can disable the timer and call server->ProcessRequests() yourself. All these calls should be made from the main thread.
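For comparison, a synchronous timer needs no signals at all: the main loop checks the clock on each iteration and runs the due callback itself. The class below is a hypothetical, library-independent sketch of that idea, not ROOT's TTimer; the current time is passed in so the logic can be tested deterministically.

```cpp
#include <cassert>
#include <chrono>

// Polling timer: it only fires from the caller's own loop, never from a
// signal handler, so the work it triggers may allocate freely.
class PollingTimer {
public:
    using Clock = std::chrono::steady_clock;

    PollingTimer(Clock::duration period, Clock::time_point start)
        : period_(period), next_(start + period) {}

    // Call regularly from the main thread; returns true when the period elapsed.
    bool due(Clock::time_point now) {
        if (now < next_) {
            return false;
        }
        next_ = now + period_;  // reschedule relative to when we actually ran
        return true;
    }

private:
    Clock::duration period_;
    Clock::time_point next_;
};
```

In a real loop one would pass `PollingTimer::Clock::now()` and, whenever `due` returns true, call `server->ProcessRequests()` from the main thread.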

Regards,
Sergey

> But why did you configure it this way?

This I do not know; I did not write that part of the software. I merely ran into the issue.

> Therefore asynchronous mode can only be used in single-threaded applications, and even then I am not absolutely sure about all boundary conditions.

As shown above, it is not safe, even for a single-threaded application. Nothing prevents the operating system from interrupting you inside a function that is not async-signal-safe. In my view, offering the user an asynchronous timer that can cause a deadlock is a bug.
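A standard way to keep a handler within the async-signal-safe subset is the self-pipe trick: write(2) is on the POSIX list of async-signal-safe functions, so the handler only writes one byte to a pipe, and the main loop reads it (or waits on it with poll/select) and then does the real work. A minimal sketch with hypothetical names, not ROOT code:

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <fcntl.h>
#include <unistd.h>

// Pipe shared between the handler (write end) and the main loop (read end).
static int g_pipe[2] = {-1, -1};

// Only async-signal-safe calls here: write(2), with errno saved and restored.
extern "C" void notify_via_pipe(int) {
    int saved_errno = errno;
    const char byte = 1;
    ssize_t ignored = write(g_pipe[1], &byte, 1);  // a full pipe just drops the byte
    (void)ignored;
    errno = saved_errno;
}

// Creates a non-blocking pipe and installs the handler for `signum`.
bool setup_self_pipe(int signum) {
    if (pipe(g_pipe) != 0) {
        return false;
    }
    for (int fd : g_pipe) {
        int flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }
    return std::signal(signum, notify_via_pipe) != SIG_ERR;
}

// Main-loop side: drains pending notifications and returns how many were read.
int drain_notifications() {
    char buf[64];
    int total = 0;
    ssize_t n;
    while ((n = read(g_pipe[0], buf, sizeof buf)) > 0) {
        total += static_cast<int>(n);
    }
    return total;
}
```

Because the read end is a file descriptor, it can be added to whatever event loop the application already waits on.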

> As shown above, it is not safe, even for a single-threaded application. Nothing prevents the operating system from interrupting you inside a function that is not async-signal-safe. In my view, offering the user an asynchronous timer that can cause a deadlock is a bug.

I have not tested it for a while, but previously the asynchronous timer worked relatively stably for simple single-threaded applications. Probably a caution should be added to the documentation.

Regards,
Sergey
