PROOF problem

Hi,
I don’t know whether this is the right place to discuss this problem, but it is related to PROOF.
I’m writing an application that does some analysis, and the user can ask for this analysis to be run with PROOF. The problem is that if I connect to a PROOF master and, for example, generate an event list, ROOT crashes with a segmentation violation when the application finishes. I thought the problem was in my code, so I wrote a simple application that just connects to the master, closes the connection and exits. It still crashes. Is this some kind of ROOT “feature”?

Hi,

Sorry for the late reply.

Do you mean that you just start and close a PROOF session and you get a crash?
Of course this is not a feature.
Could you specify which ROOT version you are using?

G. Ganis

I’m using version 5.14.00, but I also tried 5.16.00. In my application, when I use PROOF, I get a segmentation violation after it has finished, when the TROOT destructor is called. I’ve tried a very simple application (create a connection to the PROOF master, get its manager and close the connection) but got the same result. Any ideas?

Hi,

I have never had this problem.
Could you please post the exact commands you used for the simple application you describe?

Thanks,
G. Ganis

Here is my test application:

#include "TProof.h"
#include "TProofMgr.h"

#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
	cout << "Open session"  << endl;
	TProof *proof=TProof::Open("zenith226.desy.de");
	cout << "Get manager"  << endl;
	TProofMgr *mgr=proof->GetManager();
	cout << "Closing session"  << endl;
	mgr->ShutdownSession(proof);
	cout << "Deleting pointer"  << endl;
	delete proof;
	cout << "Deleting manager"  << endl;
	delete mgr;
	cout << "Bye"  << endl;
}

Compilation: g++ `root-config --libs --cflags` -o proof main.cxx -lProof -lTreePlayer -lThread

Output:
[zenith223:proof] ./proof
Open session
Starting master: opening connection …
Starting master: OK
PROOF set to parallel mode (8 workers)
Get manager
Closing session
Checking pointer
Deleting pointer
Destroing manager
Bye

*** Break *** segmentation violation
(no debugging symbols found)
Using host libthread_db library “/lib/tls/libthread_db.so.1”.
Attaching to program: /proc/13109/exe, process 13109
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
[New Thread -1218596736 (LWP 13109)]
(no debugging symbols found)…done.
(no debugging symbols found)…done.
0x025e0501 in __waitpid_nocancel ()
from /lib/tls/libc.so.6
Thread 1 (Thread -1218596736 (LWP 13109)):
#0 0x025e0501 in __waitpid_nocancel () from /lib/tls/libc.so.6
#1 0x025751c4 in do_system () from /lib/tls/libc.so.6
#2 0x0257503c in system () from /lib/tls/libc.so.6
#3 0x0085cd7f in system () from /lib/tls/libpthread.so.0
#4 0x00337f33 in TUnixSystem::Exec ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#5 0x00338315 in TUnixSystem::StackTrace ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#6 0x0033616d in TUnixSystem::DispatchSignals ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#7 0x003342b9 in SigHandler ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#8 0x0033aaa5 in sighandler ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#9 <signal handler called>
#10 0x002fcf5b in TCollection::GarbageCollect ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#11 0x003008b1 in TList::Delete ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#12 0x002bedf9 in TROOT::~TROOT ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#13 0x002c2fff in __tcf_0 ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#14 0x02561b78 in __cxa_finalize () from /lib/tls/libc.so.6
#15 0x00282ce1 in __do_global_dtors_aux ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#16 0x005ec97a in _fini () from /opt/products/root/5.16.00/lib/libCore.so.5.16
#17 0x0086ef0d in _dl_fini () from /lib/ld-linux.so.2
#18 0x025618e3 in exit () from /lib/tls/libc.so.6
#19 0x0254c7fc in __libc_start_main () from /lib/tls/libc.so.6
#20 0x08048a4d in _start ()

I compiled this with ROOT 5.16; the same problem occurs with 5.14.
If I comment out the lines from ‘cout << "Get manager" << endl;’ to ‘cout << "Bye" << endl;’ and recompile, nothing changes (except that the commented-out output no longer appears on screen).

Maksym

Hi,

Thanks for the sample.

I think I have understood the problem: managers are registered in an internal global list for proper destruction at exit, so you do not need to destroy them explicitly; of course, you should not get a crash if you happen to do that. The reason you get the crash is that, unfortunately, the destructor does not remove the manager from that list when it is called directly, so the list is left with a stale pointer that is deleted again at exit.
I will fix that.
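The failure mode described above can be modelled in plain C++. This is a hypothetical simplification, not ROOT’s actual TProofMgr code: Managed, Registry() and CleanupAtExit() are invented names. Objects register themselves in a global list that is emptied at exit; because the destructor removes the object from that list, deleting one directly does not leave a pointer behind to be deleted a second time.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class Managed;  // forward declaration, needed by Registry()

// Hypothetical global registry, standing in for ROOT's internal list.
std::vector<Managed *> &Registry()
{
   static std::vector<Managed *> reg;
   return reg;
}

// Hypothetical stand-in for a manager object that registers itself.
class Managed {
public:
   Managed() { Registry().push_back(this); }

   // The destructor unregisters the object, so a direct 'delete' does not
   // leave a stale pointer behind for the exit-time cleanup.
   ~Managed()
   {
      auto &reg = Registry();
      reg.erase(std::remove(reg.begin(), reg.end(), this), reg.end());
   }
};

// Exit-time cleanup: delete whatever is still registered. Each delete
// shrinks the registry through ~Managed(), so the loop terminates.
void CleanupAtExit()
{
   auto &reg = Registry();
   while (!reg.empty())
      delete reg.back();
   assert(reg.empty());
}
```

For example, creating two objects, deleting one directly and then calling CleanupAtExit() releases each object exactly once. If the erase line in ~Managed() is removed, the direct delete leaves a dangling pointer in the registry and CleanupAtExit() deletes it again; that is undefined behaviour, which can show up as a crash during exit-time cleanup like the one in the trace above.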

Could you please confirm that if you remove the line deleting the manager the crash is gone?

Please also note that TProofMgr::ShutdownSession(TProof *) is currently broken.
By default, deleting the ‘proof’ object is sufficient to shut down the session (unless the server is configured differently).
If you want to be explicit, you can call proof->Detach("S").

G. Ganis

No, the application still crashes:

#include "TProof.h"

#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
	cout << "Open session"  << endl;
	TProof *proof=TProof::Open("zenith226.desy.de");
	delete proof;
	cout << "Bye"  << endl;
}

Output:
[zenith223:proof] ./proof
Open session
Starting master: opening connection …
Starting master: OK
PROOF set to parallel mode (8 workers)
Bye

*** Break *** segmentation violation
(no debugging symbols found)
Using host libthread_db library “/lib/tls/libthread_db.so.1”.
Attaching to program: /proc/20348/exe, process 20348
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
[New Thread -1218592640 (LWP 20348)]
[New Thread -1218729040 (LWP 20349)]
(no debugging symbols found)…done.
(no debugging symbols found)…done.
0x018ab501 in __waitpid_nocancel ()
from /lib/tls/libc.so.6
Thread 2 (Thread -1218729040 (LWP 20349)):
#0 0x018d596d in poll () from /lib/tls/libc.so.6
#1 0x01540703 in XrdClientSock::RecvRaw (this=0x84924f0, buffer=0x84f2fc4,
length=8, substreamid=-1, usedsubstreamid=0xb75ba5bc)
at XrdClientSock.cc:112
#2 0x0157594f in XrdClientPhyConnection::ReadRaw (this=0x8492060,
buf=0x84f2fc4, len=8, substreamid=-1, usedsubstreamid=0xb75ba5bc)
at XrdClientPhyConnection.cc:332
#3 0x0156dc6e in XrdClientMessage::ReadRaw (this=0x84f2fb8, phy=0x8492060)
at XrdClientMessage.cc:139
#4 0x015770d5 in XrdClientPhyConnection::BuildMessage (this=0x8492060,
IgnoreTimeouts=true, Enqueue=true) at XrdClientPhyConnection.cc:412
#5 0x0156f1ed in SocketReaderThread (arg=0x8492060, thr=0x84918b0)
at XrdClientPhyConnection.cc:56
#6 0x0157f608 in XrdClientThreadDispatcher (arg=0x84918bc)
at XrdClientThread.cc:29
#7 0x015a2f8e in XrdOucThread_Xeq ()
from /opt/products/root/5.16.00/lib/libXrdClient.so
#8 0x00571dd8 in start_thread () from /lib/tls/libpthread.so.0
#9 0x018ded2a in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -1218592640 (LWP 20348)):
#0 0x018ab501 in __waitpid_nocancel () from /lib/tls/libc.so.6
#1 0x018401c4 in do_system () from /lib/tls/libc.so.6
#2 0x0184003c in system () from /lib/tls/libc.so.6
#3 0x00577d7f in system () from /lib/tls/libpthread.so.0
#4 0x008edf33 in TUnixSystem::Exec ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#5 0x008ee315 in TUnixSystem::StackTrace ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#6 0x008ec16d in TUnixSystem::DispatchSignals ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#7 0x008ea2b9 in SigHandler ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#8 0x008f0aa5 in sighandler ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#9 <signal handler called>
#10 0x06c2b384 in vtable for __cxxabiv1::__si_class_type_info ()
from /usr/lib/libstdc++.so.5
#11 0x04dd6111 in TXSocket::FlushPipe ()
from /opt/products/root/5.16.00/lib/libProofx.so
#12 0x04dd52ae in TXSocket::Close ()
from /opt/products/root/5.16.00/lib/libProofx.so
#13 0x04dcaa1c in TXProofMgr::~TXProofMgr$delete ()
from /opt/products/root/5.16.00/lib/libProofx.so
#14 0x008b2f5e in TCollection::GarbageCollect ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#15 0x008b68b1 in TList::Delete ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#16 0x00874df9 in TROOT::~TROOT ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#17 0x00878fff in __tcf_0 ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#18 0x0182cb78 in __cxa_finalize () from /lib/tls/libc.so.6
#19 0x00838ce1 in __do_global_dtors_aux ()
from /opt/products/root/5.16.00/lib/libCore.so.5.16
#20 0x00ba297a in _fini () from /opt/products/root/5.16.00/lib/libCore.so.5.16
#21 0x0059bf0d in _dl_fini () from /lib/ld-linux.so.2
#22 0x0182c8e3 in exit () from /lib/tls/libc.so.6
#23 0x018177fc in __libc_start_main () from /lib/tls/libc.so.6
#24 0x08048a1d in _start ()

Maksym

Hi,

Sorry, I made the mistake of trying your code as a macro inside a ROOT session, where I did not get any crash after the change I suggested.

But if I actually run your small standalone program, I see the crash.

The difference between the two cases is that when you type “.q” in a ROOT session, the correct order of destruction is ensured by a call to TSystem::Exit; this is especially important when you have less basic objects such as sockets. In a standalone program nothing makes that call unless you do it yourself.

So, with the following change the application should terminate correctly:

#include "TProof.h"
#include "TSystem.h"

#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
   cout << "Open session"  << endl;
   TProof *proof=TProof::Open("zenith226.desy.de");
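   delete proof;        // deleting the session object also shuts down the session
   cout << "Bye"  << endl;
   gSystem->Exit(0);    // let ROOT destroy its objects in the correct order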
} 
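The point about destruction order is specific to ROOT, but the general idea can be sketched in plain C++. This is a generic pattern under stated assumptions, not how TSystem::Exit is implemented: CleanupStack(), RegisterCleanup(), RunCleanups() and OrderedExit() are hypothetical names. Teardown callbacks run explicitly, in reverse order of registration, before the process exits, instead of leaving the work to static destructors, whose relative order across translation units and shared libraries the program does not control.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cleanup stack: teardown callbacks registered by the program.
std::vector<std::function<void()>> &CleanupStack()
{
   static std::vector<std::function<void()>> stack;
   return stack;
}

void RegisterCleanup(std::function<void()> fn)
{
   CleanupStack().push_back(std::move(fn));
}

// Run the callbacks in reverse order of registration, so objects created
// later (which may depend on earlier ones, e.g. a session on its socket)
// are torn down first, while everything they use still exists.
void RunCleanups()
{
   auto &stack = CleanupStack();
   while (!stack.empty()) {
      std::function<void()> fn = std::move(stack.back());
      stack.pop_back();
      fn();
   }
   assert(stack.empty());
}

// Analogue of ending main() with gSystem->Exit(0): do the ordered teardown
// first, so the important objects are already gone by the time std::exit()
// runs the static destructors.
[[noreturn]] void OrderedExit(int code)
{
   RunCleanups();
   std::exit(code);
}
```

For instance, registering a cleanup for a socket and then one for a session that uses it makes RunCleanups() close the session first and the socket second.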

Please try and let me know.

G. Ganis

Thanks a lot! It works fine now!
Maksym