How not to build package on client?

Hi,

Our C++ application uses PROOF but also supports a plain standalone build without PROOF at all. The problem is that by the time we run the application in PROOF mode, all the code has already been compiled. But since we want our code to be recompiled on the remote PROOF cluster (because of a different arch/gcc/whatever), we are forced to include the sources in the PAR files and send them to the cluster.

But the same PAR file will also be compiled by PROOF on the client! Is there any way to avoid this behavior? It would be nice to have some kind of environment variable, say PROOF_MODE=Master, PROOF_MODE=Client and so on, to prevent our BUILD.sh from actually building the package on the client.

The ugly hack I’ve found is to use the $XRDHOST environment variable, which seems to be defined only on the master and workers but not on the client.
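
A minimal sketch of that hack as it might look inside BUILD.sh. It relies only on the observation above that XRDHOST appears to be set on the master and workers and not on the client, which is not a documented guarantee. The function names on_proof_node and build_if_remote are hypothetical.

```shell
#!/bin/sh
# Sketch only: relies on the observation that XRDHOST is set on PROOF
# master/worker nodes and unset on the client. Not a documented contract.

# Succeeds (exit status 0) when XRDHOST is set to a non-empty value.
on_proof_node() {
    [ -n "${XRDHOST:-}" ]
}

# Runs the given build command only on a PROOF node; on the client it
# just prints a message and leaves the already compiled code alone.
build_if_remote() {
    if on_proof_node; then
        "$@"
    else
        echo "client detected (XRDHOST unset): skipping build"
    fi
}
```

With these definitions the last line of BUILD.sh would be "build_if_remote make" instead of a bare "make".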


Eugeny Boger,
JINR Dubna.

Dear Eugeny,

Yes, set the ‘notOnClient’ boolean to kTRUE (see root.cern.ch/root/html/TProof.ht … blePackage).

G. Ganis

Thank you for the reply!

I’ve already tried that: in this case the program fails with a segmentation violation.
This is the stack trace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208863040 (LWP 25475)]
0x01cc1ec4 in TGCompositeFrame::TGCompositeFrame$base () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libGui.so
(gdb) bt
#0  0x01cc1ec4 in TGCompositeFrame::TGCompositeFrame$base () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libGui.so
#1  0x01cc3b32 in TGMainFrame::TGMainFrame$base () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libGui.so
#2  0x01cc5782 in TGTransientFrame::TGTransientFrame () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libGui.so
#3  0x0683e5c3 in TProofProgressDialog::TProofProgressDialog () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libSessionViewer.so
#4  0x06885867 in G__G__SessionViewer_124_0_2 () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libSessionViewer.so
#5  0x00a5f341 in Cint::G__CallFunc::Execute () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libCint.so
#6  0x00590292 in TCint::CallFunc_ExecInt () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libCore.so
#7  0x005b9b90 in TMethodCall::Execute () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libCore.so
#8  0x00516cdc in TPluginHandler::ExecPlugin () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libCore.so
#9  0x0549abb4 in TProof::HandleInputMessage () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#10 0x0549de21 in TProof::CollectInputFrom () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#11 0x054996b7 in TProof::Collect () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#12 0x0549a308 in TProof::Collect () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#13 0x06efc882 in TProofPlayerRemote::Process () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProofPlayer.so
#14 0x0548c478 in TProof::Process () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#15 0x0547d382 in TDSet::Process () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#16 0x05482860 in TProofChain::Process () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libProof.so
#17 0x0334b341 in TChain::Process () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/root/lib/libTree.so
#18 0x0804d136 in main (argc=6, argv=0xbfeb4cc4) at main.cxx:356

The relevant code block looks like this:

     chain.SetProof();
     if( !nentries ) {
       chain.Process("ReadDst");
     } else {
       chain.Process("ReadDst","",nentries);
     }

Ok, but then why do you say that this is related to building the package on the client?
This looks like something else.
What happens if you run in batch mode, i.e. without the progress dialog? Just start ROOT as ‘root -b’ …

Also, could you specify which version of ROOT you are running?

G. Ganis

Because with notOnClient = kFALSE everything is o.k.

The program is compiled, not interpreted, so I have no idea why the error is related to TProofProgressDialog. With the client build enabled there is a simple text progress bar.

BTW, I’ve recompiled ROOT with debugging enabled. Attached is the detailed stack trace.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208178112 (LWP 709)]
0x065c1c62 in TGCompositeFrame (this=0x8b29668, p=0x0, w=10, h=10, options=2051, back=0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/gui/gui/src/TGFrame.cxx:827
827	   fMapSubwindows = fParent->IsMapSubwindows();
(gdb) bt
#0  0x065c1c62 in TGCompositeFrame (this=0x8b29668, p=0x0, w=10, h=10, options=2051, back=0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/gui/gui/src/TGFrame.cxx:827
#1  0x065c3ddb in TGMainFrame (this=0x8b29668, p=0x0, w=10, h=10, options=2050)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/gui/gui/src/TGFrame.cxx:1402
#2  0x065c6292 in TGTransientFrame (this=0x8b29668, p=0x0, main=0x0, w=10, h=10, options=2)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/gui/gui/src/TGFrame.cxx:1850
#3  0x01b29f1d in TProofProgressDialog (this=0x8b24bc8, proof=0x89684a8, selector=0x8b0eab0 "ReadDst", files=1, first=0, entries=10000)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/gui/sessionviewer/src/TProofProgressDialog.cxx:142
#4  0x01b823ef in G__G__SessionViewer_124_0_2 () from /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/lib/libSessionViewer.so
#5  0x01066206 in Cint::G__CallFunc::Execute (this=0x8b1a0e8, pobject=0x0) at cint/cint/src/CallFunc.cxx:440
#6  0x00a98c8b in Cint::G__CallFunc::ExecInt (this=0x8b1a0e8, pobject=0x0) at include/CallFunc.h:98
#7  0x00a96217 in TCint::CallFunc_ExecInt (this=0x85ef020, func=0x8b1a0e8, address=0x0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/core/meta/src/TCint.cxx:2415
#8  0x00ac60db in TMethodCall::Execute (this=0x8b0f170, object=0x0, retLong=@0xbfecd9ec)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/core/meta/src/TMethodCall.cxx:375
#9  0x009fe9a7 in TMethodCall::Execute (this=0x8b0f170, retLong=@0xbfecd9ec) at include/TMethodCall.h:112
#10 0x009fc536 in TPluginHandler::ExecPlugin (this=0x8992c68, nargs=5)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/core/base/src/TPluginManager.cxx:316
#11 0x0165c52e in TProof::HandleInputMessage (this=0x89684a8, sl=0x8987010, mess=0x8b0f0d0, deactonfail=false)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TProof.cxx:3147
#12 0x016599b9 in TProof::CollectInputFrom (this=0x89684a8, s=0x8987320, endtype=-1, deactonfail=false)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TProof.cxx:2635
#13 0x01659294 in TProof::Collect (this=0x89684a8, mon=0x8986a80, timeout=-1, endtype=-1, deactonfail=false)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TProof.cxx:2507
#14 0x01658d6b in TProof::Collect (this=0x89684a8, list=TProof::kActive, timeout=-1, endtype=-1, deactonfail=false)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TProof.cxx:2422
#15 0x040fd200 in TProofPlayerRemote::Process (this=0x89866d0, dset=0x8a9bcf0, selector_file=0x804fb8c "ReadDst", option=0xf59118 "", nentries=10000, 
    first=0) at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proofplayer/src/TProofPlayer.cxx:1796
#16 0x0166255f in TProof::Process (this=0x89684a8, dset=0x8a9bcf0, selector=0x804fb8c "ReadDst", option=0x804f89d "", nentries=10000, first=0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TProof.cxx:4338
#17 0x0163f864 in TDSet::Process (this=0x8a9bcf0, selector=0x804fb8c "ReadDst", option=0x804f89d "", nentries=10000, first=0, enl=0x0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TDSet.cxx:890
#18 0x01647ef0 in TProofChain::Process (this=0x8a9b580, filename=0x804fb8c "ReadDst", option=0x804f89d "", nentries=10000, firstentry=0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/proof/proof/src/TProofChain.cxx:307
#19 0x075ba6a4 in TChain::Process (this=0xbfece6a0, filename=0x804fb8c "ReadDst", option=0x804f89d "", nentries=10000, firstentry=0)
    at /afs/ihep.ac.cn/users/e/eugenyboger/panfs/new/dbg/root/tree/tree/src/TChain.cxx:1938
#20 0x0804d1ad in main (argc=6, argv=0xbfeced84) at main.cxx:357

UPD: I use the latest ROOT 5.28
UPD2:
I’ve added

gROOT->SetBatch(true);

to the very beginning of my code. Now everything is working ok.

Thank you! Batch mode and disabling the build on the client helped.

However, there are some unsolved problems.

My application uses several packages. For each package (say, “BeanUser”) there is a BUILD.sh script, which just calls “make” in the package directory. The “make” generates a shared library (libUser.so).
So, in SETUP.C I have to call gSystem->Load("libUser.so") for my program to work.

Everything was ok when the client build was enabled. With the client build disabled, the application works well with a remote PROOF cluster, since on the client all the classes have already been linked into the main executable.

However, with PROOF-Lite I’m experiencing problems. Since the package wasn’t built on the workers, there is no libUser.so library in the workers’ working directory. So my application fails to load the library and thus doesn’t work.

I’ve found a possible workaround by using something like

TProof::AddEnvVar("LD_PRELOAD", "../BeanUser/libUser.so");

But again it looks ugly to me.

Hi,

The crash that you get is very weird: a line where just a ‘new’ is called. This may indicate other problems.
Since you have recompiled in debug mode I suggest that you run the client within valgrind, to see what’s going on (just type ‘valgrind -v root.exe’ and redo your stuff inside when you get the prompt).

The solution to your other problem should be to do everything with EnablePackage, I mean also enabling things on the client. That way you are sure that things are loaded everywhere. What prevents you from doing that?

G. Ganis

Hi,
sorry for the delay.

As I mentioned above, our C++ application can run without PROOF at all. PROOF is enabled by a command-line parameter at run time. All the code is already compiled by the time the user runs the application with PROOF enabled. I’m trying to avoid the pointless recompilation on the client. With remote PROOF it’s o.k., but with PROOF-Lite the application won’t work without recompilation. So right now I see two possible solutions:

  1. Include binaries in the PAR and somehow force a rebuild on the remote PROOF cluster, but do not rebuild them when using PROOF-Lite
  2. Do not include binaries in the PAR and somehow make proofserv.exe load the necessary libraries from outside the packages.

The output of 'valgrind -v ./bean.exe -p "proof@prfserver01:2093" /bes3fs/offline/data/651-1/mc/dst/jpsi/mc1611*.dst' (bean.exe is the application binary):

<...>
Mst-0: building BeanUser ...
Mst-0: make: Nothing to be done for `lib'.
/panfs/panfs.ihep.ac.cn/home/data/eugenyboger/new/bean/workdir
==14509== Syscall param socketcall.send(msg) points to uninitialised byte(s)
==14509==    at 0xD52A58: send (in /lib/tls/libpthread-2.3.4.so)
==14509==    by 0x79F13CE: XrdClientSock::SendRaw(void const*, int, int) (XrdClientSock.cc:310)
==14509==    by 0x7A13CB7: XrdClientPhyConnection::WriteRaw(void const*, int, int) (XrdClientPhyConnection.cc:623)
==14509==    by 0x7A0DE8A: XrdClientLogConnection::WriteRaw(void const*, int, int) (XrdClientLogConnection.cc:58)
==14509==    by 0x7A07EB5: XrdClientConnectionMgr::WriteRaw(int, void const*, int, int) (XrdClientConnMgr.cc:595)
==14509==    by 0x7990DB3: XrdProofConn::WriteRaw(void const*, int) (XrdProofConn.cxx:892)
==14509==    by 0x798FCF4: XrdProofConn::LowWrite(XPClientRequest*, void const*, int) (XrdProofConn.cxx:756)
==14509==    by 0x798E863: XrdProofConn::SendRecv(XPClientRequest*, void const*, char**) (XrdProofConn.cxx:512)
==14509==    by 0x798F37C: XrdProofConn::SendReq(XPClientRequest*, void const*, char**, char const*, bool) (XrdProofConn.cxx:627)
==14509==    by 0x7983536: TXSocket::SendRaw(void const*, int, ESendRecvOptions) (TXSocket.cxx:1193)
==14509==    by 0x7984D6D: TXSocket::Send(TMessage const&) (TXSocket.cxx:1690)
==14509==    by 0x6458209: TProof::Broadcast(TMessage const&, TList*) (TProof.cxx:2167)
==14509==  Address 0xA736180 is 1,656 bytes inside a block of size 2,072 alloc'd
==14509==    at 0x4004BBE: operator new[](unsigned) (vg_replace_malloc.c:197)
==14509==    by 0x42125FC: TStorage::ReAllocChar(char*, unsigned, unsigned) (TStorage.cxx:272)
==14509==    by 0x41BDD67: TBuffer::Expand(int) (TBuffer.cxx:189)
==14509==    by 0x4E1C231: TBufferFile::WriteUInt(unsigned) (TBufferFile.h:365)
==14509==    by 0x41C7C86: operator<<(TBuffer&, unsigned) (TBuffer.h:343)
==14509==    by 0x4E171FB: TBufferFile::WriteObjectClass(void const*, TClass const*) (TBufferFile.cxx:2362)
==14509==    by 0x4E1741E: TBufferFile::WriteObjectAny(void const*, TClass const*) (TBufferFile.cxx:2446)
==14509==    by 0x4E16A2C: TBufferFile::WriteFastArray(void**, TClass const*, int, bool, TMemberStreamer*) (TBufferFile.cxx:2164)
==14509==    by 0x4F091E8: int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, int, int, int, int) (TStreamerInfoWriteBuffer.cxx:464)
==14509==    by 0x4E1A1E1: TBufferFile::WriteClassBuffer(TClass const*, void*) (TBufferFile.cxx:3584)
==14509==    by 0x6443CE7: TDSetElement::Streamer(TBuffer&) (TDSet.cxx:1733)
==14509==    by 0x429EA35: TClass::StreamerTObjectInitialized(void*, TBuffer&, TClass const*) const (TClass.cxx:4963)
==14509== 
==14509== Use of uninitialised value of size 4
==14509==    at 0x9FE6C62: TGCompositeFrame::TGCompositeFrame(TGWindow const*, unsigned, unsigned, unsigned, unsigned long) (TGFrame.cxx:827)
==14509==    by 0x9FE8DDA: TGMainFrame::TGMainFrame(TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1402)
==14509==    by 0x9FEB291: TGTransientFrame::TGTransientFrame(TGWindow const*, TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1850)
==14509==    by 0x7E8DF1C: TProofProgressDialog::TProofProgressDialog(TProof*, char const*, int, long long, long long) (TProofProgressDialog.cxx:142)
==14509==    by 0x7EE63EE: G__G__SessionViewer_124_0_2(G__value*, char const*, G__param*, int) (in /panfs/panfs.ihep.ac.cn/home/data/eugenyboger/new/dbg/root/lib/libSessionViewer.so)
==14509==    by 0x4856205: Cint::G__CallFunc::Execute(void*) (CallFunc.cxx:440)
==14509==    by 0x4288C8A: Cint::G__CallFunc::ExecInt(void*) (CallFunc.h:98)
==14509==    by 0x4286216: TCint::CallFunc_ExecInt(void*, void*) const (TCint.cxx:2415)
==14509==    by 0x42B60DA: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:375)
==14509==    by 0x41EE9A6: TMethodCall::Execute(long&) (TMethodCall.h:112)
==14509==    by 0x41EC535: TPluginHandler::ExecPlugin(int, ...) (TPluginManager.cxx:316)
==14509==    by 0x645C52D: TProof::HandleInputMessage(TSlave*, TMessage*, bool) (TProof.cxx:3147)
==14509== 
==14509== Jump to the invalid address stated on the next line
==14509==    at 0x874000C: ???
==14509==    by 0x9FE8DDA: TGMainFrame::TGMainFrame(TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1402)
==14509==    by 0x9FEB291: TGTransientFrame::TGTransientFrame(TGWindow const*, TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1850)
==14509==    by 0x7E8DF1C: TProofProgressDialog::TProofProgressDialog(TProof*, char const*, int, long long, long long) (TProofProgressDialog.cxx:142)
==14509==    by 0x7EE63EE: G__G__SessionViewer_124_0_2(G__value*, char const*, G__param*, int) (in /panfs/panfs.ihep.ac.cn/home/data/eugenyboger/new/dbg/root/lib/libSessionViewer.so)
==14509==    by 0x4856205: Cint::G__CallFunc::Execute(void*) (CallFunc.cxx:440)
==14509==    by 0x4288C8A: Cint::G__CallFunc::ExecInt(void*) (CallFunc.h:98)
==14509==    by 0x4286216: TCint::CallFunc_ExecInt(void*, void*) const (TCint.cxx:2415)
==14509==    by 0x42B60DA: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:375)
==14509==    by 0x41EE9A6: TMethodCall::Execute(long&) (TMethodCall.h:112)
==14509==    by 0x41EC535: TPluginHandler::ExecPlugin(int, ...) (TPluginManager.cxx:316)
==14509==    by 0x645C52D: TProof::HandleInputMessage(TSlave*, TMessage*, bool) (TProof.cxx:3147)
==14509==  Address 0x874000C is not stack'd, malloc'd or (recently) free'd

 *** Break *** segmentation violation
--14509-- discard syms at 0x6A3F000-0x6A4A000 in /lib/libnss_files-2.3.4.so due to munmap()
--14509-- discard syms at 0x6033000-0x6039000 in /lib/libnss_dns-2.3.4.so due to munmap()
--14509-- discard syms at 0x19E000-0x1B1000 in /lib/libresolv-2.3.4.so due to munmap()
==14509== 
==14509== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 82 from 2)
==14509== 
==14509== 1 errors in context 1 of 3:
==14509== Jump to the invalid address stated on the next line
==14509==    at 0x874000C: ???
==14509==    by 0x9FE8DDA: TGMainFrame::TGMainFrame(TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1402)
==14509==    by 0x9FEB291: TGTransientFrame::TGTransientFrame(TGWindow const*, TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1850)
==14509==    by 0x7E8DF1C: TProofProgressDialog::TProofProgressDialog(TProof*, char const*, int, long long, long long) (TProofProgressDialog.cxx:142)
==14509==    by 0x7EE63EE: G__G__SessionViewer_124_0_2(G__value*, char const*, G__param*, int) (in /panfs/panfs.ihep.ac.cn/home/data/eugenyboger/new/dbg/root/lib/libSessionViewer.so)
==14509==    by 0x4856205: Cint::G__CallFunc::Execute(void*) (CallFunc.cxx:440)
==14509==    by 0x4288C8A: Cint::G__CallFunc::ExecInt(void*) (CallFunc.h:98)
==14509==    by 0x4286216: TCint::CallFunc_ExecInt(void*, void*) const (TCint.cxx:2415)
==14509==    by 0x42B60DA: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:375)
==14509==    by 0x41EE9A6: TMethodCall::Execute(long&) (TMethodCall.h:112)
==14509==    by 0x41EC535: TPluginHandler::ExecPlugin(int, ...) (TPluginManager.cxx:316)
==14509==    by 0x645C52D: TProof::HandleInputMessage(TSlave*, TMessage*, bool) (TProof.cxx:3147)
==14509==  Address 0x874000C is not stack'd, malloc'd or (recently) free'd
==14509== 
==14509== 1 errors in context 2 of 3:
==14509== Use of uninitialised value of size 4
==14509==    at 0x9FE6C62: TGCompositeFrame::TGCompositeFrame(TGWindow const*, unsigned, unsigned, unsigned, unsigned long) (TGFrame.cxx:827)
==14509==    by 0x9FE8DDA: TGMainFrame::TGMainFrame(TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1402)
==14509==    by 0x9FEB291: TGTransientFrame::TGTransientFrame(TGWindow const*, TGWindow const*, unsigned, unsigned, unsigned) (TGFrame.cxx:1850)
==14509==    by 0x7E8DF1C: TProofProgressDialog::TProofProgressDialog(TProof*, char const*, int, long long, long long) (TProofProgressDialog.cxx:142)
==14509==    by 0x7EE63EE: G__G__SessionViewer_124_0_2(G__value*, char const*, G__param*, int) (in /panfs/panfs.ihep.ac.cn/home/data/eugenyboger/new/dbg/root/lib/libSessionViewer.so)
==14509==    by 0x4856205: Cint::G__CallFunc::Execute(void*) (CallFunc.cxx:440)
==14509==    by 0x4288C8A: Cint::G__CallFunc::ExecInt(void*) (CallFunc.h:98)
==14509==    by 0x4286216: TCint::CallFunc_ExecInt(void*, void*) const (TCint.cxx:2415)
==14509==    by 0x42B60DA: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:375)
==14509==    by 0x41EE9A6: TMethodCall::Execute(long&) (TMethodCall.h:112)
==14509==    by 0x41EC535: TPluginHandler::ExecPlugin(int, ...) (TPluginManager.cxx:316)
==14509==    by 0x645C52D: TProof::HandleInputMessage(TSlave*, TMessage*, bool) (TProof.cxx:3147)
==14509== 
==14509== 1 errors in context 3 of 3:
==14509== Syscall param socketcall.send(msg) points to uninitialised byte(s)
==14509==    at 0xD52A58: send (in /lib/tls/libpthread-2.3.4.so)
==14509==    by 0x79F13CE: XrdClientSock::SendRaw(void const*, int, int) (XrdClientSock.cc:310)
==14509==    by 0x7A13CB7: XrdClientPhyConnection::WriteRaw(void const*, int, int) (XrdClientPhyConnection.cc:623)
==14509==    by 0x7A0DE8A: XrdClientLogConnection::WriteRaw(void const*, int, int) (XrdClientLogConnection.cc:58)
==14509==    by 0x7A07EB5: XrdClientConnectionMgr::WriteRaw(int, void const*, int, int) (XrdClientConnMgr.cc:595)
==14509==    by 0x7990DB3: XrdProofConn::WriteRaw(void const*, int) (XrdProofConn.cxx:892)
==14509==    by 0x798FCF4: XrdProofConn::LowWrite(XPClientRequest*, void const*, int) (XrdProofConn.cxx:756)
==14509==    by 0x798E863: XrdProofConn::SendRecv(XPClientRequest*, void const*, char**) (XrdProofConn.cxx:512)
==14509==    by 0x798F37C: XrdProofConn::SendReq(XPClientRequest*, void const*, char**, char const*, bool) (XrdProofConn.cxx:627)
==14509==    by 0x7983536: TXSocket::SendRaw(void const*, int, ESendRecvOptions) (TXSocket.cxx:1193)
==14509==    by 0x7984D6D: TXSocket::Send(TMessage const&) (TXSocket.cxx:1690)
==14509==    by 0x6458209: TProof::Broadcast(TMessage const&, TList*) (TProof.cxx:2167)
==14509==  Address 0xA736180 is 1,656 bytes inside a block of size 2,072 alloc'd
==14509==    at 0x4004BBE: operator new[](unsigned) (vg_replace_malloc.c:197)
==14509==    by 0x42125FC: TStorage::ReAllocChar(char*, unsigned, unsigned) (TStorage.cxx:272)
==14509==    by 0x41BDD67: TBuffer::Expand(int) (TBuffer.cxx:189)
==14509==    by 0x4E1C231: TBufferFile::WriteUInt(unsigned) (TBufferFile.h:365)
==14509==    by 0x41C7C86: operator<<(TBuffer&, unsigned) (TBuffer.h:343)
==14509==    by 0x4E171FB: TBufferFile::WriteObjectClass(void const*, TClass const*) (TBufferFile.cxx:2362)
==14509==    by 0x4E1741E: TBufferFile::WriteObjectAny(void const*, TClass const*) (TBufferFile.cxx:2446)
==14509==    by 0x4E16A2C: TBufferFile::WriteFastArray(void**, TClass const*, int, bool, TMemberStreamer*) (TBufferFile.cxx:2164)
==14509==    by 0x4F091E8: int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, int, int, int, int) (TStreamerInfoWriteBuffer.cxx:464)
==14509==    by 0x4E1A1E1: TBufferFile::WriteClassBuffer(TClass const*, void*) (TBufferFile.cxx:3584)
==14509==    by 0x6443CE7: TDSetElement::Streamer(TBuffer&) (TDSet.cxx:1733)
==14509==    by 0x429EA35: TClass::StreamerTObjectInitialized(void*, TBuffer&, TClass const*) const (TClass.cxx:4963)
--14509-- 
--14509-- supp:   67 Ubuntu-stripped-ld.so
--14509-- supp:   15 dl_relocate_object
==14509== 
==14509== IN SUMMARY: 3 errors from 3 contexts (suppressed: 82 from 2)
==14509== 
==14509== malloc/free: in use at exit: 7,207,486 bytes in 82,573 blocks.
==14509== malloc/free: 1,080,470 allocs, 997,897 frees, 51,371,621 bytes allocated.
==14509== 
==14509== searching for pointers to 82,573 not-freed blocks.
==14509== checked 32,443,260 bytes.
==14509== 
==14509== LEAK SUMMARY:
==14509==    definitely lost: 1,007 bytes in 25 blocks.
==14509==      possibly lost: 377,304 bytes in 7,938 blocks.
==14509==    still reachable: 6,829,175 bytes in 74,610 blocks.
==14509==         suppressed: 0 bytes in 0 blocks.
==14509== Use --leak-check=full to see details of leaked memory.
--14509--  memcheck: sanity checks: 6586 cheap, 264 expensive
--14509--  memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
--14509--  memcheck: auxmaps: 0 searches, 0 comparisons
--14509--  memcheck: secondaries: 540 issued (34560k, 33M)
--14509--  memcheck: secondaries: 1049 accessible and distinguished (67136k, 65M)
--14509--     tt/tc: 1,896,258 tt lookups requiring 26,042,695 probes
--14509--     tt/tc: 1,896,257 fast-cache updates, 8 flushes
--14509-- translate: new        132,976 (3,248,977 -> 50,669,061; ratio 155:10) [0 scs]
--14509-- translate: dumped     0 (0 -> ??)
--14509-- translate: discarded  871 (17,909 -> ??)
--14509-- scheduler: 328,074,908 jumps (bb entries).
--14509-- scheduler: 6,586/3,859,942 major/minor sched events.
--14509--    sanity: 6587 cheap, 264 expensive checks.
--14509--    exectx: 30,011 lists, 66,695 contexts (avg 2 per list)
--14509--    exectx: 2,078,451 searches, 2,143,723 full compares (1,031 per 1000)
--14509--    exectx: 0 cmp2, 194 cmp4, 0 cmpAll

Ok, I see.
The ‘notOnClient’ option was not working on PROOF-Lite.
I have fixed the problem in the trunk and ported the fix back to 5-28-00-patches, so that it will appear in 5-28-00a in the coming days.
With this fix it should just work as for the remote PROOF cluster.

For the other problem, it looks like there is something undefined in your setup.
If you just start a shell in normal non-batch mode, i.e. the mode that gives the problem, what do you get from this?

$ root
root [0] gClient
root [1] gClient->GetRoot()   // if gClient is != 0

G. Ganis

Hello,

  1. I have the following output:
root [1] gClient
Error: Symbol gClient is not defined in current scope  (tmpfile):1:
*** Interpreter error recovered ***
  2. Thank you. This is definitely the more correct behaviour (sadly, the workaround I invented yesterday no longer works).
    I must have described my case very obscurely, so I’ll try again. The code tree of our project looks as follows:
BeanCore/     
BeanUser/  
main.cxx

BeanCore hosts the “core” code, such as the selector class, and BeanUser holds the user analysis.

Running “make” in the BeanCore directory compiles all its contents into libBean.so. Likewise, “make” in the BeanUser directory produces libUser.so.

As our program can run as a plain C++ application, there is also a main.cxx source file. The top-level “make” compiles main.cxx into the bean.exe executable and links it against libBean.so and libUser.so.

After running “make”, the user has a bean.exe binary which is dynamically linked against the two libraries libBean.so and libUser.so.

For our program to support PROOF in a transparent way, we created packages from the BeanCore and BeanUser directories. So the contents of these directories are compressed into BeanCore.par and BeanUser.par respectively.
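
A minimal sketch of how such a PAR file might be produced. It assumes the usual PAR layout, i.e. a gzipped tarball whose top-level directory is named after the package and carries its scripts in a PROOF-INF subdirectory. The helper name make_par is hypothetical, and the check for both BUILD.sh and SETUP.C reflects this project's packages rather than a PROOF requirement.

```shell
#!/bin/sh
# Sketch only: packs the directory <pkg> into <pkg>.par in the current
# directory, assuming the layout <pkg>/PROOF-INF/{BUILD.sh,SETUP.C}.

make_par() {
    pkg=$1
    # Refuse to pack a directory that lacks the scripts this project uses.
    for f in BUILD.sh SETUP.C; do
        if [ ! -f "$pkg/PROOF-INF/$f" ]; then
            echo "missing $pkg/PROOF-INF/$f" >&2
            return 1
        fi
    done
    # A PAR file is a gzipped tar archive of the package directory.
    tar -czf "$pkg.par" "$pkg"
}
```

Run from the top of the source tree, "make_par BeanCore" and "make_par BeanUser" would produce the two archives mentioned above.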

The BUILD.sh of each package consists of a single ‘make’ command, while SETUP.C invokes gSystem->Load() with the corresponding shared library name.

When the program runs in PROOF-Lite mode, the packages are copied to ~/.proof/packages. Then every package is compiled and everything is o.k. What I’m trying to avoid is this compilation, because the compilation actually happens twice for each change in the sources: the first time to support the standalone no-PROOF-at-all mode, and the second time for PROOF-Lite to work.

Of course we could include the binaries (libUser.so and libBean.so) in the corresponding PAR packages. But then we need to be able to know whether we should rebuild the package (remote PROOF case) or not rebuild and use the binaries provided (PROOF-Lite case).
And it seems there is no way to distinguish between a PROOF-Lite worker process and a remote PROOF worker process from SETUP.C or BUILD.sh.

Now I personally think that the best solution is to make two separate versions of each package, say BeanCore.par with source code only and BeanCore.bin.par with binaries only. And then select the right package to upload and enable at run time: *.bin.par packages for PROOF-Lite only, and *.par for remote PROOF.

What would you suggest we do?

Hi,

Right. I have just added this in the trunk and 5-28-00-patches. The env variables ROOTPROOFCLIENT and ROOTPROOFLITE are set when appropriate. See the modified version of tutorials/proof/event.par in those versions for how to check and use them.
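
A minimal sketch of how a BUILD.sh might use these two variables. It assumes they are non-empty when set and absent otherwise, since their exact values are not given here, and it assumes that prebuilt libraries are available on the client and in PROOF-Lite sessions. The function names proof_node_kind and build_unless_local are hypothetical; the event.par tutorial remains the reference for the intended usage.

```shell
#!/bin/sh
# Sketch only: assumes ROOTPROOFCLIENT and ROOTPROOFLITE are non-empty when
# set and absent otherwise; their exact values are not stated in this thread.

# Prints "lite", "client" or "remote" depending on the environment.
proof_node_kind() {
    if [ -n "${ROOTPROOFLITE:-}" ]; then
        echo "lite"
    elif [ -n "${ROOTPROOFCLIENT:-}" ]; then
        echo "client"
    else
        echo "remote"
    fi
}

# Runs the given build command only on a remote PROOF master or worker,
# where the code has to be recompiled for the local architecture.
build_unless_local() {
    kind=$(proof_node_kind)
    if [ "$kind" = "remote" ]; then
        "$@"
    else
        echo "node kind '$kind': using prebuilt binaries, skipping build"
    fi
}
```

The last line of BUILD.sh would then be "build_unless_local make", so that the compilation only happens on the remote cluster.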

For the crash, sorry, you need to start some graphics stuff first:

root [0] TProof::LogViewer("")
root [1] gClient
root [2] gClient->GetRoot()   // if gClient is != 0

otherwise the plug-in that loads gClient is not loaded. Can you check what you get in this case?

G. Ganis

Hi,

Thanks, this is exactly what we need!

As for gClient, I have the following output:

root [0] TProof::LogViewer("")
Info in <TProofMgrLite::GetSessionLogs>: analysing session dir /afs/ihep.ac.cn/users/e/eugenyboger/.proof/panfs-panfs.ihep.ac.cn-home-data-eugenyboger-new-bean/session-lxslc09.ihep.ac.cn-1298014864-11327
root [1] gClient
(class TGClient*)0xa09cb90
root [2] gClient->GetRoot() 
(const class TGWindow*)0xa0b6710

Hi,

Your gClient looks normal, so I have no idea why you get the crash. The immediate explanation is that something gets corrupted, which is worrying.

There was a missing initialization, which we fixed in the trunk and 5-28-00-patches. I doubt it will have any effect, but you can always try if you are in the position to do that. Let me know if you do it.

G. Ganis

Hi,

I’ve just updated and compiled the trunk, and there is still no success.

I’ve traced the program with gdb, and just before the crash gClient and fParent both seem to be NULL.

Hi,

Uhm … this sounds more like a problem with the X interface, so it should probably be moved to ROOT support.
As a last thing, can you try the following?

root [0] TProof *p = TProof::Open("")
...
root [1] new TProofProgressDialog(p,"MySelector",10,0,10000)

This should just open the progress dialog in standby mode.
If this crashes then the experts may have a way to steer your debugging …

G. Ganis