Issues with PROOF and ToyMCSampler


ROOT Version: 6.12.06
Platform: x86_64-slc6-gcc62-opt
Compiler: gcc62


Dear experts,

I am trying to use the Frequentist Calculator in RooStats for an upper limit calculation. In order to do this, the code generates toys using ToyMCSampler, and I wish to use PROOF to parallelize this process. I have never used PROOF before.

The code I run is essentially the StandardHypoTestInvDemo.C script in the $ROOTSYS/tutorials/roostats folder, with the useProof option set to true. When I run this with the default inputs of the tutorial script, the execution finishes successfully. But when I try to pass it my own workspace, the code crashes when it tries to initialize PROOF to generate toys, with the following errors:

 +++ Starting PROOF-Lite with 8 workers +++
Opening connections to workers: OK (8 workers)
Setting up worker servers: OK (8 workers)
PROOF set to parallel mode (8 workers)
[#0] PROGRESS:Generation -- RooStudyManager::runProof() sending work package to PROOF servers
[#0] PROGRESS:Generation -- RooStudyManager::runProof() starting PROOF processing of 8 experiments

Info in <TProofLite::SetQueryRunning>: starting query: 1
Info in <TProofQueryResult::SetRunning>: nwrks: 8
0.3: caught exception triggered by signal '1' <undef> -1
0.6: caught exception triggered by signal '1' <undef> -1
0.0: caught exception triggered by signal '1' <undef> -1
Info in <TProofLite::MarkBad>:
 +++ Message from master at lhcb-dev.phy.syr.edu : marking lhcb-dev.phy.syr.edu:-1 (0.0) as bad
 +++ Reason: undefined message in TProof::CollectInputFrom(...)

 +++ Message from master at lhcb-dev.phy.syr.edu : marking lhcb-dev.phy.syr.edu:-1 (0.0) as bad
 +++ Reason: undefined message in TProof::CollectInputFrom(...)

 +++ Most likely your code crashed
 +++ Please check the session logs for error messages either using
 +++ the 'Show logs' button or executing
 +++
 +++ root [] TProof::Mgr("lhcb-dev.phy.syr.edu")->GetSessionLogs()->Display("*")

The above message is displayed several times before the code segfaults.

My first question is: how do I access the session logs from PROOF? The code only uses a ProofConfig object, which doesn't seem to give direct access to the TProof object. I tried executing the suggested command at a ROOT prompt, but I only get another error:

190614 12:38:12 4162 Proofx-E: Conn::Connect: failed to connect to proof://lhcb-dev.phy.syr.edu:1093//
190614 12:38:12 4162 Proofx-E: XrdProofConn: XrdProofConn: severe error occurred while opening a connection to server [lhcb-dev.phy.syr.edu:1093]
Warning in <TXProofMgr::GetSessionLogs>: invalid TXProofMgr - do nothing

Thread 2 (Thread 0x7f063dbd4700 (LWP 6426)):
#0  0x00000032c88ac9fd in nanosleep () from /lib64/libc.so.6
#1  0x00000032c88ac870 in sleep () from /lib64/libc.so.6
#2  0x00007f063e76ffec in GarbageCollectorThread (arg=0x8bfc220, thr=<optimized out>) at /mnt/build/jenkins/workspace/lcg_release_tar/BUILDTYPE/Release/COMPILER/gcc62binutils/LABEL/slc6/build/externals/xrootd-4.8.2/src/xrootd/4.8.2/src/XrdClient/XrdClientConnMgr.cc:93
#3  0x00007f063e9f86a8 in XrdSysThread_Xeq (myargs=0x83cecf0) at /mnt/build/jenkins/workspace/lcg_release_tar/BUILDTYPE/Release/COMPILER/gcc62binutils/LABEL/slc6/build/externals/xrootd-4.8.2/src/xrootd/4.8.2/src/XrdSys/XrdSysPthread.cc:86
#4  0x00000032c90079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032c88e886d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f064f4a5960 (LWP 4162)):
#0  0x00000032c88ac61d in waitpid () from /lib64/libc.so.6
#1  0x00000032c883e619 in do_system () from /lib64/libc.so.6
#2  0x00000032c883e950 in system () from /lib64/libc.so.6
#3  0x00007f064ff98568 in TUnixSystem::StackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCore.so
#4  0x00007f064c189e05 in cling::MultiplexInterpreterCallbacks::PrintStackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCling.so
#5  0x00007f064c1898bb in cling_runtime_internal_throwIfInvalidPointer () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCling.so
#6  0x00007f063eeb3132 in ?? ()
#7  0x0000000009037b50 in ?? ()
#8  0x0000000001474ce0 in ?? ()
#9  0x000000000858f140 in ?? ()
#10 0x00007f064c189870 in ?? () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCling.so
#11 0x000000003eeb3000 in ?? ()
#12 0x0000000000000000 in ?? ()
Error in <HandleInterpreterException>: Trying to dereference null pointer or trying to call routine taking non-null arguments.
Execution of your code was aborted.
ROOT_prompt_1:1:1: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
TProof::Mgr("lhcb-dev.phy.syr.edu")->GetSessionLogs()->Display("*")

My second question: I suspect that the problem is caused by a custom fit shape defined in a .cpp file, RooHypatia2.cpp. Until now, when I wasn't using PROOF, I could simply compile the .cpp file and add the following line to StandardHypoTestInvDemo.C:

gSystem->Load("RooHypatia2_cpp.so");

and everything would work. Is it possible that PROOF is crashing because it somehow doesn't have access to this custom class? If so, how do I pass this custom fit shape to PROOF?

It is important for me to get PROOF working for the toy generation, because without PROOF, the code takes days to run.

I am attaching the ROOT file containing my workspace (myWS.root), the script I run (StandardHypoTestInvDemo.C), the .cpp file with the custom fit shape (RooHypatia2.cpp) and the full log of what happens when I run the script (fullLog.txt). The command I use to execute the script is:

root -l 'StandardHypoTestInvDemo.C("myWS.root","w","ModelConfig","bkgOnlyModel","combData",0,3,true,10,0,200,100,false,0)'

I would highly appreciate any advice on this issue.
myWS.root (649.2 KB)
StandardHypoTestInvDemo.C (43.1 KB)
RooHypatia2.cpp (6.0 KB)
fullLog.txt (36.6 KB)
RooHypatia2.h (1.9 KB)

Thanks in advance,
Arvind.

@ganis could you help for the proof part?
@StephanH could you have a look at the roostats part?

Hello Arvind,

That’s of course possible. I have no experience with PROOF, though.

Can I just ask the obvious first: Does it run without PROOF?

Further, you didn’t attach the header RooHypatia2.h. Could you maybe do that?

Hello Stephan,

Yes. It just takes a lot longer. This is why I wish to use PROOF to parallelize the toy generation.

I just tried it again after attaching the header, but the code still crashes.

It sounds like a PROOF issue, so I'm unsure whether I can help. But what I meant by "attaching the header" was to attach it to this forum post. Otherwise, we cannot run your example. :)

Dear Arvind,

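Regarding the session logs: they are under $HOME/.proof/<a-path-that-resembles-your-working-dir>

Regarding whether PROOF crashes because it cannot access your custom class: yes, that is possible.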
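If the `TProof::Mgr(...)` route fails, as it did above, the log files can also be read straight from disk. Below is a minimal C++17 sketch of that, assuming only that the logs live somewhere below `$HOME/.proof` (as stated above) plus one assumption of ours: that the log files end in `.log`. The helper names `findLogs` and `collectLogs` are hypothetical. Because the search is recursive, the exact name of the working-directory-like subdirectory does not need to be known.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Recursively collect every regular file ending in ".log" below `root`
// (e.g. the .proof directory in your home). The ".log" suffix is an assumption.
std::vector<fs::path> findLogs(const fs::path& root) {
    std::vector<fs::path> logs;
    std::error_code ec;
    if (!fs::is_directory(root, ec)) return logs;  // nothing to search
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied)) {
        if (entry.is_regular_file() && entry.path().extension() == ".log")
            logs.push_back(entry.path());
    }
    std::sort(logs.begin(), logs.end());  // iteration order is unspecified
    return logs;
}

// Concatenate the contents of all logs, each preceded by a header with its path.
std::string collectLogs(const std::vector<fs::path>& logs) {
    std::ostringstream out;
    for (const auto& p : logs) {
        std::ifstream in(p.string());
        std::string text((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        out << "==== " << p.string() << " ====\n" << text;
    }
    return out.str();
}
```

Passing the `.proof` directory under your home directory to `findLogs` and printing `collectLogs` of the result shows every log at once. In this thread it was the worker logs that revealed the missing class.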

Try

  gProof->Load("<full-path-to>/RooHypatia2_cpp.so");

That said, given that you are starting with PROOF, which is now in legacy mode, and that you are using a reasonably recent version of ROOT, I would consider moving to TProcessExecutor (the successor of PROOF-Lite) or TThreadExecutor.
Have a look under tutorials/multicore, or perhaps even at RDataFrame. Perhaps Stephan can help here.
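To illustrate the map-over-toys pattern that executors such as TThreadExecutor provide, here is a standard-library sketch that uses `std::async` instead of ROOT (a swapped-in technique, not the ROOT API). The toy itself is a hypothetical stand-in that just draws a Poisson count, and `generateToy` and `runToysParallel` are illustrative names. What it shows is that giving every toy its own seed, derived from its index, makes the results independent of how the toys are spread over threads.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <random>
#include <vector>

// Hypothetical toy experiment: draw one Poisson-distributed count with mean mu.
// The generator is seeded per toy, so the result depends only on `seed`.
int generateToy(std::uint32_t seed, double mu) {
    std::mt19937 rng(seed);
    std::poisson_distribution<int> pois(mu);
    return pois(rng);
}

// Run nToys toys in nWorkers concurrent tasks; toy i always uses seed baseSeed + i.
std::vector<int> runToysParallel(unsigned nToys, unsigned nWorkers, double mu,
                                 std::uint32_t baseSeed) {
    std::vector<int> results(nToys);
    std::vector<std::future<void>> jobs;
    for (unsigned w = 0; w < nWorkers; ++w) {
        jobs.push_back(std::async(std::launch::async, [&, w] {
            // Task w handles toys w, w + nWorkers, w + 2 * nWorkers, ...
            // Each task writes distinct elements, so there is no data race.
            for (unsigned i = w; i < nToys; i += nWorkers)
                results[i] = generateToy(baseSeed + i, mu);
        }));
    }
    for (auto& job : jobs) job.get();  // wait for all tasks, rethrow any exception
    return results;
}
```

Running with 1 or 8 tasks gives identical results. With ROOT's executors the same idea applies: the function that is mapped over the toys should seed its own generator from its input rather than share one generator across threads.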

G Ganis

Apologies. The header file is now attached to my original post.

Thanks, Ganis. I will try your advice with PROOF first. If I still have no luck, I will try TProcessExecutor or TThreadExecutor.

Arvind.

Dear @ganis,

Indeed, reading the PROOF logs showed that the original problem was that the workers couldn't access the custom RooHypatia2 class. After adding the gProof->Load line as you suggested, I now get a different error:

#0  0x00000032c88ac5de in waitpid () from /lib64/libc.so.6
#1  0x00000032c883e619 in do_system () from /lib64/libc.so.6
#2  0x00007f4fa4f8a568 in TUnixSystem::StackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCore.so
#3  0x00007f4fa117be05 in cling::MultiplexInterpreterCallbacks::PrintStackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCling.so
#4  0x00007f4fa117b8bb in cling_runtime_internal_throwIfInvalidPointer () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.12.06-0f687/x86_64-slc6-gcc62-opt/lib/libCling.so
#5  0x00007f4fa0caa3ea in ?? ()
#6  0x00007fff72f27190 in ?? ()
#7  0x00007fff72f27058 in ?? ()
#8  0x00007fff72f26f20 in ?? ()
#9  0x00007fff72f26de8 in ?? ()
#10 0x00007fff72f23840 in ?? ()
#11 0x00007fff72f23930 in ?? ()
#12 0x0000000000000000 in ?? ()
Error in <HandleInterpreterException>: Trying to dereference null pointer or trying to call routine taking non-null arguments.
Execution of your code was aborted.
In file included from input_line_10:1:
/data1/avenkate/JpsiLambda_RESTART/scripts/testdir/StandardHypoTestInvDemo.C:971:4: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
   gProof->Load("/data1/avenkate/JpsiLambda_RESTART/scripts/testdir/RooHypatia2_cpp.so");

This seems to suggest that gProof is a null pointer. I even tried including the TProof header, but it didn't help. I am attaching my modified script and the full log file. I would appreciate any advice here.
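A plausible reading of this error, judging from the first log in this thread (the PROOF-Lite session starts right before the `RooStudyManager::runProof()` messages, i.e. only during toy generation), is that no PROOF session exists yet when the macro reaches the `gProof->Load` line, so `gProof` is still null. The plain C++ sketch below mimics that situation with hypothetical names (`Session`, `gSession`, `openSession`, `loadOnWorkers`); it is an analogy, not ROOT code. In an actual ROOT macro, the analogous guard would be to open a session with `TProof::Open("lite://")` before calling `gProof->Load`. Whether ToyMCSampler then reuses that session, or starts a fresh one in which the library is not loaded, is not settled in this thread, so the worker logs are worth checking afterwards.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a PROOF-like session; NOT the ROOT API.
struct Session {
    std::vector<std::string> loaded;  // libraries "loaded on the workers"
    void Load(const std::string& lib) { loaded.push_back(lib); }
};

// Like gProof appears to be here, this global stays null until a session is opened.
Session* gSession = nullptr;

// Open the (single) session and publish it through the global pointer.
Session* openSession() {
    static Session session;  // lives for the rest of the program
    gSession = &session;
    return gSession;
}

// Guarded load: open a session first instead of dereferencing a null pointer.
void loadOnWorkers(const std::string& lib) {
    if (gSession == nullptr) openSession();
    gSession->Load(lib);
}
```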

The reason I started with PROOF rather than TProcessExecutor or TThreadExecutor is simply that the tutorial itself had an option to use PROOF. That's all :)

fullLog_new.txt (18.9 KB)
StandardHypoTestInvDemo.C (43.3 KB)
worker-0.0.txt (11.3 KB)

Thanks for the help,
Arvind.

PS: I have also attached one of the original PROOF logs (showing that it didn’t have access to the custom class), as worker-0.0.txt.
PPS: Editing this post since I can't make another. Dear @ganis, is there a solution to the problem I mentioned above?
