Error while processing dset - root 5.34/25

Hello,

I moved my working analysis code (attached), which uses PROOF, from one server to another. Both load the same ROOT version, 5.34/25, but on the new server I get the runtime error below.
Do you have any suggestions? How can I make sure PROOF is correctly set up on the new server?

Thanks in advance,
Anna



Validating files: OK (1 files)
0.0: caught exception triggered by signal '1' while processing dset:'TDSet:phys', file:'/home/usr201/acorsi/s018_analysis/macros_v1/…/rootfiles/physics_v8.24/phys0200.root' - check logs for possible stacktrace - last event: 0
Info in TProofLite::MarkBad:
+++ Message from master at iclust : marking iclust:-1 (0.0) as bad
+++ Reason: undefined message in TProof::CollectInputFrom(…)

+++ Message from master at iclust : marking iclust:-1 (0.0) as bad
+++ Reason: undefined message in TProof::CollectInputFrom(…)

+++ Most likely your code crashed
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("iclust")->GetSessionLogs()->Display("*")

[TProof:] Total 619369 events |>…| 0.00 %
*** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:

#0 0x0000003def4ac65e in waitpid () from /lib64/libc.so.6
#1 0x0000003def43e609 in do_system () from /lib64/libc.so.6
#2 0x00007f0fcf67cb9f in TUnixSystem::StackTrace() () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCore.so
#3 0x00007f0fcf67e7dc in TUnixSystem::DispatchSignals(ESignals) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCore.so
#4 <signal handler called>
#5 0x00007f0fc9f67b05 in proof::Terminate() () from /home/usr201/acorsi/s018_analysis/macros_v1/./proof_C.so
#6 0x00007f0fcb33b780 in TProofPlayerLite::Finalize(bool, bool) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libProofPlayer.so
#7 0x00007f0fcb33cfe8 in TProofPlayerLite::Process(TDSet*, char const*, char const*, long long, long long) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libProofPlayer.so
#8 0x00007f0fcc84606e in TProofLite::Process(TDSet*, char const*, char const*, long long, long long) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libProof.so
#9 0x00007f0fcccbb52d in G__G__Tree_130_0_161(G__value*, char const*, G__param*, int) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libTree.so
#10 0x00007f0fce39ccee in Cint::G__ExceptionWrapper(int (*)(G__value*, char const*, G__param*, int), G__value*, char*, G__param*, int) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#11 0x00007f0fce4407d7 in G__execute_call () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#12 0x00007f0fce440b5d in G__call_cppfunc () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#13 0x00007f0fce420378 in G__interpret_func () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#14 0x00007f0fce40d7c7 in G__getfunction () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#15 0x00007f0fce4f170c in G__getstructmem(int, G__FastAllocString&, char*, int, char*, int*, G__var_array*, int) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#16 0x00007f0fce4e9bcf in G__getvariable () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#17 0x00007f0fce3e77b9 in G__getitem () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#18 0x00007f0fce3ee05c in G__getexpr () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#19 0x00007f0fce46f08c in G__exec_statement () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#20 0x00007f0fce41edb9 in G__interpret_func () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#21 0x00007f0fce40d825 in G__getfunction () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#22 0x00007f0fce3e7cd6 in G__getitem () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#23 0x00007f0fce3ee05c in G__getexpr () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#24 0x00007f0fce3f8bd8 in G__calc_internal () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#25 0x00007f0fce47c2c7 in G__process_cmd () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCint.so
#26 0x00007f0fcf640e00 in TCint::ProcessLine(char const*, TInterpreter::EErrorCode*) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCore.so
#27 0x00007f0fcf63d49b in TCint::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCore.so
#28 0x00007f0fcf5ab55c in TApplication::ExecuteFile(char const*, int*, bool) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCore.so
#29 0x00007f0fcf5aaac0 in TApplication::ProcessLine(char const*, bool, int*) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libCore.so
#30 0x00007f0fcf200802 in TRint::Run(bool) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libRint.so
#31 0x00000000004011ec in main ()

The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.

#5 0x00007f0fc9f67b05 in proof::Terminate() () from /home/usr201/acorsi/s018_analysis/macros_v1/./proof_C.so
#6 0x00007f0fcb33b780 in TProofPlayerLite::Finalize(bool, bool) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libProofPlayer.so
#7 0x00007f0fcb33cfe8 in TProofPlayerLite::Process(TDSet*, char const*, char const*, long long, long long) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libProofPlayer.so
#8 0x00007f0fcc84606e in TProofLite::Process(TDSet*, char const*, char const*, long long, long long) () from /home/gpfs/manip/mnt0607/structurenucleaire/ysun/root_v5.34.25/lib/libProof.so

macros_v1.zip (53.7 KB)

Hi,

FYI - Gerri, our PROOF expert, is looking into this.

Cheers, Axel.

Dear Anna,

I see that the workers try to access the tree files under
/home/usr201/acorsi/s018_analysis/macros_v1/…

Are these accessible from the new cluster / server?
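
One quick way to check this on the new server is to try opening each input file from the account that runs the PROOF-Lite session. The sketch below is plain C++ rather than ROOT code, and `isReadable` is a hypothetical helper name. A positive result only shows that the path exists and that at least one byte can be read; it does not prove the file is a valid ROOT file.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of ROOT): returns true if the file at `path`
// can be opened by the current user and at least one byte can be read.
// Missing files, unreadable files, directories and empty files return false.
bool isReadable(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in.is_open()) {
        return false;
    }
    char firstByte = 0;
    in.get(firstByte);   // fails on directories and on empty files
    return !in.fail();
}
```

Calling it on the exact path printed in the error message, from the same working directory as the macro, also catches relative-path components that resolve differently on the new machine.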

After a crash, can you try to get the log files with:

TProofLog *pl = TProof::Mgr("lite://")->GetSessionLogs();
pl->Save("*", "filewithlogs.txt");

and post filewithlogs.txt ?

G Ganis

Hello,

Yes, the file /home/usr201/mnt/acorsi/myspace/s018/rootfiles/physics_v8.24_1/phys0200.root is accessible. I attach the log messages.

Cheers,
Anna
filewithlogs.txt (30.2 KB)

Dear Anna,

Thanks for the logs.
So, it seems there is something going on when reading the files.

Some info that could shed some light:

 1. You wrote that it was working with PROOF on another machine: could you give details about the operating systems and versions of the old server (where it was working) and new server (where it crashes);

 2. I forgot to mention this before: can you rerun with the selector compiled in debug mode, i.e. using 
ch.Process("proof.C++g");
    in makeproofchain.C, and save the logs again? This may give a bit more information.

It would definitely help if you could make a few data files available so that I can run the code.

G Ganis

Hello,

New server: Linux iclust 2.6.32-642.4.2.el6.x86_64 #1 SMP Tue Aug 23 11:15:56 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Old server: Linux ribfana02.riken.jp 2.6.32-642.6.2.el6.x86_64 #1 SMP Tue Oct 25 15:06:33 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux

In both cases I run ROOT 5.34/25.

I attach the log and have uploaded a data file here: dropbox.com/s/yb8313733xdn1 … .root?dl=0
Thanks for your help!
Anna
filewithlogs.txt (29 KB)

Dear Anna,

Thanks for the files.
I have tried to reproduce your problem on my Ubuntu 16.04 machine, without success.
I’ll try on an SLC6 machine as soon as possible.
Attached you’ll find a version of proof.{C,h} that suppresses some warnings you might have seen.

It also shows how to run under valgrind (uncomment the corresponding line in makeproofchain.C).
If you could run it in your setup, it may give some hints about the origin of the problem. For this you need
valgrind installed on the machine. The output files are in the sandbox at $HOME/.proof/<your_working_dir>/last-lite-session/worker-0.0.valgrind.log, etc.
(<your_working_dir> is your current working dir with ‘/’ replaced by ‘_’).
Note that running with the ‘valgrind’ option automatically sets the number of workers to 2 and may take a very long time.

G Ganis
macros_v2.tgz (10.2 KB)

Hello,

Apparently this modification to proof.C solved the problem:

using namespace std;
proof2::proof2(TTree * /*tree*/): fChain(0)
{
   // The Begin() function is called at the start of the query.
   // When running with PROOF Begin() is only called on the client.
   // The tree argument is deprecated (on PROOF 0 is passed).

   myhisto=0;   // initialise the histogram pointer in the constructor

}
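
A plausible reading of why this helps: the stack trace shows the crash inside proof::Terminate(). If the histogram pointer is only assigned in some other method and the constructor leaves it unset, the pointer holds an indeterminate value whenever that method has not run. Dereferencing it is then undefined behaviour, which can appear to work on one machine and crash on another. The sketch below shows the pattern in plain C++ without ROOT; `Histo`, `SelectorSketch` and `fetchOutput` are hypothetical stand-ins, not the classes or methods from proof.C.

```cpp
#include <string>

// Hypothetical stand-in for a histogram class.
struct Histo {
    std::string name;
    int entries;
};

// Hypothetical selector-like class; not the actual code from proof.C.
class SelectorSketch {
public:
    // Initialise the pointer member so it never holds an indeterminate value.
    SelectorSketch() : myhisto(0) {}

    // Stand-in for the step that would retrieve the output histogram.
    // `output` may be null if nothing was produced.
    void fetchOutput(Histo* output) { myhisto = output; }

    // Stand-in for Terminate(): check the pointer before using it.
    // Returns the number of entries, or -1 if no histogram is available.
    int Terminate() const {
        if (!myhisto) {
            return -1;
        }
        return myhisto->entries;
    }

private:
    Histo* myhisto;
};
```

With the pointer zeroed in the constructor, a missing histogram becomes a condition the code can test for instead of a segmentation violation.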

Thanks!
Anna

Dear Anna,

Good to know, thanks for the feedback.

G Ganis