PROOF-Lite node crash

Hi,
I am looking for hints about what might be wrong here. I have selector code that reads a set of trees
without problems when not using PROOF, so I conclude that the trees and the code are OK. The crash message comes from some basic PROOF-Lite code.
I realize the issue cannot be reproduced from the information available here, but some hints on what to look for would be useful.
Thanks in advance.

The interactive session points to a failure when getting events from a file (on the first event in a specific file).
The full tree has been set up as a chain of ~16 trees distributed over 8 nodes.

It is conceivable that the problem is related to the generation of the trees, since reading older sets created with ROOT 5.34.22 (or thereabouts) works fine. Then again, inspecting the trees in a browser and reading them standalone reveals no issue.

– errors on host

The version being used is 5.34.2
Validating files: OK (16 files)
0.3: caught exception triggered by signal '1' while processing dset:'TDSet:T1', file:'/data/data/pp/200/dst015205v3p5.root' - check logs for possible stacktrace - last event: 0
Info in TProofLite::MarkBad:
+++ Message from master at rcas0006.rcf.bnl.gov : marking rcas0006.rcf.bnl.gov:-1 (0.3) as bad
+++ Reason: undefined message in TProof::CollectInputFrom(…)

+++ Message from master at rcas0006.rcf.bnl.gov : marking rcas0006.rcf.bnl.gov:-1 (0.3) as bad
+++ Reason: undefined message in TProof::CollectInputFrom(…)

+++ Most likely your code crashed
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("rcas0006.rcf.bnl.gov")->GetSessionLogs()->Display("*")

– inspection of log

.19:12:32 15895 Wrk-0.3 | *** Break ***: segmentation violation

There was a crash.
This is the entire stack trace of all threads:

#0 0x00007f4e9fe605de in waitpid () from /lib64/libc.so.6
#1 0x00007f4e9fdf2619 in do_system () from /lib64/libc.so.6
#2 0x00007f4ea0ae70a8 in TUnixSystem::StackTrace() () from /opt/brahms/pro/lib/libCore.so.5.34
#3 0x00007f4ea0ae6533 in TUnixSystem::DispatchSignals(ESignals) () from /opt/brahms/pro/lib/libCore.so.5.34
#4 <signal handler called>
#5 0x00007f4e9c340665 in TTreeCache::FillBuffer() () from /opt/brahms/pro/lib/libTree.so
#6 0x00007f4e9c33ff0a in TTreeCache::ReadBufferNormal(char*, long long, int) () from /opt/brahms/pro/lib/libTree.so
#7 0x00007f4e9c2f79ff in TBasket::ReadBasketBuffers(long long, int, TFile*) () from /opt/brahms/pro/lib/libTree.so
#8 0x00007f4e9c2fe90c in TBranch::GetBasket(int) () from /opt/brahms/pro/lib/libTree.so
#9 0x00007f4e9c2fef8e in TBranch::GetEntry(long long, int) () from /opt/brahms/pro/lib/libTree.so
#10 0x00007f4e9c309830 in TBranchElement::GetEntry(long long, int) () from /opt/brahms/pro/lib/libTree.so
#11 0x00007f4e91c3e43b in fsSpecSelector::Process(long long) () from /direct/brahms+u/videbaek/.proof/direct-brahms+u-videbaek-brahms_app-fv_app-analysis-pp05/session-rcas0006.rcf.bnl.gov-1416269486-15873/worker-0.3/./fsSpecSelector_C.so
#12 0x00007f4e922a23f0 in TProofPlayer::Process(TDSet*, char const*, char const*, long long, long long) () from /opt/brahms/pro/lib/libProofPlayer.so
#13 0x00007f4e9bf30099 in TProofServ::HandleProcess(TMessage*, TString*) () from /opt/brahms/pro/lib/libProof.so
#14 0x00007f4e9bf38d83 in TProofServ::HandleSocketInput(TMessage*, bool) () from /opt/brahms/pro/lib/libProof.so
#15 0x00007f4e9bf29408 in TProofServ::HandleSocketInput() () from /opt/brahms/pro/lib/libProof.so
#16 0x00007f4e9bf45851 in TProofServLiteInputHandler::Notify() () from /opt/brahms/pro/lib/libProof.so
#17 0x00007f4ea0ae478e in TUnixSystem::CheckDescriptors() () from /opt/brahms/pro/lib/libCore.so.5.34
#18 0x00007f4ea0ae4d53 in TUnixSystem::DispatchOneEvent(bool) () from /opt/brahms/pro/lib/libCore.so.5.34
#19 0x00007f4ea0a68526 in TSystem::InnerLoop() () from /opt/brahms/pro/lib/libCore.so.5.34
#20 0x00007f4ea0a69fab in TSystem::Run() () from /opt/brahms/pro/lib/libCore.so.5.34
#21 0x00007f4ea0a0d1df in TApplication::Run(bool) () from /opt/brahms/pro/lib/libCore.so.5.34
#22 0x0000000000401c71 in main ()

The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.

#5 0x00007f4e9c340665 in TTreeCache::FillBuffer() () from /opt/brahms/pro/lib/libTree.so
#6 0x00007f4e9c33ff0a in TTreeCache::ReadBufferNormal(char*, long long, int) () from /opt/brahms/pro/lib/libTree.so
#7 0x00007f4e9c2f79ff in TBasket::ReadBasketBuffers(long long, int, TFile*) () from /opt/brahms/pro/lib/libTree.so
#8 0x00007f4e9c2fe90c in TBranch::GetBasket(int) () from /opt/brahms/pro/lib/libTree.so
#9 0x00007f4e9c2fef8e in TBranch::GetEntry(long long, int) () from /opt/brahms/pro/lib/libTree.so
#10 0x00007f4e9c309830 in TBranchElement::GetEntry(long long, int) () from /opt/brahms/pro/lib/libTree.so
#11 0x00007f4e91c3e43b in fsSpecSelector::Process(long long) () from /direct/brahms+u/videbaek/.proof/direct-brahms+u-videbaek-brahms_app-fv_app-analysis-pp05/session-rcas0006.rcf.bnl.gov-1416269486-15873/worker-0.3/./fsSpecSelector_C.so
#12 0x00007f4e922a23f0 in TProofPlayer::Process(TDSet*, char const*, char const*, long long, long long) () from /opt/brahms/pro/lib/libProofPlayer.so
#13 0x00007f4e9bf30099 in TProofServ::HandleProcess(TMessage*, TString*) () from /opt/brahms/pro/lib/libProof.so
#14 0x00007f4e9bf38d83 in TProofServ::HandleSocketInput(TMessage*, bool) () from /opt/brahms/pro/lib/libProof.so
#15 0x00007f4e9bf29408 in TProofServ::HandleSocketInput() () from /opt/brahms/pro/lib/libProof.so
#16 0x00007f4e9bf45851 in TProofServLiteInputHandler::Notify() () from /opt/brahms/pro/lib/libProof.so
#17 0x00007f4ea0ae478e in TUnixSystem::CheckDescriptors() () from /opt/brahms/pro/lib/libCore.so.5.34
#18 0x00007f4ea0ae4d53 in TUnixSystem::DispatchOneEvent(bool) () from /opt/brahms/pro/lib/libCore.so.5.34
#19 0x00007f4ea0a68526 in TSystem::InnerLoop() () from /opt/brahms/pro/lib/libCore.so.5.34
#20 0x00007f4ea0a69fab in TSystem::Run() () from /opt/brahms/pro/lib/libCore.so.5.34
#21 0x00007f4ea0a0d1df in TApplication::Run(bool) () from /opt/brahms/pro/lib/libCore.so.5.34
#22 0x0000000000401c71 in main ()

19:12:34 15895 Wrk-0.3 | Error in TProofServLite::HandleException: caugth exception triggered by signal '1' while processing dset:'TDSet:T1', file:'/data/data/pp/200/dst015205v3p5.root' - check logs for possible stacktrace - last event: 0

// --------- End of element log -------------------

Hello,

The problem happens in filling the tree cache.
Is the version really 5.34/02?

Anyhow, can you try disabling the tree cache by calling

gProof->SetParameter("PROOF_UseTreeCache", 0)

before the run.

G. Ganis

Hi,
Thanks for your reply.

a) The ROOT version I used was 5.34/21 (typo).
b) I tried your suggestion, and it resulted in the same errors.
c) I am concerned that the issue has to do with the generation of the tree files (~16), since reading some older versions of the tree files does not result in this error. Those were generated with a version of ROOT from 2010. When looking at the trees with the viewer, no issues can be identified, whereas some, but not all, of the newly generated ones result in errors.

My question to you is: what kind of diagnostics can I add that would help?

Best regards

Hello,

The fact that you get the same error indicates that, for some reason, disabling the tree cache did not work.

I am still convinced that the problem is somehow with the tree cache. The fact that everything works fine in the viewer is not in contradiction, because the tree cache is not enabled by default.

The problem would be easier to debug if it could be reproduced outside PROOF.

Could you give this a try:

1. Open one file and get the relevant tree
2. Enable the cache on the tree; for example
      tree->SetCacheSize(20*1024*1024)
3. Run the selector on the tree:
      tree->Process("fsSpecSelector.C")

and post the result?

G. Ganis
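
For reference, TTree::SetCacheSize takes the cache size in bytes, so a 20 MB cache corresponds to 20 * 1024 * 1024 bytes. The arithmetic can be sketched in plain C++ as below. CacheBytes is a hypothetical helper introduced only for illustration and is not part of ROOT; the usage line in the final comment assumes a variable named tree that points to the tree obtained in step 1.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ROOT): converts a cache size given in
// mebibytes to the byte count expected by TTree::SetCacheSize.
constexpr std::int64_t CacheBytes(std::int64_t mebibytes) {
    return mebibytes * 1024 * 1024;
}

// Compile-time check of the 20 MB case: 20 * 1024 * 1024 = 20971520 bytes.
static_assert(CacheBytes(20) == 20971520, "20 MiB should be 20971520 bytes");

// Assumed usage in a ROOT session, with `tree` pointing to the TTree:
//   tree->SetCacheSize(CacheBytes(20));
```

Spelling the size out through a helper like this makes a mistyped factor easy to spot, since the intended size in megabytes appears directly in the call.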