Random Crashes with PROOF?

Hi,

After finally getting my TSelector working properly with PROOF I’m running into some strange errors. My code seems to work great most of the time, but occasionally (and always when running on large chains) I get the following error and the logs only show a seg fault with no explanation.

Info in <TProofLite::MarkBad>: +++ Message from master at Darwin : marking 0.0-Darwin-1252752655-22839:-1 (0.0) as bad +++ Reason: undefined message in TProof::CollectInputFrom(...)
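A note on reading the message above: the tag looks like a worker ordinal, a host name, a Unix timestamp and a process ID joined by dashes, followed by a port. That reading is an assumption based on the shape of the string, not something confirmed in this thread. A minimal sketch in plain C++ (no ROOT needed) that splits such a tag, which may help to match the message to a particular worker log; `WorkerTag` and `parseWorkerTag` are hypothetical names, not ROOT API:

```cpp
#include <cstddef>
#include <string>

// Assumed fields of a tag such as "0.0-Darwin-1252752655-22839:-1".
// The meaning of each field is inferred from the string, not from ROOT docs.
struct WorkerTag {
    std::string ordinal;   // e.g. "0.0"
    std::string host;      // e.g. "Darwin" (may itself contain '-')
    long long timestamp;   // assumed Unix time
    long long pid;         // assumed process ID
};

// Convert a short, all-digit string to a number; reject anything else.
static bool parseNumber(const std::string& s, long long& out) {
    if (s.empty() || s.size() > 18) return false;   // 18 digits fit in long long
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] < '0' || s[i] > '9') return false;
    out = std::stoll(s);
    return true;
}

// Hypothetical helper: split a tag into its assumed fields.
// Returns false if the string does not have the expected shape.
bool parseWorkerTag(const std::string& raw, WorkerTag& out) {
    std::string tag = raw;

    // Drop a trailing ":<port>" suffix such as ":-1".
    const std::size_t colon = tag.rfind(':');
    if (colon != std::string::npos) tag.erase(colon);

    // The ordinal ends at the first dash; pid and timestamp are taken
    // from the right, so a host name containing dashes stays intact.
    const std::size_t first = tag.find('-');
    const std::size_t last = tag.rfind('-');
    if (first == std::string::npos || last == first) return false;
    const std::size_t mid = tag.rfind('-', last - 1);
    if (mid == std::string::npos || mid <= first) return false;

    out.ordinal = tag.substr(0, first);
    out.host = tag.substr(first + 1, mid - first - 1);
    return parseNumber(tag.substr(mid + 1, last - mid - 1), out.timestamp)
        && parseNumber(tag.substr(last + 1), out.pid);
}
```

Calling `parseWorkerTag("0.0-Darwin-1252752655-22839:-1", tag)` fills `tag` with the four pieces. Whether the last number really is the PID of the crashed worker would need checking against the running processes or the log file names.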

I’m also getting seg faults all the time when exiting ROOT after running my PROOF code, but I’m not sure that’s related. Is there any way to solve this? How can I see where the error is coming from, and why is the error so random?

Thanks,
Nati.

Could you please:

  1. Specify the ROOT version that you are using;

  2. Say a bit more about your output (how big?);

  3. Post the traceback of the seg fault?

G. Ganis

Hi ganis,

I’m using ROOT 5.22 on a 64-bit Ubuntu system with gcc 4.4.
The output isn’t big at all: one histogram and one printf().
It crashes on large inputs, around 0.5 GB.

The only trace I get is:

(no debugging symbols found)...done.
(no debugging symbols found)...done.
0x00007f459193ea8e in waitpid () from /lib/libc.so.6
error detected on stdin

Dear Nati,

I cannot say much from the error messages that you have posted.

At this point, to try to help you, I need to know more about your setup and possibly have a look at your selector.

From the first report it seems that you are using PROOF-Lite, which was first released in 5.22. Could you confirm that? How many cores? If not, could you give more details about your PROOF cluster?

Then, could you post your selector and describe the input (a TChain? How many files? Read from where: local disk? remote server?)

G. Ganis

Hi,

I am working with PROOF-Lite on my dual-core laptop.

After playing around a bit, I think it may have to do with background processes on my laptop “stealing” one of the cores temporarily.

I read from a local disk, build a TChain (with 2 to 20 files) and then run chain->Process.

Right now I just run it again until it doesn’t crash any more.

Attached are my TSelector and the script that calls it:
GetEvents.C (6.08 KB)
T.h (3.65 KB)
T.C (4.36 KB)