SysError when starting PROOF

Hi Rooters,

I have been able to use PROOF successfully for a while, but then I needed to change my xpd.workdir. After I did, it took a little finagling to get everything working again. Now I get an error when starting PROOF, although it seems to work all right. Here's the error:
root [0] proof = TProof::Open("localhost")
Starting master: opening connection ...
Starting master: OK                                                 
Opening connections to workers: OK (8 workers)                 
Setting up worker servers: OK (8 workers)                 
19:53:46 15967 Mst-0 | SysError in <TNamed::Lock>: error locking /crn/scratch/msnowball/tmp/proof-query-lock-gainesville-1268528025-15967-%crn%scratch%msnowball%tmp%xpd%msnowball%queries%session-gainesville-1268528025-15967 (Function not implemented)
PROOF set to parallel mode (8 workers)
(class TProof*)0x1ff633a0

/crn/scratch/msnowball/tmp is the old working dir. In the new working dir, everything seems to be fine. I tried hard and soft resets, but nothing eliminates this error. Any help getting rid of it would be very much appreciated. Attached is my xpd.log. It's as if it were still looking in the old directory for the .sessions file.

Thanks for any help!

Matt
xpdlog.txt (9.44 KB)

Dear Matt,

Can you post the configuration file?
What is the file system on /crn/scratch/msnowball ?
It looks like it does not support file locking.

G. Ganis

Hi Ganis,

Attached is my config file. The filesystem on /crn/scratch is Lustre. It worked OK before on this same filesystem, just in a different directory within my scratch area. Thanks.

Best,
Matt
xpd.cf.txt (298 Bytes)

[quote=“msnowball”]Filesystem type is lustre on /crn/scratch.  It worked OK before on this same filesystem just a different dir in my scratch dir.[/quote]
I think that on Lustre, file locking is not enabled by default on every mount; it is actually a mount option. It could be that on your client, Lustre is mounted without locking enabled.
On the other hand, it could be that an old PROOF (proofserv.exe) process is still hanging, which prevents the new server from locking the file.

BTW, Gerri can correct me, but I believe the call that fails is lockf, am I right?
If so, you could write a very small test program that tries to lock a file. If that doesn't work, the next step would be to call the admins ;-)

Hi,

Yes, the failing call is lockf. The files used for locking are created under the declared TMP, which should support locking.
You can try simply commenting out the 'xpd.tmp' directive.

Btw, as far as the working-dir settings are concerned, the log posted in the first report is consistent with the config file:

I really think the problem is due to file locking on Lustre.
Note also that the UNIX sockets are created in the admin area, which by default is under TMP, and I do not know how UNIX sockets behave on Lustre.
I suggest that you use an area local to the machines for TMP and for the admin area (all.adminpath /adminpath). They use very little space, so it should always be possible to keep them local.

Gerri

Ok, thanks for the help guys!

Matt