Cron job to restart xrootd when an error occurs

Hello all,

From time to time we hit a connection problem when trying to connect to the master:

110908 09:24:11 001 Proofx-E: Conn::Connect: failed to connect to proof://
110908 09:24:11 001 Proofx-E: XrdProofConn: XrdProofConn: severe error occurred while opening a connection to server []

It looks like a network connection problem. I am not sure why it appears, but it can be solved by restarting xrootd (/etc/init.d/xrootd restart).

When the user gets the above error, the following messages appear in the master log:

110907 09:03:22 27797 xpd-I: user.3534:6@gridui01: Protocol::recycle: user user disconnected; type: ClientMaster
110907 09:03:22 27797 xpd-E: ProofServ::SendData: client not connected: csid: 0x1fe8b6e0, cid: 0, fSid: 0
110907 09:03:22 27797 xpd-E: user.3386:33@localhost.localdomain: Protocol::SendData: INT: client ID: 0, problems sending: 41 bytes to client

Currently, I have a cron job that checks whether xrootd is running and restarts it if it is not:

/etc/init.d/xrootd status
xrootd (pid 18212) is running…

I would like the cron job to also check whether an error like the one above is showing up in the master log. Is there an existing way to do this?

Ana Rodríguez.
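
A log-based check of the kind asked about could be sketched roughly as below. This is only an illustration under assumptions: the pattern is taken from the xpd-E line quoted above, while the log path, the state file, and the names count_new_errors and check_log are hypothetical and not part of xrootd. The script remembers how many lines it has already seen, so an old error does not trigger a restart on every cron run.

```shell
#!/bin/sh
# Sketch: restart xrootd when a known error appears in new lines of the log.
# Assumptions: the pattern below, the log path and the state file location.

PATTERN='xpd-E: ProofServ::SendData: client not connected'

# Print how many lines of file $1 match PATTERN, skipping the first $2 lines.
count_new_errors() {
    tail -n "+$(( $2 + 1 ))" "$1" | grep -c -- "$PATTERN"
}

# check_log LOGFILE STATEFILE [RESTART_COMMAND]
# STATEFILE stores the number of log lines already inspected.
check_log() {
    log=$1
    state=$2
    restart_cmd=${3:-/etc/init.d/xrootd restart}

    last=0
    [ -f "$state" ] && last=$(cat "$state")
    last=${last:-0}

    total=$(( $(wc -l < "$log") ))
    # If the log shrank (e.g. it was rotated), scan it from the beginning.
    [ "$total" -lt "$last" ] && last=0

    errors=$(count_new_errors "$log" "$last")
    echo "$total" > "$state"

    if [ "$errors" -gt 0 ]; then
        $restart_cmd
    fi
}

# Only act when called with a log file and a state file as arguments.
if [ "$#" -ge 2 ]; then
    check_log "$1" "$2"
fi
```

It could then be called from cron with the real log file and a writable state file as its two arguments. Note that this restarts the daemon on any new matching line, which may be more aggressive than wanted.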

Hi Ana,

Unfortunately, we know there are still cases in which the daemon becomes unresponsive to connection attempts. I have not understood whether in your case this affected all users or only a specific one.
We are working on a modification of the connection setup which should improve the stability and speed of connections.

As for automatic restarts: if the daemon is up but unresponsive, you can use a script+binary which I recently provided to other admins for similar purposes. It is standalone, i.e. it does not depend on ROOT or xrootd. I have just committed it to our SVN repository for convenience (it may also go into the next ROOT distribution).

To try it out do the following:

$ cd somedir
$ svn co
$ cd xrdping
$ make
$ ./xrdping myhost

There is a README describing the usage, together with an example script.
This should let you quickly test whether the service is up and responding where you expect.
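
A cron wrapper around such a ping could look roughly like the sketch below. It is not the example script shipped with xrdping. It assumes that the ping command exits with status 0 when the server answers and non-zero otherwise (the README should say what the real convention is), and the function name ping_or_restart is made up for illustration.

```shell
#!/bin/sh
# Sketch: restart xrootd when the server does not answer a ping.
# Assumption: the ping command returns 0 on success and non-zero on failure.

# ping_or_restart PING_COMMAND HOST RESTART_COMMAND
ping_or_restart() {
    ping_cmd=$1
    host=$2
    restart_cmd=$3

    # Server answered: nothing to do.
    if "$ping_cmd" "$host" >/dev/null 2>&1; then
        return 0
    fi

    # No answer: leave a trace on stderr (cron mails it) and restart.
    echo "$(date '+%y%m%d %H:%M:%S') $host not responding, restarting xrootd" >&2
    $restart_cmd
}

# Only act when a host name is given on the command line.
if [ "$#" -ge 1 ]; then
    ping_or_restart ./xrdping "$1" "/etc/init.d/xrootd restart"
fi
```

Run from the xrdping directory with the master host name as argument, e.g. from a cron entry that first changes into that directory.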

Hope it helps.


Hi Gerri,

The issue I reported affects all users.

Thanks for providing these scripts. I will use them and let you know whether they cover the problem.