Error during the connection to a PROOF farm

Hello,

I tried to connect to a PROOF farm, but it failed with the following error message:

===========================================
[xxxxx-22:50:37] > root


  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.22/00   17 December 2008  *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.22/00 (trunk@26997, Feb 17 2009, 17:03:00 on linuxx8664gcc)

CINT/ROOT C/C++ Interpreter version 5.16.29, Jan 08, 2008
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0] TProof* p = TProof::Open("xxxx.yyy.zz")
Starting master: opening connection …
Starting master: OK

| session: ycalas.default.5944 terminated by peer
Info in TXSlave::HandleError: 0xbb86d0:xxxx.yyyy.zz:0 got called … fProof: 0xb1dc70, fSocket: 0xbb8af0
(valid: 1)
Info in TXSlave::HandleError: 0xbb86d0: proof: 0xb1dc70
TXSlave::HandleError: 0xbb86d0: DONE …

In the logs, I noticed the following error message:

===========================================
090423 22:50:48 11877 xpd-I: ycalas.5934:27@xxxx: ClientMgr::MapClient: user ycalas logged-in (privileged);
type: ClientMaster
090423 22:50:49 11877 xpd-I: ycalas.5944:28@localhost: ClientMgr::MapClient: user ycalas logged-in (privileged);
type: Internal
090423 22:50:49 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
090423 22:50:49 11877 XrdLink: Unable to send to ycalas.5944:28@localhost; bad address
090423 22:50:49 11877 xpd-I: ycalas.5944:28@localhost: Protocol::recycle: user ycalas disconnected; type: Internal
090423 22:50:49 11877 xpd-E: 0100 ycalas.5944:28@localhost: xrd->0: Response::Send:9: sending 1 data bytes;
status=0: problems sending
9 bytes (writev)
090423 22:50:49 11877 xpd-E: Protocol::Process2: link is undefined!
090423 22:50:55 11877 xpd-I: ycalas.5934:27@xxxx: Protocol::recycle: user ycalas disconnected; type:
ClientMaster

The easy way to solve this problem was to restart the xrootd daemon on the master node.

Any idea what could be the problem?

Thanks,

Yvan

Dear Yvan,

Sorry for the late reply.

Do I understand correctly that it stops working at a certain point with the errors you reported, and that it works again after restarting xrootd?

Or does it never work?

Gerri Ganis

Yes, it stops working at a certain point with the errors I reported. It works again as soon as xrootd is restarted.

Thanks for your help,

Yvan

Dear Yvan,

The error on the server side indicates problems with reading / parsing the PROOF workers config file (proof.conf).
This indeed may prevent a session from being started and it produces errors like those you got.

But this is the first time I have seen it start happening only after the daemon has been running for a while.
Does it happen after a fixed lapse of time or after a well-defined event? Is 'proof.conf' on a file system that requires credentials, like AFS?

Gerri

Dear Gerri,

I checked the xrootd log files (starting from 18th March), and I got:

=======================================

grep 'unable to read the configuration file' xrootd.log.20090*
xrootd.log.20090323:090323 10:37:53 23005 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090323:090323 10:42:03 23005 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090323:090323 10:45:37 23005 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090323:090323 10:46:58 23005 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090323:090323 10:52:21 23005 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090323:090323 10:58:31 23005 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 17:04:48 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 17:05:20 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 17:05:55 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 17:06:34 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 17:08:29 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 22:50:49 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090423:090423 22:51:40 11877 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
xrootd.log.20090429:090429 14:35:19 6041 xpd-E: NetMgr::GetActiveWorkers: unable to read the configuration file
=======================================

The configuration files are indeed on AFS…
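
One way to check whether the failures begin a fixed time after the daemon starts is to group the matching log lines and look at the first one in each group. The following is a minimal Python sketch, assuming the log layout shown in the grep output above (`yymmdd hh:mm:ss <number> xpd-E: ...`, optionally prefixed by `<file>:`), and assuming that the number after the timestamp stays constant for one daemon run, as it appears to in that output. The function name `summarize_failures` is hypothetical.

```python
from datetime import datetime

# Error text exactly as it appears in the xrootd log lines quoted in this thread.
MARKER = "NetMgr::GetActiveWorkers: unable to read the configuration file"


def summarize_failures(lines):
    """Return {run_id: (first_failure, last_failure, count)} for matching lines.

    run_id is the numeric field after the timestamp; treating it as an
    identifier of one daemon run is an assumption based on the output above.
    """
    summary = {}
    for line in lines:
        if MARKER not in line:
            continue
        fields = line.split()
        # grep over several files prefixes each line with "<file>:"; drop it.
        date = fields[0].rsplit(":", 1)[-1]
        stamp = datetime.strptime(date + " " + fields[1], "%y%m%d %H:%M:%S")
        run_id = fields[2]
        first, last, count = summary.get(run_id, (stamp, stamp, 0))
        summary[run_id] = (min(first, stamp), max(last, stamp), count + 1)
    return summary
```

Passing the lines of the grep output to `summarize_failures` gives, for each group, the time of the first failure, which can then be compared by hand with the time the daemon was started or the time the AFS credentials were obtained.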

Cheers,

Yvan

Dear Yvan,

So the file becomes unreadable at a given moment.
By default, PROOF checks the file for modifications each time a new session starts.
The idea behind this was to allow modifications of the file without forcing a restart of the daemon.

The problem that you are observing points to the more general problem of credentials renewal which we have not really addressed yet.

For the specific case of <proof.conf>, I should mention that we are going to deprecate the use of the <proof.conf> file starting from the next ROOT production version.
We encourage the use of the 'xpd.worker' directive (see root.cern.ch/drupal/content/xpdworker-directive; note that some functionality, e.g. 'repeat', was introduced only recently, but the rest should also be available in 5.22.00).
This does not suffer from the same problem, because <xrootd.cf> is read only once at startup.
So I suggest that you try this new way of defining the cluster and let us know your feedback.
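
As a rough illustration of the migration, the sketch below turns the node lines of an existing proof.conf into 'xpd.worker' lines that could be pasted into the xrootd configuration file. It rests on an assumption that should be checked against the directive's documentation page: that the text following `xpd.worker` uses the same `master <host>` / `worker <host> [options]` form as a proof.conf line. The function name and the host names in the usage note are hypothetical.

```python
def proof_conf_to_xpd_worker(text):
    """Convert proof.conf node lines into 'xpd.worker' directive lines.

    ASSUMPTION: 'xpd.worker' accepts the same tokens as a proof.conf line
    ("master <host>", "worker <host> [options]"); verify before use.
    """
    directives = []
    for raw in text.splitlines():
        # Strip '#' comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        tokens = line.split()
        # Only plain master/worker lines are converted; anything else is
        # skipped so it can be reviewed by hand.
        if tokens[0] not in ("master", "worker"):
            continue
        directives.append("xpd.worker " + " ".join(tokens))
    return directives
```

For a proof.conf containing `master head.example.org` and `worker node1.example.org`, the function returns the two lines `xpd.worker master head.example.org` and `xpd.worker worker node1.example.org`. Because the daemon reads its own configuration file only once at startup, lines defined this way are not re-read from AFS when a session starts.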

Gerri

Dear Gerri,

Thank you for your help. I made the modifications, as suggested.

Cheers,

Yvan