Enable bonjour on proof configuration

Hello,

I was trying to activate the Bonjour functionality on our farm, but I haven't succeeded so far, so maybe I can get some help.

In the configuration file I replaced the xpd.worker and xpd.master directives with the following block:

if pftest02.pic.es
xpd.bonjour discover
else
xpd.bonjour register cores=4
fi

After restarting, when trying to open a PROOF session I get

$ root -l 
root [0] TProof::Open("pftest02.pic.es")
Starting master: opening connection ...
Starting master: OK                                                 
+++ Query cannot be processed now: enqueued
Error in <TProof::StartSlaves>: no resources available or problems setting up workers (check logs)
Error in <TProof::Open>: new session could not be created
(class TProof*)0x0

From the log of the master I get a similar message, "no workers currently available", while in the logs of the workers there is not a single trace of the connection request.

110110 15:34:37 19663 xpd-I: NetMgr::Dump: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + Active workers status
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + Size: 1
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + wrk: pftest02.pic.es:1093 type:M active sessions:0
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: NetMgr::GetActiveWorkers: returning list with 1 entries
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + Active workers status
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + Size: 1
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + wrk: pftest02.pic.es:1093 type:M active sessions:0
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: Sched::GetNumWorkers:  : # act: 0
110110 15:34:37 19663 xpd-I: Sched::GetNumWorkers: 0 : 0
110110 15:34:37 19663 xpd-I: DumpQueues:  ++++++++++++++++++++ DumpQueues ++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: DumpQueues:  +++ Called from: Enqueue
110110 15:34:37 19663 xpd-I: DumpQueues:  +++ # of waiting sessions: 1
110110 15:34:37 19663 xpd-I: DumpQueues:  +++ #1 client:cosuna # of queries: 1
110110 15:34:37 19663 xpd-I: DumpQueues:  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: DumpQueries:  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: DumpQueries:  +++ client: cosuna, session: 19688, # of queries: 1
110110 15:34:37 19663 xpd-I: DumpQueries:  +++ #1 tag:static: dset:  size:0
110110 15:34:37 19663 xpd-I: DumpQueries:  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: Sched::GetWorkers: no workers currently available: session enqueued
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + Active workers status
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + Size: 1
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +
110110 15:34:37 19663 xpd-I: NetMgr::Dump: + wrk: pftest02.pic.es:1093 type:M active sessions:0
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +
110110 15:34:37 19663 xpd-I: NetMgr::Dump: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I checked with the avahi-discover tool that the service of the workers is properly registered and seen from the master as "_proof._tcp".

Is there something I am doing wrong with the configuration?
I am using ROOT 5.28/00.

thanks for the help, carlos

Hi Carlos,

Do you get any explicit reference to ‘bonjour’ in the xproofd log on the master?
Something like

...
110113 11:01:52 001 xpd-I: NetMgr::DoDirectiveBonjour: processing Bonjour directive
110113 11:01:52 001 xpd-I: NetMgr::DoDirectiveBonjour: custom service type is _xproofd._tcp
110113 11:01:52 001 xpd-I: NetMgr::DoDirectiveBonjour: custom Bonjour name is xpd-master
110113 11:01:52 001 xpd-I: NetMgr::DoDirectiveBonjour: custom Bonjour name is 'xpd-master'
...
110113 11:01:52 001 xpd-I: NetMgr::Config: configuring
110113 11:01:52 001 xpd-I: NetMgr::Config: PROOF config file: none
110113 11:01:52 001 xpd-I: NetMgr::FindUniqueNodes: # workers: 1
110113 11:01:52 001 xpd-I: NetMgr::FindUniqueNodes: found 0 unique nodes
110113 11:01:52 001 xpd-I: NetMgr::Config: 0 worker nodes defined at start-up
...
------ XrdOucBonjour: discovered a new node: cernvm26.cern.ch
110113 11:01:54 24507 xpd-I: NetMgr::ProcessBonjourUpdate: Updating the network topology
INFO: Bonjour NODE = cernvm26.local:1093 (0xb915c0)
INFO: Bonjour RECORD = cernvm26.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ost 30 secs
110113 11:01:54 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm26.local, port: 1093
110113 11:01:54 24507 xpd-I: NetMgr::FindUniqueNodes: # workers: 9
110113 11:01:54 24507 xpd-I: NetMgr::FindUniqueNodes: found 1 unique nodes
------ XrdOucBonjour: discovered a new node: cernvm28.cern.ch
110113 11:01:54 24507 xpd-I: NetMgr::ProcessBonjourUpdate: Updating the network topology
INFO: Bonjour NODE = cernvm26.local:1093 (0xb915c0)
INFO: Bonjour RECORD = cernvm26.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ost 30 secs
110113 11:01:54 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm26.local, port: 1093
110113 11:01:54 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm26.local' already in the list
INFO: Bonjour NODE = cernvm28.local:1093 (0xc07cf0)
INFO: Bonjour RECORD = cernvm28.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ktop.DBus
110113 11:01:54 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm28.local, port: 1093
110113 11:01:54 24507 xpd-I: NetMgr::FindUniqueNodes: # workers: 17
110113 11:01:54 24507 xpd-I: NetMgr::FindUniqueNodes: found 2 unique nodes
------ XrdOucBonjour: discovered a new node: cernvm30.cern.ch
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: Updating the network topology
INFO: Bonjour NODE = cernvm26.local:1093 (0xb915c0)
INFO: Bonjour RECORD = cernvm26.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ost 30 secs
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm26.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm26.local' already in the list
INFO: Bonjour NODE = cernvm28.local:1093 (0xc07cf0)
INFO: Bonjour RECORD = cernvm28.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ktop.DBus
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm28.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm28.local' already in the list
INFO: Bonjour NODE = cernvm30.local:1093 (0xc08c30)
INFO: Bonjour RECORD = cernvm30.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm30.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::FindUniqueNodes: # workers: 25
110113 11:01:55 24507 xpd-I: NetMgr::FindUniqueNodes: found 3 unique nodes
------ XrdOucBonjour: discovered a new node: cernvm32.cern.ch
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: Updating the network topology
INFO: Bonjour NODE = cernvm26.local:1093 (0xb915c0)
INFO: Bonjour RECORD = cernvm26.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ost 30 secs
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm26.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm26.local' already in the list
INFO: Bonjour NODE = cernvm28.local:1093 (0xc07cf0)
INFO: Bonjour RECORD = cernvm28.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ktop.DBus
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm28.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm28.local' already in the list
INFO: Bonjour NODE = cernvm30.local:1093 (0xc08c30)
INFO: Bonjour RECORD = cernvm30.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm30.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm30.local' already in the list
INFO: Bonjour NODE = cernvm32.local:1093 (0xc09270)
INFO: Bonjour RECORD = cernvm32.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm32.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::FindUniqueNodes: # workers: 33
110113 11:01:55 24507 xpd-I: NetMgr::FindUniqueNodes: found 4 unique nodes
------ XrdOucBonjour: discovered a new node: cernvm34.cern.ch
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: Updating the network topology
INFO: Bonjour NODE = cernvm26.local:1093 (0xb915c0)
INFO: Bonjour RECORD = cernvm26.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ost 30 secs
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm26.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm26.local' already in the list
INFO: Bonjour NODE = cernvm28.local:1093 (0xc07cf0)
INFO: Bonjour RECORD = cernvm28.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8ktop.DBus
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm28.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm28.local' already in the list
INFO: Bonjour NODE = cernvm30.local:1093 (0xc08c30)
INFO: Bonjour RECORD = cernvm30.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm30.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm30.local' already in the list
INFO: Bonjour NODE = cernvm32.local:1093 (0xc09270)
INFO: Bonjour RECORD = cernvm32.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm32.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate:  worker(s) 'cernvm32.local' already in the list
INFO: Bonjour NODE = cernvm34.local:1093 (0xc098b0)
INFO: Bonjour RECORD = cernvm34.cern.ch_xproofd._tcplocal
INFO: Bonjour TXT = 
nodetype=Wcores=8
110113 11:01:55 24507 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: cernvm34.local, port: 1093
110113 11:01:55 24507 xpd-I: NetMgr::FindUniqueNodes: # workers: 41
110113 11:01:55 24507 xpd-I: NetMgr::FindUniqueNodes: found 5 unique nodes
...

Gerri

Hello Gerri

yes, I get similar references to Bonjour, but comparing with your log file, I see that the real difference is the port that Bonjour reports:

110113 12:59:31 12544 xpd-I: NetMgr::ProcessBonjourUpdate: Updating the network topology
INFO: Bonjour NODE = 192.158.96.10:1094 (0x942c040)
INFO: Bonjour RECORD = pftest04.pic.es_proof._tcplocal
INFO: Bonjour TXT = 
nodetype=W^Gcores=4'org.freedesktop.Avahi.cookie=3601063425
110113 12:59:31 12544 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: 192.158.96.10, port: 1094

while in your case Bonjour detects port 1093. So I tried to explicitly specify port 1093 in the bonjour directive, like:

xpd.bonjour register name=$myHost:1093 cores=4

but it didn't work: the service is properly registered on Avahi with the new port, but again ProcessBonjourUpdate picks up 1094:

INFO: Bonjour NODE = 192.158.96.10:1094 (0x855dd30)
INFO: Bonjour RECORD = pftest04.pic.es:1093_proof._tcplocal
INFO: Bonjour TXT = 
nodetype=W^Gcores=4'org.freedesktop.Avahi.cookie=3601063425
110113 12:50:11 12274 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: 192.158.96.10, port: 1094
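As an aside, the advertised port can be pulled out of a ProcessBonjourUpdate log line with a small shell sketch; the log line below is copied from the excerpt above, and the sed pattern is just illustrative:

```shell
# Sketch: extract the advertised port from a ProcessBonjourUpdate log line
line='110113 12:50:11 12274 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: 192.158.96.10, port: 1094'
port=$(echo "$line" | sed -n 's/.*port: \([0-9]*\).*/\1/p')
echo "$port"   # prints 1094
```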

Do you have any idea?

thanks, Carlos

Hi Carlos,

Do you use ‘xproofd’ or ‘xrootd + libXrdProofd.so’ ?
It may be a problem of port detection due to the default protocol.
I will ask the author to follow up on this.

Cheers, Gerri

Hi Carlos,

To set a custom port for the registration (it is not needed for the discovery, where the port comes from the Bonjour resolver, in your case Avahi), the system should take the port that you have set in the config file with the xpd.port directive (you have to use this in every process that you want on a port different from the default 1093).

You have mor info about the xpd.bonjour directive here: root.cern.ch/drupal/content/conf … de#bonjour

Just to make sure: the Bonjour names and service types are not DNS names or host names; they are internal identifiers used for the mDNS discovery, and they get translated to FQDNs and ports afterwards. So, in summary, you can put whatever you want there, and the system will use it for discovery; it then gets translated to DNS name + port.

Would you be so kind as to test this configuration? Then I will check the code to see if there is any bug related to port detection.

Cheers,

thanks for the clarification,

In fact, I am not using anything different from the defaults. I was testing the Bonjour feature with a minimal configuration:

### Load the XrdProofdProtocol to serve PROOF sessions
if exec xrootd
xrd.protocol xproofd:1093 libXrdProofd.so
fi

xpd.port 1093

if pftest02.pic.es
xpd.bonjour discover
else
xpd.bonjour register cores=4
fi

Therefore, xproofd is already on 1093, but I still tried to add the directive explicitly:
xpd.port 1093

and I did not observe any change in the log. The discovery part still sees

110113 14:46:20 16889 xpd-I: NetMgr::ProcessBonjourUpdate: parsing info for node: 192.158.96.10, port: 1094

carlos

OK, thank you. I will take a look at the code to see what the problem is and find a solution as soon as possible.

Cheers,

Good, thanks Medrano for looking into that.

cheers, carlos

I more or less know what is going on. May I ask you to post a copy of the output of the avahi-browse command with the options -v -t -a -r (just to see the resolution of the services)?

Hello

sure, here is what I get from the avahi-browse command on the master (pftest02):

Server version: avahi 0.6.16; Host name: pftest02.local
E Ifce Prot Name                                          Type                 Domain

+ eth0 IPv4 pftest04.pic.es                              _proof._tcp          local
...
= eth0 IPv4 pftest04.pic.es                              _proof._tcp          local
   hostname = [pftest04.local]
   address = [192.158.96.10]
   port = [1094]
   txt = ["org.freedesktop.Avahi.cookie=3601063425" "cores=4" "nodetype=W"]
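As an aside, the advertised core count can be extracted from such a TXT record with a small shell sketch; the TXT string below is copied from the output above:

```shell
# Sketch: pull the advertised core count out of an Avahi TXT record
txt='"org.freedesktop.Avahi.cookie=3601063425" "cores=4" "nodetype=W"'
cores=$(echo "$txt" | tr ' ' '\n' | tr -d '"' | sed -n 's/^cores=//p')
echo "$cores"   # prints 4
```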

Hope it can help,

cheers, carlos

This is what I was suspecting looking at the code: the slaves, for some reason, are not registering with the proper port (which should be 1093); instead, it defaults to 1094 whatever port you choose. I will issue a patch as soon as possible.

Thanks, Medrano, for looking into that.
Please let me know when the patch is available and I will test it.

cheers, carlos

I have been trying to reproduce the issue in our testing cluster and, unfortunately, I couldn't. I used the same config file as you, with the same ports and process structure, and in all cases the master discovered the slave correctly.

What version of ROOT are you using? And what Avahi version? Are you running Avahi as-is or with the mDNS compatibility layer?

Hi Medrano

weird… but I think I know where port 1094 is being taken from. How do you start the daemon?

In my case in the script that starts xrootd I have the daemon line

daemon $XROOTD -b -l $XRDLOG -R $XRDUSER -c $XRDCF $XRDDEBUG

This typically allocates xrootd on the default port 1094, and with the following lines in the config file

if exec xrootd
xrd.protocol xproofd:1093 libXrdProofd.so
fi

it allocates xproofd on 1093.

If I add "-p 88888" to the daemon line, then Avahi registers and discovers this port 88888.
So maybe the difference is that you are starting the xproofd daemon directly on 1093, as Gerri was suggesting?

I hope this can help…

I am using ROOT 5.28.00 and Avahi 0.6.16-9, and I am not sure about the mDNS compatibility layer (let me know how to find out, if that is important), though the compat package is installed.

cheers, carlos

[quote=“cosuna”]Hi Medrano

weird… but I think I know where the port 1094 is being taken from. How do you start the daemon?

[…]

cheers, carlos[/quote]

Yes, I'm starting the daemon by putting the port settings in the config file, not on the command line. Let me check whether something is not getting properly overridden and is picking up the wrong port.

Cheers,

Hi,

Thanks to Ramón, this should be fixed in the trunk and in 5-28-00-patches.
Note that ‘xpd.port’ must appear in the config file before ‘xpd.bonjour’ (as in the sample you posted).
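For example, a minimal worker-side fragment respecting that ordering could look like this (the port and core count are just illustrative):

xpd.port 1093
xpd.bonjour register cores=4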

Please let us know if you try.

Gerri

great! thanks Ramón & Gerri

any idea when the first patch release for 5.28 will be out?

cheers, carlos

FYI, it has now been released.

Cheers,
Philippe.

  • ‘not’ was corrected to ‘now’

You mean, it is NOW released, right?
The patch is in 5.28/00a .

Gerri