Running xrootd

Lara,

how do you start your xrootd? As ‘root’ or as yourself?

Right, but you need to be able to read the private keys.

Also, for what relates to PROOF, you are using an old version of ROOT

Can you move to something more recent, e.g.
/afs/cern.ch/sw/lcg/contrib/proof/root/5.17.09-PROOF.00/slc4_ia32_gcc34/root
?
Support would be easier …

Gerri

I’m running xrootd as myself…

But anyway, in my conf file I’m telling xrootd to read the server certificates, not the “user certificates”, aren’t I? I mean, the ones in /etc/grid-security, not the ones in ~/.globus

Lara

Ok,

But then it is normal that you get the error, because xrootd has no rights to read the certificate private key (the one in /etc/grid-security/hostkey.pem).
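A quick way to confirm this kind of problem is to test, from the account that xrootd runs under, whether the key file can be read at all. The sketch below is only an illustration: the function name `check_key_readable` is made up for this example, and the default path is simply the host key quoted above.

```shell
#!/bin/sh
# Sketch: report whether a private key file is readable by the current user.
# check_key_readable is a hypothetical helper, not part of xrootd or ROOT.
check_key_readable() {
    key=${1:-/etc/grid-security/hostkey.pem}
    if [ -r "$key" ]; then
        echo "readable: $key"
        return 0
    fi
    echo "NOT readable by $(id -un): $key"
    return 1
}
```

Calling `check_key_readable` while logged in as the user that owns the xrootd process should show whether the daemon has a chance of loading the key. Note that root can read any file, so the check is only meaningful for a non-privileged account.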

Making use of the user certificate (the one in ~/.globus) on the server side is only possible if you run xrootd in foreground mode (no -b flag), because you need to enter a password.

I’ll try to think of a possible workaround …

Gerri

Hi again,

Now I can authenticate: when I run the xrootd daemon it asks me for the certificate password of my computer. But what I really want is to be asked for MY grid password at PROOF login in case the proxy has expired.

Don’t know if what I did yesterday was very useful for this purpose

Lara

By the way, I moved to root_v5.17.08 :)

Hi Gerri,

To clarify things with respect to what Lara is doing and the reason why an old version of ROOT/xrootd is being used: Our intention is to run CMSSW code which ships its own ROOT version (quite old) and which should not be changed (we cannot recompile all the libraries).

The cluster is made of a master + 4 slaves. All of them share the CMSSW (and hence ROOT/PROOF) installation, so they are equal. All of them have host certificates (but no service certificates). In the old proofd times I could set globus authentication pretty easily (with your help). I guess it is not that difficult with xrootd. Also, we can run xrootd from the su account, no problem with that.

So which would be your advice on how to set it up? Or is there a point we could look for instructions?

I also wonder if we can have a newer version of xrootd/root that is able to launch older sessions of root (specifically those being used now).

Hope I clarified a bit the situation. Thanks!

Isidro

Hi Lara, Isidro,

Sorry for not being able to reply before.

For what relates to Globus authentication, the protocol foresees that the server (xrootd) has access to a valid certificate and its private key. This can be a host, service or normal user certificate. The latter has the disadvantage that you’ll get prompted, at start-up, for the pass-phrase protecting the private key: this means that it cannot be used in background mode (xrootd -b …), because in such a mode the daemon explicitly detaches from the controlling terminal, so that there is no way to enter the password.

But if you can run xrootd as ‘su’, as we did for ‘proofd’, there should not be any problem.
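The foreground/background constraint can be checked before starting the daemon. The sketch below assumes that a pass-phrase protected PEM key contains the word ENCRYPTED in its header (as in "Proc-Type: 4,ENCRYPTED" or "BEGIN ENCRYPTED PRIVATE KEY"); the helper name `start_mode_for_key` is hypothetical and only serves this illustration.

```shell
#!/bin/sh
# Sketch: decide whether xrootd could be started with -b for a given key file.
# Assumption: an encrypted PEM key carries the word ENCRYPTED in its header.
# start_mode_for_key is a made-up helper name.
start_mode_for_key() {
    key=$1
    if [ ! -r "$key" ]; then
        echo "unreadable"      # the daemon could not load this key at all
    elif grep -q 'ENCRYPTED' "$key"; then
        echo "foreground"      # a pass-phrase prompt needs a terminal
    else
        echo "background"      # no prompt, so 'xrootd -b' is possible
    fi
}
```

For example, `start_mode_for_key ~/.globus/userkey.pem` would typically print `foreground` for a user key, while an unprotected host key would give `background`.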

For testing purposes, starting the daemon on the command line is fine. I believe that this is what Lara did.

Of course you should get prompted for your password on the client side (unless you already have a valid proxy, which, I guess, is not the case).

Please, can you do the following:

  1. Switch on debugging before starting PROOF

root[] gEnv->SetValue("XProof.Debug", 2)
root[] TProof::Open(master)

     and post the logs that you get on the screen.
  2. Post the logs that you get when starting xrootd.

About versions now.
Yes, we can run different versions, but sometimes we have incompatibilities, because the libXrdProofd plugin also changed and it was not always possible to keep backward compatibility.
I know that you need to be able to run with 5.14 . You are not the only ones, and that’s why I started porting back the new PROOF to 5.14.00i (the last patch release of 5.14). I have an SVN dev branch for that

root.cern.ch/viewcvs/branches/de … 00-newxrd/

but the PROOF part is not yet ready. I will try to accelerate this process.

In the meantime, I suggest taking xrootd and libXrdProofd from a new version and trying to load ‘proofserv’ from 5.14/00f.

I’ll give a try right now to this combination and post the instructions in the next post.

Cheers, Gerri

Hi Gerri,

I’m trying to do what you said, but I have problems with the xrootd daemon: I don’t know why, but when I try to take the libraries from the latest ROOT version it doesn’t work.
I get the following:

Unable to open /afs/fanae/scratch/root/root/lib/libXrdProofd.so libXrdClient.so: cannot open shared object file: No such file or directory
XrdProtocol: Protocol xproofd could not be loaded
xrootd anon@fanae41.geol.uniovi.es:1094 initialization failed.

But this file does exist and the path is correct.

Any suggestion?

And another question, do I need to have a ~/.rootrc and ~/.rootauthrc file with authentication options?

Lara

Hi Lara,

I think you are experiencing an environment setting problem.
Can you post the full log (up to the error)?

How do you start xrootd? From the command line or via an init.d script?

Is LD_LIBRARY_PATH correctly defined? (i.e. does it include /afs/fanae/scratch/root/root/lib ?)
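The wording of the error suggests that it is the dependency libXrdClient.so, rather than libXrdProofd.so itself, that the loader cannot find, which would be consistent with a missing LD_LIBRARY_PATH entry. A minimal sketch of the check follows; the helper name `in_ld_library_path` is invented for this example.

```shell
#!/bin/sh
# Sketch: check whether a directory is listed in LD_LIBRARY_PATH.
# in_ld_library_path is a hypothetical helper name.
in_ld_library_path() {
    dir=${1%/}                      # ignore a trailing slash on the argument
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# To list the unresolved dependencies of the plugin itself, one could run:
#   ldd /afs/fanae/scratch/root/root/lib/libXrdProofd.so | grep 'not found'
```

Running `in_ld_library_path /afs/fanae/scratch/root/root/lib || echo missing` in the same environment that starts xrootd (for instance inside the init.d script) shows whether the daemon inherits the expected setting.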

As for whether you need ~/.rootrc and ~/.rootauthrc files with authentication options: the answer is no (the “standard” authentication directives are not used by the xrootd security plugins; there are some xrootd-specific variables that can be set in ~/.rootrc, but forget about those for the time being).

Gerri

Hello,

I’m running the /etc/init.d script as root

My log looks like this, now my master (fanae41) seems to work fine but all the other machines don’t…

080121 12:51:10 001 Scalla is starting. . .
Copr. 2007 Stanford University, xrd version 20071116-0000a
Config using configuration file /afs/fanae/code/Proof/xpd.cf
++++++ xrootd anon@fanae42.geol.uniovi.es initialization started.
=====> xrd.port 1094
=====> xrd.protocol xproofd:1092 /afs/fanae/scratch/root/root/lib/libXrdProofd.so
Config maximum number of connections restricted to 1024
080121 12:51:10 001 XrdSched: scheduling underused thread monitor in 780 seconds
080121 12:51:10 001 XrdSched: Starting with 2 workers
080121 12:51:10 001 XrdLink: Allocating 16 link objects at a time
080121 12:51:10 001 XrdPoll: Starting poller 0
080121 12:51:11 001 XrdPoll: Starting poller 1
080121 12:51:11 001 XrdPoll: Starting poller 2
080121 12:51:11 001 XrdProtocol: getting port from protocol xrootd
080121 12:51:11 001 XrdProtocol: getting port from protocol xproofd
--- Proofd: : GetNumCPUs: # of cores found: 1
080121 12:51:11 001 XrdProtocol: getting protocol object xrootd
Copr. 2007 Stanford University, xrootd version 2.9.0 build 20071116-0000a
++++++ xrootd protocol initialization started.
=====> xrootd.export /scratch/proofpool
=====> xrootd.fslib /afs/fanae/scratch/root/root/lib/libXrdOfs.so
080121 12:51:11 001 XrootdAioReq: Max aio/req=8; aio/srv=4096; Quantum=131072
080121 12:51:11 001 XrootdAioReq: Adding 30 aioreq objects.
080121 12:51:11 001 XrootdAio: Adding 24 aio objects; 4096 pending.
Config warning: ‘xrootd.seclib’ not specified; strong authentication disabled!
080121 12:51:11 001 XrootdProtocol: Loading filesystem library /afs/fanae/scratch/root/root/lib/libXrdOfs.so
Copr. 2007 Stanford University, Ofs Version 20071116-0000a
++++++ File system initialization started.
Config warning: redirect directive is deprecated; use ‘all.role’.
=====> ofs.redirect target
=====> all.role server
++++++ Configuring server role. . .
=====> all.manager fanae41 3121
Config effective /afs/fanae/code/Proof/xpd.cf ofs configuration:
ofs.role server
ofs.fdscan 9 120 1200
ofs.maxdelay 60
ofs.trace bfcd
------ File system server initialization completed.
Copr. 2007, Stanford University, oss Version 20071116-0000a
++++++ Storage system initialization started.
=====> oss.cache public /scratch/cache*
=====> oss.path /scratch/proofpool r/w
080121 12:51:11 001 oss_AioInit: started AIO read signal thread; tid=3078785952
080121 12:51:11 001 oss_AioInit: started AIO write signal thread; tid=3077995424
Config effective /afs/fanae/code/Proof/xpd.cf oss configuration:
oss.alloc 0 0 0
oss.cachescan 600
oss.compdetect *
oss.fdlimit 512 1024
oss.maxdbsize 0
oss.trace fff
oss.xfr 1 9437184 30 10800
oss.memfile off max 397117440
oss.cache public /scratch/cache/
oss.defaults r/w nocheck nodread nomig norcreate nostage
oss.path /scratch/proofpool r/w nocheck nodread nomig norcreate nostage
------ Storage system initialization completed.
080121 12:51:11 001 XrdSched: scheduling xrootd protocol anchor in 3600 seconds
Config warning: ‘xrootd.prepare logdir’ not specified; prepare tracking disabled.
Config exporting /scratch/proofpool
------ xrootd protocol initialization completed.
080121 12:51:11 001 XrdProtocol: getting protocol object xproofd
080121 12:51:11 001 xpd : ProofdManager: Config: file: /afs/fanae/code/Proof/xpd.cf
080121 12:51:11 001 xpd : ProofdManager: Config: time of last modification: 1200914419
--- Proofd: : DoDirectiveString: set seclib to /afs/fanae/scratch/root/root/lib/libXrdSec.so
080121 12:51:11 001 xpd : XrdROOT::ValidatePrgmSrv: forking test and protocol retrieval
080121 12:51:11 001 xpd : XrdROOT::ValidatePrgmSrv: forking external proofsrv
xpd:child: : SetProofServEnv: enter: ROOT dir: /cms/slc4_ia32_gcc345/lcg/root/5.14.00f-CMS3q
080121 12:51:11 001 xpd : XrdROOT::ValidatePrgmSrv: test server launched: wait for protocol
080121 12:51:12 001 xpd : DoDirectiveRootSys: validation OK for: 5.14/00f 5.14/00f /cms/slc4_ia32_gcc345/lcg/root/5.14.00f-CMS3q 12
--- Proofd: : DoDirectiveString: set workdir to /scratch/proofbox
080121 12:51:12 001 xpd : DoDirectiveResource: configuration file cannot be read: /afs/fanae/user/lara/CMSSW_1_6_7/all
--- Proofd: : >>> Warning: 'if' conditions at the end of the directive are deprecated
--- Proofd: : >>> Please use standard Scalla/Xrootd 'if-else-fi' constructs
--- Proofd: : >>> (see xrootd.slac.stanford.edu/doc/xrd … config.htm)
--- Proofd: : CheckIf: : fanae*
--- Proofd: : >>> Warning: 'if' conditions at the end of the directive are deprecated
--- Proofd: : >>> Please use standard Scalla/Xrootd 'if-else-fi' constructs
--- Proofd: : >>> (see xrootd.slac.stanford.edu/doc/xrd … config.htm)
--- Proofd: : CheckIf: : fanae41
--- Proofd: : DoDirectiveString: set poolurl to root://fanae41
--- Proofd: : DoDirectiveString: set namespace to /scratch/proofpool
080121 12:51:12 001 ProofdManager: ParseConfig: configuring
080121 12:51:12 001 ProofdManager: ParseConfig: working directories under: /scratch/proofbox
++++++ Authentication system initialization started.
080121 12:51:12 001 secgsi_Init: option CACheck: 1
080121 12:51:12 001 secgsi_Init: testing CA dir(s): /etc/grid-security/certificates
080121 12:51:12 001 secgsi_Init: using CA dir(s): /etc/grid-security/certificates/
080121 12:51:12 001 secgsi_Init: option CRLCheck: 2
080121 12:51:12 001 secgsi_Init: using CRL dir(s): /etc/grid-security/certificates/
080121 12:51:12 001 crypto_Factory::GetCryptoFactory: loading ssl crypto factory object from libXrdCrypto.so
080121 12:51:12 001 crypto_Factory::GetCryptoFactory: loading ssl crypto factory object from libXrdCryptossl.so
080121 12:51:12 001 sut_Rndm::GetBuffer: enter: len: 32
080121 12:51:12 001 sut_Rndm::Init: taking seed from /dev/urandom
080121 12:51:12 001 cryptossl_sslCipher::XrdCryptosslCipher: generate DH full key
080121 12:51:12 001 sut_Cache::Init: cache allocated for 100 entries
080121 12:51:12 001 sut_Cache::Rehash: Hash table updated (found 0 active entries)
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/.
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/. does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/…
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/… does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/9b59ecad.signing_policy
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/9b59ecad.signing_policy does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/3d5be7bc.r0
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/3d5be7bc.r0 does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/82b36fca.crl_url
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/82b36fca.crl_url does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/8a047de1.r0
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/8a047de1.r0 does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/ff94d436.r0
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/ff94d436.r0 does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/d1b603c3.r0
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/d1b603c3.r0 does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/a317c467.info
080121 12:51:12 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/a317c467.info does not contain a valid CA
080121 12:51:12 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/11b4a5a2.0
080121 12:51:12 001 cryptossl_X509::IsCA: certificate has 11 extensions
080121 12:51:12 001 cryptossl_X509::IsCA: CA certificate
080121 12:51:12 001 cryptossl_X509ParseFile: certificate added to the chain - ord: 1
080121 12:51:12 001 cryptossl_X509ParseFile: no RSA private key found in file /etc/grid-security/certificates/11b4a5a2.0
080121 12:51:12 001 secgsi_LoadCRL: target file: /etc/grid-security/certificates/11b4a5a2.r0
080121 12:51:12 001 cryptossl_X509Crl::XrdCryptosslX509Crl_file: CRL successfully loaded
080121 12:51:12 001 cryptossl_LoadCache: 78certificates have been revoked
080121 12:51:12 001 sut_Cache::Init: cache allocated for 78 entries
080121 12:51:12 001 sut_Cache::Rehash: Hash table updated (found 0 active entries)
080121 12:51:12 001 sut_Cache::Rehash: Hash table updated (found 1 active entries)

080121 12:51:19 001 cryptossl_X509Crl::XrdCryptosslX509Crl_file: CRL successfully loaded
080121 12:51:19 001 cryptossl_LoadCache: 1certificates have been revoked
080121 12:51:19 001 sut_Cache::Init: cache allocated for 1 entries
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 0 active entries)
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 1 active entries)
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 1 active entries)
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 73 active entries)
080121 12:51:19 001 secgsi_LoadCADir: analysing entry /etc/grid-security/certificates/98ef0ee5.crl_url
080121 12:51:19 001 secgsi_LoadCADir: Entry /etc/grid-security/certificates/98ef0ee5.crl_url does not contain a valid CA
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 73 active entries)
080121 12:51:19 001 sut_Cache::Init: cache allocated for 10 entries
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 0 active entries)
080121 12:51:19 001 cryptossl_X509::XrdCryptosslX509_file: certificate successfully loaded
080121 12:51:19 001 cryptossl_X509::IsCA: certificate has 13 extensions
080121 12:51:19 001 cryptossl_X509::XrdCryptosslX509_file: cannot open file /etc/grid-security/hostkey.pem (errno: 13)
080121 12:51:19 001 secgsi_Init: problems loading srv cert: invalid PKI
080121 12:51:19 001 sut_Cache::Rehash: Hash table updated (found 0 active entries)
080121 12:51:19 001 secgsi_ErrF: Secgsi: ErrError: no valid server certificate found
080121 12:51:19 001 secgsi_Init: Secgsi: ErrError: no valid server certificate found
080121 12:51:19 001 sec_Config: Secgsi: ErrError: no valid server certificate found
=====> sec.protocol gsi -dlgpxy:1 -d:1 -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/hostcert.pem -key:/etc/grid-security/hostkey.pem
Config 1 authentication directives processed in /tmp/xpdcfn_oUEMcL
------ Authentication system initialization failed.
080121 12:51:19 001 xpdLoadSecurity: Unable to create security service object via /afs/fanae/scratch/root/root/lib/libXrdSec.so
080121 12:51:19 001 xpd: ProofdManager: ParseConfig: unable to load security system.
080121 12:51:19 001 XrdProtocol: Protocol xproofd could not be loaded
------ xrootd anon@fanae42.geol.uniovi.es:1094 initialization failed.
080121 12:51:19 001 XrdSched: scheduling midnight runner in 40121 seconds
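The decisive line in a log of this size is easy to miss. On Linux, errno 13 is EACCES ("Permission denied"), so the daemon found hostkey.pem but was not allowed to open it. The init.d script posted below starts xrootd with `-R $XRDUSER`, so presumably the process is running as that non-privileged user even though the script itself is run as root. The following sketch pulls such lines out of a log; the function name `find_open_failures` is made up for this example.

```shell
#!/bin/sh
# Sketch: extract "<path> <errno>" pairs from "cannot open file" log lines.
# find_open_failures is a hypothetical helper; it reads the files given as
# arguments, or standard input when no file is given.
find_open_failures() {
    sed -n 's/.*cannot open file \([^ ]*\) (errno: \([0-9]*\)).*/\1 \2/p' "$@"
}
```

Applied to the log above, this would print the path of the host key followed by 13, which points at file permissions rather than at the certificate contents.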

And…my xpd.cf file looks like this:

setenv LD_LIBRARY_PATH /afs/fanae/scratch/root/root/lib/

# XRD port
xrd.port 1094

if exec xrootd
xrd.protocol xproofd:1092 /afs/fanae/scratch/root/root/lib/libXrdProofd.so
fi

xpd.seclib /afs/fanae/scratch/root/root/lib/libXrdSec.so
xpd.sec.protocol gsi -dlgpxy:1 -d:1 -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/hostcert.pem -key:/etc/grid-security/hostkey.pem

# Export /scratch/proofpool
xrootd.export /scratch/proofpool

# FS lib
xrootd.fslib /afs/fanae/scratch/root/root/lib/libXrdOfs.so

# OpenFS section
if fanae41
ofs.redirect remote
ofs.forward all
else
ofs.redirect target
fi

# OSS section
oss.cache public /scratch/cache*
oss.path /scratch/proofpool r/w

# OLB / ODC section
# Port
olb.port 3121

# Paths
olb.path w /scratch/proofpool

# Role
if fanae41
all.role manager
else
all.role server
fi

# Manager location (ignored by managers)
all.manager fanae41 3121

# Delay client requests at manager startup
olb.delay startup 30

# PROOF part
# (xrootd only: the 'xpd.' directives are ignored if the protocol is not loaded)
# Load the XrdProofd protocol:
# using absolute paths (<ROOT_sys> with the path to the ROOT distribution)
#if exec xrootd
#xrd.protocol xproofd:1092 /afs/fanae/scratch/root/root/lib/libXrdProofd.so
#fi

# ROOTSYS
xpd.rootsys /cms/slc4_ia32_gcc345/lcg/root/5.14.00f-CMS3q

# Working directory for sessions [<User_Home>/proof]
xpd.workdir /scratch/proofbox

# Resource finder
# NB: 'if ' not supported for this directive.
# xpd.resource static [<cfg_file>] [ucfg:<user_cfg_opt>] [wmx:<max_workers>]
#                     [selopt:<selection_mode>]
# "static", i.e. using a config file
#   <cfg_file>          path alternative config file
#                       [$ROOTSYS/proof/etc/proof.conf]
#   <user_cfg_opt>      if "yes": enable user private config files at
#                       $HOME/.proof.conf or $HOME/.<usr_def>, where
#                       <usr_cfg> is the second argument to
#                       TProof::Open("","<usr_cfg>") ["no"]
#   <max_workers>       Maximum number of workers to be assigned to user
#                       session [-1, i.e. all]
#   <selection_mode>    If <max_workers> != -1, specify the way workers
#                       are chosen:
#                       "roundrobin"  round-robin selection in bunches
#                                     of n(=<max_workers>) workers.
#                                     Example:
#                                     N = 10 (available workers), n = 4:
#                                     1st (session): 1-4, 2nd: 5-8,
#                                     3rd: 9,10,1,2, 4th: 3-6, …
#                       "random"      random choice (a worker is not
#                                     assigned twice)
xpd.resource static /cms/slc4_ia32_gcc345/lcg/root/5.14.00f-CMS3q/etc/proof.conf all

# Server role (master, submaster, worker) [default: any]
# Allows to control the cluster structure.
# The following (commented) example will set lxb6046 as master, and all
# the others lxb* as workers
xpd.role worker if fanae*
xpd.role master if fanae41

# Master(s) allowed to connect. Directive active only for Worker or
# Submaster session requests. Multiple 'allow' directives can
# be specified. By default all connections are allowed.
xpd.allow fanae41

# URL and namespace for the local storage if different from defaults.
# By the default it is assumed that the pool space on the cluster is
# accessed via a redirector running at the top master under the common
# namespace /proofpool.
# Any relevant protocol specification should be included here.
xpd.poolurl root://fanae41
xpd.namespace /scratch/proofpool

And my /etc/init.d/xrootd file is as follows:

#! /bin/sh

# xrootd Start/Stop the XROOTD daemon
# chkconfig: 345 20 80
# description: The xrootd daemon is used as file server and starter of
#              the PROOF worker processes.
# processname: xrootd
# pidfile: /var/run/xrootd.pid
# config:

XROOTD=/afs/fanae/scratch/root/root/bin/xrootd
XRDLIBS=/afs/fanae/scratch/root/root/lib

# Source function library.
. /etc/init.d/functions

# Get config.
. /etc/sysconfig/network

# Get xrootd config
[ -f /etc/sysconfig/xrootd ] && . /etc/sysconfig/xrootd

# Read user config
[ ! -z "$XRDUSERCONFIG" ] && [ -f "$XRDUSERCONFIG" ] && . $XRDUSERCONFIG

# Check that networking is up.
if [ ${NETWORKING} = "no" ]
then
    exit 0
fi

[ -x $XROOTD ] || exit 0

RETVAL=0
prog="xrootd"

start() {
    echo -n $"Starting $prog: "
    # Options are specified in /etc/sysconfig/xrootd .
    # See $ROOTSYS/etc/daemons/xrootd.sysconfig for an example.
    # $XRDUSER must be the name of an existing non-privileged user.
    export LD_LIBRARY_PATH=$XRDLIBS:$LD_LIBRARY_PATH
    daemon $XROOTD -l $XRDLOG -R $XRDUSER -c $XRDCF $XRDDEBUG
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/xrootd
    return $RETVAL
}

stop() {
    [ ! -f /var/lock/subsys/xrootd ] && return 0 || true
    echo -n $"Stopping $prog: "
    killproc xrootd
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/xrootd
    return $RETVAL
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status xrootd
        RETVAL=$?
        ;;
    restart|reload)
        stop
        start
        ;;
    condrestart)
        if [ -f /var/lock/subsys/xrootd ]; then
            stop
            start
        fi
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|reload|condrestart}"
        exit 1
esac

exit $RETVAL

Which file do I have to edit if I want to set my .globus/user authentication???

Lara

Hello again,

The daemons are now running. When I try to start a new session I get the following.

Error in TAuthenticate::Authenticate on master0: no support for Globus authentication available
Info in TAuthenticate::Authenticate on master0: failure: list of attempted methods: Globus
Error in on master0: authentication failed for @fanae23.geol.uniovi.es
Error in TSocket::Authenticate on master0: authentication attempt failed for @fanae23.geol.uniovi.es
Error in TAuthenticate::Authenticate on master0: no support for Globus authentication available
Info in TAuthenticate::Authenticate on master0: failure: list of attempted methods: Globus
Error in on master0: authentication failed for @fanae38.geol.uniovi.es

My certificate paths are right; they are the default ones,

/home/$USER/.globus/usercert.pem
…/userkey.pem

Lara

Hi Lara,

I think I have understood where the problem is and it is related to a bug that was somehow added recently.

I will soon put the source tarball for a version containing the fix under /afs/cern.ch/sw/lcg/contrib/proof/root .

I can produce the binaries if you let me know which OS/architecture/compiler you are using.

Cheers, Gerri

Hi,

We are using slc4_ia32_gcc345

Thank you,

Lara

Hi Lara,

I have put the updated version in

/afs/cern.ch/sw/lcg/contrib/proof/root/5.19.01-PROOF.00/

Binaries for slc4_ia32_gcc34 are under ./slc4_ia32_gcc34 and ./slc4_ia32_gcc34_dbg, the source files under ./src .
The source tarball is at

/afs/cern.ch/sw/lcg/contrib/proof/root/tar/root_v5.19.01-PROOF.00.src.tgz

Please try and let me know.
I can make quick updates of this version if needed to fix something else.

Gerri

Hello,

I don’t know why, but I’m not able to run PROOF with this new version. Apparently the daemons are working fine, but when I try to start a new PROOF session I get the errors below (the config files are the same, I just changed the paths).
I’m trying to do it without authentication now, but with it it doesn’t work either.
One question: how many config files do I have to edit to enable Globus authentication? I’ve just added these two lines to my xpd.cf:

xpd.seclib /afs/fanae/scratch/root/root/lib/libXrdSec.so
xpd.sec.protocol gsi -dlgpxy:1 -d:1 -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/hostcert.pem -key:/etc/grid-security/hostkey.pem

(But these two lines are just to enable host authentication, not user/.globus authentication, aren’t they? Do I have to add something more?)

Then I deleted the -noauth in my /etc/init.d/proofd file and I edited the system.root* files:

  • system.rootrc
  • system.rootauthrc
  • system.rootdaemonrc

by adding: fanae* list 3 4

These are the only changes I made. I don’t know if that is enough, or whether it should work like that…
Is it also necessary to edit $ROOTSYS/etc/proof/proof.conf to specify the desired authentication?
My proof.conf file just has a list with the names of all the workers and the master and their roles.

The errors I got are the following:

--- Proofd: : GetNumCPUs: # of cores found: 1

*** Break *** segmentation violation
(no debugging symbols found)
Using host libthread_db library “/lib/tls/libthread_db.so.1”.
Attaching to program: /proc/31146/exe, process 31146
(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
[New Thread -1208613184 (LWP 31146)]
[New Thread -1208693856 (LWP 31155)]

0x004a57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
Thread 2 (Thread -1208693856 (LWP 31155)):
#0 0x004a57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00549f06 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x00580f4a in usleep () from /lib/tls/libc.so.6
#3 0x04c237da in GarbageCollectorThread (arg=0x97277f8, thr=0x953b8f8)
at XrdClientConnMgr.cc:73
#4 0x04c3570e in XrdClientThreadDispatcher (arg=0x953b904)
at XrdClientThread.cc:32
#5 0x049ed98a in XrdSysThread_Xeq ()
from /afs/fanae/scratch/root/root/lib/libXrdProofd.so
#6 0x0013e3cc in start_thread () from /lib/tls/libpthread.so.0
#7 0x00587c3e in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -1208613184 (LWP 31146)):
#0 0x004a57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x005497fb in __waitpid_nocancel () from /lib/tls/libc.so.6
#2 0x004f3649 in do_system () from /lib/tls/libc.so.6
#3 0x004f39c1 in system () from /lib/tls/libc.so.6
#4 0x001448bd in system () from /lib/tls/libpthread.so.0
#5 0x0081091f in TUnixSystem::Exec ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#6 0x0081632d in TUnixSystem::StackTrace ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#7 0x00812fea in TUnixSystem::DispatchSignals ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#8 0x00813078 in SigHandler ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#9 0x008122c5 in sighandler ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#10 <signal handler called>
#11 0x0745ee67 in TProof::AddInput ()
from /afs/fanae/scratch/root/root/lib/libProof.so
#12 0x074742e3 in TProof::Init ()
from /afs/fanae/scratch/root/root/lib/libProof.so
#13 0x074780b9 in TProof::TProof ()
from /afs/fanae/scratch/root/root/lib/libProof.so
#14 0x0748b35a in TProofMgr::CreateSession ()
from /afs/fanae/scratch/root/root/lib/libProof.so
#15 0x07465803 in TProof::Open ()
from /afs/fanae/scratch/root/root/lib/libProof.so
#16 0x074c651d in G__G__Proof_111_0_230 ()
from /afs/fanae/scratch/root/root/lib/libProof.so
#17 0x00cb7a5f in Cint::G__ExceptionWrapper ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#18 0x00d74327 in G__call_cppfunc ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#19 0x00d598dd in G__interpret_func ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#20 0x00d479d4 in G__getfunction ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#21 0x00d2bcb8 in G__getitem ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#22 0x00d2e8ff in G__getexpr ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#23 0x00d1e16c in G__define_var ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#24 0x00d9b3cc in G__exec_statement ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#25 0x00d19b08 in G__exec_tempfile_core ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#26 0x00d1ae43 in G__exec_tempfile_fp ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#27 0x00dab7d4 in G__process_cmd ()
from /afs/fanae/scratch/root/root/lib/libCint.so
#28 0x007e3e8f in TCint::ProcessLine ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#29 0x00750dea in TApplication::ProcessLine ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#30 0x0011e1a8 in TRint::HandleTermInput ()
from /afs/fanae/scratch/root/root/lib/libRint.so
#31 0x0011c840 in TTermInputHandler::Notify ()
from /afs/fanae/scratch/root/root/lib/libRint.so
#32 0x0011ea56 in TTermInputHandler::ReadNotify ()
from /afs/fanae/scratch/root/root/lib/libRint.so
#33 0x0080f2aa in TUnixSystem::CheckDescriptors ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#34 0x008134c8 in TUnixSystem::DispatchOneEvent ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#35 0x007a66d0 in TSystem::InnerLoop ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#36 0x007a6496 in TSystem::Run ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#37 0x00750ed6 in TApplication::Run ()
from /afs/fanae/scratch/root/root/lib/libCore.so
#38 0x0011cfb2 in TRint::Run ()
from /afs/fanae/scratch/root/root/lib/libRint.so
#39 0x08048d36 in main ()

Thank you,

Lara

Hi Lara,

‘proofd’ is not needed at all, so please, make sure that it is stopped

/etc/init.d/proofd stop

and remove it from the list of init.d services restarted automatically (how to do this depends on the OS). The same for ‘rootd’.
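On a Red Hat-style system such as SLC4, the two steps could look like the sketch below. It assumes that `chkconfig` manages the boot-time services (the `chkconfig:` header in the xrootd init script above suggests it does) and that a `rootd` init script exists next to the `proofd` one; the function name and the `DRY_RUN` switch are inventions of this example.

```shell
#!/bin/sh
# Sketch: stop the legacy daemons and keep them from starting at boot.
# Assumptions: Red Hat-style init scripts under /etc/init.d and chkconfig.
# Set DRY_RUN=1 to print the commands instead of running them (needs root).
disable_legacy_daemons() {
    for svc in proofd rootd; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "/etc/init.d/$svc stop"
            echo "chkconfig $svc off"
        else
            /etc/init.d/$svc stop
            chkconfig "$svc" off
        fi
    done
}
```

Running `DRY_RUN=1 disable_legacy_daemons` first shows what would be executed; other distributions use different tools for the second step.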

I suggest trying to start the simplest possible PROOF cluster, in particular without authentication at first.

Once that works, you can try to switch-on authentication. For that, these lines

xpd.seclib /afs/fanae/scratch/root/root/lib/libXrdSec.so
xpd.sec.protocol gsi -dlgpxy:1 -d:1 -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/hostcert.pem -key:/etc/grid-security/hostkey.pem

are all you need: authentication is driven by the server: when the client tries to connect the server tells her/him what she/he needs to provide (a valid proxy in this case); the client will then load the appropriate plug-ins to check for (and, if necessary, initialize) a valid proxy certificate.
If the client certificate is in standard locations ($HOME/.globus) you do not need to set anything else.
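On the client side it can help to verify that the credentials really are where the plug-ins will look for them. The sketch below only checks for the two conventional file names under $HOME/.globus; the helper name `check_user_credentials` is hypothetical, and the mention of `grid-proxy-info` assumes the Globus client tools are installed.

```shell
#!/bin/sh
# Sketch: check that usercert.pem and userkey.pem are present and readable.
# check_user_credentials is a made-up helper; the directory defaults to the
# standard location mentioned above.
check_user_credentials() {
    dir=${1:-$HOME/.globus}
    for f in usercert.pem userkey.pem; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f"
            return 1
        fi
    done
    echo "found usercert.pem and userkey.pem in $dir"
}

# If the Globus client tools are available, 'grid-proxy-info' can be used to
# inspect an existing proxy (assumption: the tools are in the PATH).
```

If the function reports both files, a failure to authenticate is more likely to come from the server configuration or from the ROOT build than from the location of the user certificate.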

Gerri

Hello again,

OK, I’m trying to run it without authentication. But I don’t know why, when I start a new PROOF session, I get the following; I’m not even able to create it!

--- Proofd: : GetNumCPUs: # of cores found: 1

*** Break *** segmentation violation
Generating stack trace…
0x045c1e67 in _ZN6TProof8AddInputEP7TObject + 0xf from /afs/fanae/scratch/root/root/lib/libProof.so
0x045d72e3 in _ZN6TProof4InitEPKcS1_S1_iS1_ + 0xf07 from /afs/fanae/scratch/root/root/lib/libProof.so
0x045db0b9 in _ZN6TProofC1EPKcS1_S1_iS1_P9TProofMgr + 0x169 from /afs/fanae/scratch/root/root/lib/libProof.so
0x045ee35a in _ZN9TProofMgr13CreateSessionEPKcS1_i + 0x6a from /afs/fanae/scratch/root/root/lib/libProof.so
0x045c8803 in _ZN6TProof4OpenEPKcS1_S1_i + 0x18b from /afs/fanae/scratch/root/root/lib/libProof.so
0x0462951d in from /afs/fanae/scratch/root/root/lib/libProof.so
0x00c4fa5f in _ZN4Cint19G__ExceptionWrapperEPFiP8G__valuePKcP8G__paramiES1_PcS5_i + 0x47 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00d0c327 in G__call_cppfunc + 0x1c7 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cf18dd in G__interpret_func + 0x1abd from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cdf9d4 in G__getfunction + 0x1ad0 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cc3cb8 in G__getitem + 0x3d8 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cc68ff in G__getexpr + 0x20e3 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cb616c in G__define_var + 0x1ac8 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00d333cc in G__exec_statement + 0x38d0 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cb1b08 in from /afs/fanae/scratch/root/root/lib/libCint.so
0x00cb2e43 in G__exec_tempfile_fp + 0x13 from /afs/fanae/scratch/root/root/lib/libCint.so
0x00d437d4 in G__process_cmd + 0x14c8 from /afs/fanae/scratch/root/root/lib/libCint.so
0x0077be8f in _ZN5TCint11ProcessLineEPKcPN12TInterpreter10EErrorCodeE + 0x343 from /afs/fanae/scratch/root/root/lib/libCore.so
0x006e8dea in _ZN12TApplication11ProcessLineEPKcbPi + 0x62a from /afs/fanae/scratch/root/root/lib/libCore.so
0x001291a8 in _ZN5TRint15HandleTermInputEv + 0x1c8 from /afs/fanae/scratch/root/root/lib/libRint.so
0x00127840 in _ZN17TTermInputHandler6NotifyEv + 0x24 from /afs/fanae/scratch/root/root/lib/libRint.so
0x00129a56 in _ZN17TTermInputHandler10ReadNotifyEv + 0x12 from /afs/fanae/scratch/root/root/lib/libRint.so
0x007a72aa in _ZN11TUnixSystem16CheckDescriptorsEv + 0x1d2 from /afs/fanae/scratch/root/root/lib/libCore.so
0x007ab4c8 in _ZN11TUnixSystem16DispatchOneEventEb + 0x448 from /afs/fanae/scratch/root/root/lib/libCore.so
0x0073e6d0 in _ZN7TSystem9InnerLoopEv + 0x18 from /afs/fanae/scratch/root/root/lib/libCore.so
0x0073e496 in _ZN7TSystem3RunEv + 0x7e from /afs/fanae/scratch/root/root/lib/libCore.so
0x006e8ed6 in _ZN12TApplication3RunEb + 0x32 from /afs/fanae/scratch/root/root/lib/libCore.so
0x00127fb2 in _ZN5TRint3RunEb + 0x372 from /afs/fanae/scratch/root/root/lib/libRint.so
0x08048d36 in main + 0x52 from /afs/fanae/scratch/root/root/bin/root.exe
0x00452de3 in __libc_start_main + 0xd3 from /lib/tls/libc.so.6
0x08048c5d in _ZN15TApplicationImp11ShowMembersER16TMemberInspectorPc + 0x31 from /afs/fanae/scratch/root/root/bin/root.exe
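The frames above are printed with mangled C++ symbol names, which makes the call chain hard to read. A minimal sketch of a demangling helper follows, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the helper name `demangle` is hypothetical and not part of ROOT.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>    // std::free
#include <string>

// Return the human-readable form of an Itanium-ABI mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is allocated with malloc by __cxa_demangle
    return result;
}
```

For example, `demangle("_ZN6TProof8AddInputEP7TObject")` gives `TProof::AddInput(TObject*)`, the innermost frame of the trace, and `demangle("_ZN9TProofMgr13CreateSessionEPKcS1_i")` gives `TProofMgr::CreateSession(char const*, char const*, int)`.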

Some weeks ago I could use PROOF without authentication without any problem, but with this new version I don’t know what I’m doing wrong; I just get errors :-s

I copied all the conf files to this new version… and I’m sure ROOTSYS points to the right path…
It is really strange…

Lara

As you told me, proofd is not running…
So why this line?
— Proofd: : GetNumCPUs: # of cores found: 1

Lara

Hi Lara,

Yes, this is quite strange, and from the traceback I do not see where it could come from.

So I suggest going step by step, starting from a local installation and growing little by little.

To see if the basic PROOF functionality is working, do the following:

  1. Go to the ROOT directory and set the environment
$ cd /afs/fanae/scratch/root/root
$ source bin/thisroot.sh

(use bin/thisroot.csh for csh or tcsh).

  2. Go to the test directory and start a ROOT shell
$ cd test
$ root -l

  3. Run the stressProof test
root [0] .L stressProof.cxx
root [1] stressProof()

The result should be something like this (it should take a couple of minutes or so):

******************************************************************
*  Starting  P R O O F - S T R E S S  suite                      *
******************************************************************
*  Log file: /tmp/ganis/ProofStress_2Yt6mh
******************************************************************
 Test  1 : Open ............................................. OK *
 Test  2 : GetLogs .......................................... OK *
 Test  3 : Simple ........................................... OK *
 Test  4 : H1:http .......................................... OK *
* All registered tests have been passed  :-)                     *
******************************************************************
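To compare a captured log against the expected output above, a small helper can list which tests reported OK and which line the run stopped on. This is only a sketch, assuming the `Test N : Name ..... OK *` layout shown above; `TestLine` and `parseStressOutput` are hypothetical names and not part of ROOT or stressProof.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One "Test  N : Name ..... OK *" line from the stress-suite output.
struct TestLine {
    int number;        // test index as printed
    std::string name;  // e.g. "Open"
    bool ok;           // true only if the line carries the "OK" marker
};

// Scan captured output and collect every line that starts with "Test".
std::vector<TestLine> parseStressOutput(const std::string& text) {
    std::vector<TestLine> tests;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string keyword, colon, name;
        int number = 0;
        // Expected leading tokens: "Test", <number>, ":", <name>
        if (!(fields >> keyword >> number >> colon >> name)) continue;
        if (keyword != "Test" || colon != ":") continue;
        bool ok = line.find(" OK") != std::string::npos;
        tests.push_back({number, name, ok});
    }
    return tests;
}
```

A test line without the OK marker (for instance a run that stops right after printing the test name) comes back with `ok == false`, which points at the step that failed.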

If this works on, say, the client machine, the master machine and one worker machine, then we can move on to setting up the simplest cluster on your machines.
For that, send me the names and roles of the machines, the location of ROOT (if different from /afs/fanae/scratch/root/root) and the location of the working directories.
I will prepare the simplest config files to set up the cluster, and then we will start it up in debug mode to understand what is going on, step by step.

Cheers, Gerri

Hello,

I did it and I only get:


*  Starting  P R O O F - S T R E S S  suite                      *
*  Log file: /tmp/ProofStress_6DcRf2
 Test  1 : Open .

And then it comes back to the command line …:-s

The root path is /afs/fanae/scratch/root/root

The roles are:

master fanae41.geol.uniovi.es
worker fanae23.geol.uniovi.es

(Let’s try just with one machine)

and the working spaces are:

/scratch/proofpool
/scratch/proofbox
/scratch/cache

Lara