Issue running dask-connected RDataFrame example

Pedro_Silva · April 27, 2023, 12:02am

Dear experts,
I was trying to run the Dask-connected example [1] on lxplus after sourcing the LCG_102b_swan environment [2].
The default (multithreaded/local cluster) version runs just fine, but the HTCondorCluster version does not.
I get a long list of errors [3]. Naively, it looks like the condor security configuration version is not being parsed or identified correctly?
Would anyone have a suggestion?
Thanks,
Pedro

[1] ROOT: tutorials/dataframe/distrdf002_dask_connection.py File Reference
[2] /cvmfs/sft.cern.ch/lcg/views/LCG_102b_swan/x86_64-centos7-gcc11-opt/setup.sh
[3] Stack trace

cvmfs/sft.cern.ch/lcg/views/LCG_102b_swan/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/dask_jobqueue/core.py:20: FutureWarning: tmpfile is deprecated and will be removed in a future release. Please use dask.utils.tmpfile instead.
  from distributed.utils import tmpfile
Task exception was never retrieved
future: <Task finished name='Task-25' coro=<_wrap_awaitable() done, defined at /cvmfs/sft.cern.ch/lcg/releases/Python/3.9.12-9a1bc/x86_64-centos7-gcc11-opt/lib/python3.9/asyncio/tasks.py:681> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 1\nCommand:\ncondor_submit /tmp/psilva/tmp1trm9px2.sh\nstdout:\n\nstderr:\nError: use security:recommended_v9_0: does not recognise recommended_v9_0\nConfig source Error "/etc/condor/config.d/00-htcondor-9.0.config", Line 26: at use security:recommended_v9_0:recommended_v9_0\nConfiguration Error Line 26 while reading config source /etc/condor/config.d/00-htcondor-9.0.config\n\n')>
Traceback (most recent call last):
  File "/cvmfs/sft.cern.ch/lcg/releases/Python/3.9.12-9a1bc/x86_64-centos7-gcc11-opt/lib/python3.9/asyncio/tasks.py", line 688, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/cvmfs/sft.cern.ch/lcg/views/LCG_102b_swan/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/deploy/spec.py", line 63, in _
    await self.start()
  File "/cvmfs/sft.cern.ch/lcg/views/LCG_102b_swan/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/dask_jobqueue/core.py", line 325, in start
    out = await self._submit_job(fn)
  File "/cvmfs/sft.cern.ch/lcg/views/LCG_102b_swan/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/dask_jobqueue/core.py", line 308, in _submit_job
    return self._call(shlex.split(self.submit_command) + [script_filename])
  File "/cvmfs/sft.cern.ch/lcg/views/LCG_102b_swan/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/dask_jobqueue/core.py", line 403, in _call
    raise RuntimeError(
RuntimeError: Command exited with non-zero exit code.
Exit code: 1
Command:
condor_submit /tmp/psilva/tmp1trm9px2.sh
stdout:

stderr:
Error: use security:recommended_v9_0: does not recognise recommended_v9_0
Config source Error "/etc/condor/config.d/00-htcondor-9.0.config", Line 26: at use security:recommended_v9_0:recommended_v9_0
Configuration Error Line 26 while reading config source /etc/condor/config.d/00-htcondor-9.0.config


Task exception was never retrieved

ROOT Version: 6.26/08
Platform: x86_64-centos7
Compiler: gcc 11.3.0
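
A possible first check, given the stderr above: the failure comes from condor_submit itself while reading /etc/condor/config.d/00-htcondor-9.0.config. That would be consistent with (though it does not prove) an HTCondor client on PATH that is older than the system configuration. The sketch below is not part of the original report; it only prints which condor_submit and condor_version binaries the current environment resolves to. The helper names are made up for illustration.

```python
import shutil
import subprocess


def condor_tool_report(tools=("condor_submit", "condor_version")):
    """Map each tool name to the absolute path found on PATH (None if absent)."""
    return {tool: shutil.which(tool) for tool in tools}


def condor_version_text(path):
    """Run the condor_version binary at `path`; return its output, or None on failure."""
    if path is None:
        return None
    try:
        result = subprocess.run([path], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    # Fall back to stderr so that a configuration error is still reported.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    report = condor_tool_report()
    for tool, path in report.items():
        print(f"{tool}: {path}")
    print(condor_version_text(report["condor_version"]))
```

Running this once in a fresh shell and once after sourcing the LCG view would show whether the environment changes which HTCondor client is picked up.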

couet · April 27, 2023, 7:29am

I guess @eguiraud can help

eguiraud · April 27, 2023, 2:46pm

Hi Pedro, @couet ,

I’m afraid the distributed RDF expert is @vpadulan , who is currently travelling, but let’s ping him and hope he will be able to reply soon.

However I am not sure we ever encountered something like that. Does it also happen if you just use dask+HTCondor without distributed RDF?
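
For reference, a minimal sketch of what “dask+HTCondor without distributed RDF” could look like. It assumes dask, distributed and dask_jobqueue are available in the environment (the traceback above shows dask_jobqueue and distributed in LCG_102b_swan). The helper names and the resource values are placeholders, not a tested CERN configuration.

```python
def htcondor_cluster_kwargs():
    """Resource requests for one worker job (placeholder values)."""
    return {"cores": 1, "memory": "2000MB", "disk": "1000MB"}


def run_plain_dask_on_htcondor(n_jobs=1, timeout=600):
    """Submit one trivial task to a Dask cluster backed by HTCondor."""
    # Imported lazily so the helper above can be inspected without dask installed.
    from dask.distributed import Client
    from dask_jobqueue import HTCondorCluster

    cluster = HTCondorCluster(**htcondor_cluster_kwargs())
    # Scaling asks dask_jobqueue to submit worker jobs via condor_submit.
    cluster.scale(jobs=n_jobs)
    client = Client(cluster)
    try:
        future = client.submit(sum, [1, 2, 3])
        # Blocks until a worker has started and computed the result.
        return future.result(timeout=timeout)
    finally:
        client.close()
        cluster.close()
```

Calling `run_plain_dask_on_htcondor()` on lxplus exercises the same job submission path as the tutorial, with no ROOT involved. If the same condor_submit error shows up, the problem is on the HTCondor side rather than in distributed RDataFrame.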

Cheers,
Enrico

Pedro_Silva · May 3, 2023, 12:54pm

Hi,
thanks both for the feedback.
Unfortunately I’m still on a learning curve here and have been using the tutorials as the starting point for possible variations.
I have been using as reference:

distrdf001_
- swan/lxplus : no problem in both cases
distrdf002_
- swan - HTCondor does not support EOS-based cfg files
- lxplus - I get the security:recommended_v9_0 type of error mentioned initially

Hence, when you write “dask+HTCondor without distributed RDF”, would you have an example?
Sorry if I’m overlooking something.
Cheers,
Pedro

Pedro_Silva · May 16, 2023, 11:08am

Hello,
would you have any news on this front?
thanks a lot,
Pedro


eguiraud · May 16, 2023, 2:22pm

Hi @Pedro_Silva ,

thank you for the ping, let’s see if @vpadulan can help now that he’s back

Cheers,
Enrico

vpadulan · May 16, 2023, 4:25pm

Hi @Pedro_Silva ,

Very sorry for taking so long to reply. I missed the earlier pings, so thank you for pinging again.

I am glad you are trying out distributed RDataFrame and that you had a few working examples! As for the HTCondorCluster issues, they actually have little to do with ROOT itself. The HTCondorCluster instance has to be configured properly to connect to the condor pools at CERN. The configuration is not exactly trivial; in fact, the team managing the batch systems has developed a package to make the experience easier, with a class called CernCluster. You can get more info here. Let me know if that works better.
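
As a rough sketch of how this suggestion might plug into the tutorial: the code below assumes that CernCluster is importable from a package named dask_lxplus and that it accepts the same basic resource arguments as HTCondorCluster. Both points are assumptions to be checked against the documentation linked above. A real lxplus setup likely needs further options described there (scheduler port, log directory and similar), and the helper names are illustrative only.

```python
def cern_cluster_kwargs():
    """Basic per-job resources (placeholder values; extra options are likely needed)."""
    return {"cores": 1, "memory": "2000MB", "disk": "1000MB"}


def make_distributed_dataframe(n_entries=1000, n_jobs=1):
    """Build a Dask-backed distributed RDataFrame on top of a CernCluster."""
    # Lazy imports: these packages only exist in a suitably configured environment.
    import ROOT
    from dask.distributed import Client
    from dask_lxplus import CernCluster  # assumed package and class location

    cluster = CernCluster(**cern_cluster_kwargs())
    cluster.scale(jobs=n_jobs)
    client = Client(cluster)

    # Same constructor that the distrdf002 tutorial uses for its Dask connection.
    RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
    df = RDataFrame(n_entries, daskclient=client)
    # The client is returned too so the caller can close it when done.
    return df, client
```

The only change with respect to the HTCondorCluster variant of the tutorial is the cluster class; the RDataFrame side stays the same.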

Cheers,
Vincenzo

system · May 30, 2023, 4:25pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.