The Dask version is 2022.6.1.
I see two types of warnings/errors in the logs. Sorry, I'm not sure whether these are helpful.
This is a small background ntuple. In the end, I need to reduce to only 5 worker nodes.
The majority look like this:
2022-11-04 15:56:49,777 - distributed.worker - WARNING - Compute Failed
Key: dask_mapper-48e62520-a687-4bca-ae1a-60eba0dc167a
Function: execute_task
args: ((<function DaskBackend.ProcessAndMerge.<locals>.dask_mapper at 0x7f8f54b160d0>, (<function apply at 0x7f8fc96fcb80>, <class 'DistRDF.Ranges.TreeRangePerc'>, (), (<class 'dict'>, [['id', 38], ['treenames', ['nominal_Loose']], ['filenames', ['/gpfs/slac/atlas/fs1/d/yuchou/H2a4b/Level2/supermerged_SOLARBv6p9//2ji3bobji/single_top_mc16d_Wt_FS.root']], ['first_file_idx', 0], ['last_file_idx', 1], ['first_tree_start_perc', 0.42500000000000004], ['last_tree_end_perc', 0.4624999999999999], ['friendinfo', None]]))))
kwargs: {}
Exception: "AttributeError('__enter__')"
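In case it helps with reading that exception: as far as I understand, on Python 3.9 (the version in the paths of these logs) an `AttributeError` whose only message is `__enter__` is what a `with` statement raises when the object it is given does not define `__enter__`. The snippet below is only a minimal sketch of that general Python behaviour. `NotAContextManager` and `try_with` are hypothetical names made up for illustration, and nothing here is taken from the DistRDF or Dask code, so it does not show which object is actually involved in my job. Newer Python versions (3.11 and later, I believe) raise a `TypeError` for the same situation, so the sketch catches both.

```python
import contextlib
import sys


class NotAContextManager:
    """Hypothetical stand-in: defines neither __enter__ nor __exit__."""


def try_with(obj):
    """Enter a `with` block on obj; return the exception raised, or None."""
    try:
        with obj:
            pass
    except (AttributeError, TypeError) as exc:
        # Python 3.9 reports a missing __enter__ as AttributeError;
        # newer interpreters may report a TypeError instead.
        return exc
    return None


# An object without the context manager protocol fails on entry.
err = try_with(NotAContextManager())
print(sys.version_info[:2], type(err).__name__, repr(err))

# A real context manager enters and exits without raising.
ok = try_with(contextlib.nullcontext())
print("nullcontext:", ok)
```

If this reading is right, the failing tasks are trying to use something in a `with` block that does not support it in this environment, but I have not confirmed where that happens.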
I also see one error, as follows:
2022-11-04 15:57:04,206 - distributed.worker - ERROR - failed during get data with tcp://134.79.21.47:36421 -> tcp://134.79.21.56:35747
Traceback (most recent call last):
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/comm/tcp.py", line 229, in read
frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/worker.py", line 1674, in get_data
response = await comm.read(deserializers=serializers)
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/comm/tcp.py", line 245, in read
convert_stream_closed_error(self, e)
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/comm/tcp.py", line 150, in convert_stream_closed_error
raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) local=tcp://134.79.21.47:36421 remote=tcp://134.79.21.56:58968>: Stream is closed
2022-11-04 15:57:04,208 - distributed.core - INFO - Lost connection to 'tcp://134.79.21.56:58968'
Traceback (most recent call last):
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/comm/tcp.py", line 229, in read
frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/core.py", line 777, in _handle_comm
result = await result
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/worker.py", line 1674, in get_data
response = await comm.read(deserializers=serializers)
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/comm/tcp.py", line 245, in read
convert_stream_closed_error(self, e)
File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc11-opt/lib/python3.9/site-packages/distributed/comm/tcp.py", line 150, in convert_stream_closed_error
raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) local=tcp://134.79.21.47:36421 remote=tcp://134.79.21.56:58968>: Stream is closed