is a minimal reproducer that uses data on the Grid, so you might need a grid certificate to run it. It seems to get stuck when this particular combination is used:
The XRootD line above.
Data on the Grid.
RDataFrame.
This caused our grid jobs to just wait. When a job is stuck for around an hour (i.e. the process waits in state S), it gets killed and its log files are not saved. That made this particular problem extremely time-consuming to track down.
Now that we know the source of the problem, we will just drop the XRootD line, so a solution is not urgently needed. In any case, we thought we should let you know.
Hi @rooter_03 ,
thank you for the report. This is not a known issue.
If I understand correctly, removing the line from XRootD import client is a workaround? That’s very surprising.
Also, do you need RDataFrame, or is TFile enough to reproduce the issue? (RDataFrame does not do anything special; it uses TChain and TFile to access the files under the hood.) For example, do these work or hang?
import ROOT
from XRootD import client
filename='root://xrootd.echo.stfc.ac.uk/lhcb:user/lhcb/user/a/acampove/GangaFiles_11.18_Wednesday_27_October_2021/2011_skimmed.root'
f = ROOT.TFile.Open(filename)
t = f.Get('gen/truth')
print(t.GetEntries())
or
import ROOT
from XRootD import client
filename='root://xrootd.echo.stfc.ac.uk/lhcb:user/lhcb/user/a/acampove/GangaFiles_11.18_Wednesday_27_October_2021/2011_skimmed.root'
c = ROOT.TChain('gen/truth')
c.Add(filename)
print(c.GetEntries())
Surprisingly, TFile is enough to reproduce the issue, as long as you do not call TFile::Close. If this function is called, the problem does not happen. The problem does happen with TChain too. Maybe it has to do with the file not being closed explicitly.
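For reference, here is a minimal sketch of how the explicit close could be guaranteed, even if the code reading the file raises. The helper closing_root_file and the FakeFile stand-in are hypothetical names introduced only for illustration (FakeFile lets the sketch run without ROOT or a grid certificate). The only assumption about the real object is that it has a Close() method, as TFile does.

```python
from contextlib import contextmanager


@contextmanager
def closing_root_file(open_file):
    """Yield an already-opened file object and always call its Close().

    Hypothetical helper: it only assumes the object has a Close() method.
    """
    try:
        yield open_file
    finally:
        # Explicit close, the call that made the hang go away in our tests.
        open_file.Close()


class FakeFile:
    """Hypothetical stand-in for a TFile, so the sketch runs without ROOT."""

    def __init__(self):
        self.closed = False

    def Close(self):
        self.closed = True


fake = FakeFile()
with closing_root_file(fake) as f:
    # With ROOT this would be something like:
    #   t = f.Get('gen/truth'); print(t.GetEntries())
    pass
print(fake.closed)
```

With ROOT, the intended usage would be closing_root_file(ROOT.TFile.Open(filename)) in place of the stand-in. We have not checked whether this also helps in the TChain case.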
A colleague found out that the problem goes away if the lines: