Handling Missing Input Files in RDF with xrootd

Dear ROOT experts,

I have a question regarding RDF and handling missing input files. In my work, I sometimes use data files that are stored remotely and accessed via xrootd. I create an RDataFrame from a TChain, and when adding files to the TChain, I check their availability.

Occasionally, files are available when they are added to the TChain but become unavailable by the time the analysis actually runs (e.g. when I call `hist = df.Histo1D(("", "", 10, -.5, 9.5), "dummy")` and then `hist.Draw()`, which triggers the event loop). I suspect this is related to xrootd instabilities.

Here is a toy example to reproduce the problem.

```python
import os
from array import array

import ROOT

# Create dummy input files: output_i.root holds 1000 entries with dummy == i
for i in range(10):
    f = ROOT.TFile(f"output_{i}.root", "RECREATE")
    tree = ROOT.TTree("tree", "A simple TTree")

    dummy_value = array('i', [0])
    tree.Branch("dummy", dummy_value, "dummy/I")

    for _ in range(1000):
        dummy_value[0] = i
        tree.Fill()

    f.Write()
    f.Close()

# Check that the files are available when they are added to the TChain.
# With nentries=0, TChain::Add opens the file and returns 0 if it is missing.
chain = ROOT.TChain("tree")
for i in range(10):
    if not chain.Add(f"output_{i}.root", 0):
        raise OSError(f"file output_{i}.root is missing")

df = ROOT.RDataFrame(chain)

# Delete one input file to simulate the xrootd instability
os.remove("output_8.root")

hist = df.Histo1D(("", "", 10, -.5, 9.5), "dummy")
canvas = ROOT.TCanvas("canvas", "Histogram Canvas", 800, 600)
hist.Draw()  # accessing the result triggers the event loop
canvas.Update()
canvas.Draw()
```

ROOT printed the error message `Error in <TFile::TFile>: file output_8.root does not exist`, but the histogram was still created from the remaining files.

As far as I understand, ROOT's error handling only prints a message in such cases and does not raise a Python exception by default.

When I run the RDF analysis on the cluster, all jobs appear to finish successfully, but some input files may have been silently skipped.

I might be missing something basic, but is there a way to force a Python exception when this happens? Or is there some workaround?
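One possible workaround, sketched here rather than taken from any ROOT API: record the total number of entries when the files are added, then compare it with the number of entries the event loop actually processed and raise if they differ. This assumes, as the toy example suggests, that entries from an unreadable file are skipped rather than reported. `MissingInputError` and `check_all_entries_processed` are hypothetical helper names, not part of ROOT.

```python
class MissingInputError(RuntimeError):
    """Hypothetical exception: the event loop saw a different number of entries than expected."""


def check_all_entries_processed(expected_entries, processed_entries):
    """Raise MissingInputError unless every expected entry was processed.

    expected_entries: total recorded when the files were added to the TChain
                      (e.g. chain.GetEntries() right after the Add(..., 0) loop)
    processed_entries: entries actually seen by the event loop
                       (e.g. the value of a df.Count() booked next to the histograms)
    """
    if processed_entries != expected_entries:
        raise MissingInputError(
            f"event loop processed {processed_entries} entries, "
            f"expected {expected_entries}; some input files may have been unreadable"
        )
    return processed_entries
```

In the toy example this would mean storing `expected = chain.GetEntries()` right after the `chain.Add` loop, booking `count = df.Count()` alongside `df.Histo1D(...)` so both results are filled in the same event loop, and calling `check_all_entries_processed(expected, count.GetValue())` after `hist.Draw()`. An uncaught exception makes the job exit with a non-zero status, so the batch system reports it instead of treating it as a success.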

Thanks a lot for your advice.

Best,
Jindrich

Hi Jindrich,

Thanks for the post and welcome to the ROOT Community!
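To abort in the presence of an error like the one you see, you can set `gErrorAbortLevel = kError` (from Python: `ROOT.gErrorAbortLevel = ROOT.kError`). Note that this terminates the process rather than raising a Python exception, so the job fails instead of finishing with missing input.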
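A minimal sketch of applying this setting only around the event loop, so that errors printed elsewhere (e.g. during setup) keep their default behaviour. It assumes that the PyROOT globals `ROOT.gErrorAbortLevel` and `ROOT.kError` can be read and assigned as module attributes, the same way `ROOT.gErrorIgnoreLevel` commonly is; the context-manager name `abort_on_root_errors` is hypothetical.

```python
import contextlib


@contextlib.contextmanager
def abort_on_root_errors(root=None, level=None):
    """Temporarily lower ROOT's abort threshold so that messages of severity
    >= level (default: kError) abort the process instead of only being printed.

    root: the ROOT module; pass it explicitly only for testing, otherwise it
          is imported lazily so this helper has no import-time ROOT dependency.
    """
    if root is None:
        import ROOT as root  # assumption: gErrorAbortLevel/kError are settable module attributes
    if level is None:
        level = root.kError
    previous = root.gErrorAbortLevel
    root.gErrorAbortLevel = level
    try:
        yield
    finally:
        # Reached only if nothing aborted: restore the previous threshold.
        root.gErrorAbortLevel = previous
```

Usage would be `with abort_on_root_errors(): hist.Draw()`, wrapping the call that triggers the event loop. Keep in mind that any message at `kError` or above inside the block aborts the job, not only a missing file.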
However, the problem I see here is that some files randomly become inaccessible during your event loop. This may be due to "instabilities in xrootd", but more likely to instabilities in the storage backend you are accessing (or a combination of the two). My advice would be to get in touch with the admins of the site you rely on (university cluster? Tier-2/1?) and report this problem to them.

I hope this helps a bit.

Cheers,
Danilo