ROOT error when using sbatch with RDataFrame


ROOT Version: 6.24/06
Compiler: gcc (GCC) 9.3.0


Hi! My project requires me to submit sbatch jobs, each of which runs Python code that uses RDataFrame (RDF). I provide a filelist containing the ROOT files that the Python script opens and then processes with RDF. (In case it matters: the individual TTrees are added to a single TChain, which is then used to construct the RDF object.)

When I submit (interactive) jobs using salloc, everything works fine. However, when I submit my list of sbatch jobs, I run into an error, “cannot stat the file”. Specifically:

Error in <TFile::GetSize>: cannot stat the file /root_file_address/root_file.root
Error in <TFile::Init>: cannot stat the file /root_file_address/root_file.root

[A Python traceback in between]

cppyy.gbl.std.runtime_error: ROOT::RDF::RCutFlowReport& ROOT::RDF::RResultPtr<ROOT::RDF::RCutFlowReport>::operator*() => runtime_error: TTreeProcessorMT::Process: an error occurred while opening file "/root_file_address/root_file.root"

How I am submitting the jobs:
I loop over the jobs I want to submit, each time running the command
sbatch --export=NONE --time=03:00:00 --cpus-per-task=12 --mem-per-cpu=16000M --account=acc_name $title.sh

where $title.sh is a bash script that sets up the ROOT environment and then runs my Python file.

Crucially, these errors do not occur when I run the program manually by first using salloc to get an allocation, setting up the ROOT environment, and then running the program. It is also interesting that different sbatch submissions over the same filelist hit the same error, but for different ROOT files.

Appreciate any help, thank you!

Hello, welcome to the ROOT forum!

The “cannot stat the file” message tells you that the file could not be opened correctly, for one reason or another.

This is probably more a problem with your Slurm environment than with RDataFrame. Maybe the paths are different for sbatch jobs than for interactive sessions?
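
One way to check this could be to print the same environment snapshot from both the salloc session and the sbatch job and then compare the two outputs. The sketch below is only an illustration: the helper names `environment_snapshot` and `print_snapshot` and the choice of variables are assumptions, not part of ROOT or Slurm. Since the jobs are submitted with `--export=NONE`, the batch job should not inherit the submitting shell's environment, so some differences are to be expected; the interesting ones are those affecting how the input files are reached.

```python
import os
import socket


def environment_snapshot(paths=(), variables=("PATH", "LD_LIBRARY_PATH", "PYTHONPATH", "HOME")):
    """Collect details that may differ between an salloc session and an sbatch job."""
    snapshot = {
        "host": socket.gethostname(),  # which node the code actually runs on
        "uid": os.getuid(),            # numeric user id the job runs as
        "cwd": os.getcwd(),            # working directory (matters for relative paths)
    }
    for name in variables:
        snapshot["env:" + name] = os.environ.get(name, "<unset>")
    for path in paths:
        # True only if the path is visible from this node right now
        snapshot["exists:" + path] = os.path.exists(path)
    return snapshot


def print_snapshot(snapshot):
    """Print the snapshot in a stable order so two outputs can be compared line by line."""
    for key in sorted(snapshot):
        print(f"{key} = {snapshot[key]}")
```

Calling `print_snapshot(environment_snapshot(paths=[...]))` with a few of the failing file paths at the top of the Python script, once interactively and once under sbatch, gives two listings that can be compared line by line.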

In any case, I would advise you to get help elsewhere, for example from local colleagues who have access to the same job submission environment. Here on the ROOT forum, I don’t think anyone can help you because we don’t have the same environment, so it’s practically impossible to reproduce the problem.

Maybe one thing you could try to pin down the problem is to create jobs that just open the files without creating any RDataFrame. If that still fails, you can rule out RDataFrame as the cause.
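
A minimal sketch of such a test job could look like the following. It assumes the filelist holds one path per line; the helper names (`read_filelist`, `stat_failures`, `root_open_failures`) are hypothetical. The first check uses only the operating system, which is what the “cannot stat” message refers to, and the second opens each file with plain `ROOT.TFile.Open`, without any TChain or RDataFrame.

```python
import os


def read_filelist(filelist_path):
    """Read one file path per line, skipping blank lines (assumed filelist format)."""
    with open(filelist_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def stat_failures(paths):
    """Return (path, reason) pairs for files the OS itself cannot stat or read."""
    failures = []
    for path in paths:
        try:
            os.stat(path)
        except OSError as err:
            failures.append((path, err.strerror or str(err)))
            continue
        if not os.access(path, os.R_OK):
            failures.append((path, "not readable by this user"))
    return failures


def root_open_failures(paths):
    """Return the paths that plain ROOT.TFile.Open cannot open (no RDataFrame involved)."""
    import ROOT  # imported here so the stat check above also works without ROOT

    failures = []
    for path in paths:
        root_file = ROOT.TFile.Open(path)
        # TFile.Open gives back a null pointer or a zombie file on failure
        if not root_file or root_file.IsZombie():
            failures.append(path)
        else:
            root_file.Close()
    return failures
```

If `stat_failures` already reports files under sbatch, the problem sits below ROOT (for example in how the filesystem is seen from the batch nodes). If only `root_open_failures` reports them, the issue is more likely in the ROOT setup of the job.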

Cheers,
Jonas
