!allNullFiles during lazy Snapshot execution in RunGraphs

apetukho · April 1, 2022, 9:45am

Dear ROOT experts,

I’ve got a script that creates RResultHandles to lazy Snapshots and then runs them using RDF::RunGraphs with EnableImplicitMT(). The code is written in Python and runs on lxplus; the data is stored on /eos/.

I’ve started running into the following error message during RDF::RunGraphs execution, and I have no idea what causes it or how to avoid it:

Fatal: !allNullFiles violated at line 1582 of `/build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.24.06/src/ROOT-6.24.06-build/include/ROOT/RDF/ActionHelpers.hxx

It would be difficult to use previous versions of ROOT because (as I understand it) RDF::RunGraphs only became public and non-experimental in 6.24. Trying newer versions would also be difficult, since 6.24 is the latest version available on lxplus and the files are too big to test locally.

How could I troubleshoot this error?

Thanks in advance,
Aleksandr

ROOT Version: 6.24.06
Platform: lxplus

eguiraud · April 1, 2022, 12:05pm

Hi @apetukho,
that might happen if Snapshot didn’t actually manage to write any entries, either because there were no input entries or because, after Filters, there were no output entries. Or, of course, in case of obscure bugs that eluded our testing infrastructure!

Can you try substituting the Snapshot call with a Count call and check how many entries are counted? Is it more than 0?

Cheers,
Enrico
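
For reference, the check suggested above could look roughly like this in PyROOT. This is only a minimal sketch, not code from the thread: the tree name, file list and selection string are placeholders, and `count_selected` and `summarize` are hypothetical helper names.

```python
def count_selected(tree_name, file_names, selection):
    """Return {file_name: number of entries passing `selection`}.

    Assumes each file holds a TTree called `tree_name` and that `selection`
    is a valid RDataFrame Filter expression (both are placeholders here).
    """
    import ROOT  # imported lazily so summarize() below is usable without ROOT

    # Book one lazy Count per input file, replacing the Snapshot call.
    booked = {}
    for name in file_names:
        df = ROOT.RDataFrame(tree_name, name)
        booked[name] = df.Filter(selection).Count()

    # GetValue() triggers the event loop of the corresponding data frame.
    return {name: int(result.GetValue()) for name, result in booked.items()}


def summarize(counts):
    """Return (total selected entries, sorted list of inputs with zero entries)."""
    total = sum(counts.values())
    empty = sorted(name for name, n in counts.items() if n == 0)
    return total, empty
```

Calling `summarize(count_selected("Events", files, "pt > 20"))` (all three arguments being made-up examples) gives the total yield and the list of inputs for which nothing survives the selection.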

apetukho · April 1, 2022, 1:03pm

Hi @eguiraud,
I’ve run a few tests and here’s what I’ve found.

I’ve checked the code without EnableImplicitMT() and everything works fine, albeit slowly.
I’m running Snapshot on 84 files simultaneously. Of these, 23 files end up with no events, but 534 entries pass the full selection across all files.
The version with Snapshot replaced by Count works with EnableImplicitMT().

eguiraud · April 1, 2022, 1:13pm

Uhm, in that case the assert you see should not be triggered. Your other observations are in line with expectations. Any chance you can provide a reproducer that I can debug? For example, you could share data and code with me privately via cernbox.

EDIT:
Count().GetValue() returns something larger than 0, right?

eguiraud · April 6, 2022, 4:29pm

Hi Aleksandr,
thank you very much for the reproducer. There is some extra cruft, but I can see that indeed passedEntryNumSample (i.e. the number of entries that pass all selections) is 0 for 2 out of the 3 files in the reproducer. That’s why that assert fires.

The good news is that in v6.26 we switched to just a warning, so upgrading your ROOT version should result in a warning + an empty file instead of an assert that aborts execution.

Cheers,
Enrico

apetukho · April 7, 2022, 2:51pm

Hi Enrico,

since v6.26 is not yet available on lxplus, is there a way to use this kind of script with multithreading enabled? There is no problem with single-threaded execution.

Best regards,
Aleksandr

eguiraud · April 8, 2022, 8:29am

I’m afraid that for v6.24 all I can think of is either excluding the files that produce empty outputs from the processing, or not using multi-threading. An LCG release for v6.26 should be out soon; in the meantime maybe you can use a conda environment or one of the other installation methods.

Cheers,
Enrico
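
A possible shape for the first workaround, excluding inputs with no selected entries before booking the Snapshots, is sketched below. It assumes the per-file counts were obtained beforehand (for example with a Count pass, which costs one extra loop over the data). `split_by_yield`, `snapshot_nonempty` and `out_name_for` are hypothetical names, and passing a plain Python list to RunGraphs is an assumption that may need an explicit `std::vector` of RResultHandle on some ROOT versions.

```python
def split_by_yield(counts):
    """Split {file_name: selected entries} into (files to keep, files to skip)."""
    keep = [name for name, n in counts.items() if n > 0]
    skip = [name for name, n in counts.items() if n == 0]
    return keep, skip


def snapshot_nonempty(tree_name, counts, selection, out_name_for):
    """Book lazy Snapshots only for inputs with at least one selected entry.

    `out_name_for` is a hypothetical callable mapping an input file name
    to the desired output file name.
    """
    import ROOT  # imported lazily so split_by_yield() is usable without ROOT

    ROOT.EnableImplicitMT()
    keep, skip = split_by_yield(counts)

    opts = ROOT.RDF.RSnapshotOptions()
    opts.fLazy = True  # book the Snapshot without running the event loop yet

    handles = []
    for name in keep:
        df = ROOT.RDataFrame(tree_name, name).Filter(selection)
        # Empty column regexp: write all columns.
        handles.append(df.Snapshot(tree_name, out_name_for(name), "", opts))

    # Run all booked computation graphs concurrently.
    ROOT.RDF.RunGraphs(handles)
    return keep, skip
```

The skipped files simply get no output, so any downstream step that expects one output per input has to tolerate the missing files.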
