With multithreading enabled, RDataFrame Snapshot often (but not always) fails with an empty-stack error when run on multiple input ROOT files, some of which may have bad clustering or an unusual cluster size.
The error is as follows:
Fatal: !fStack.empty() && "Trying to pop a slot from an empty stack!" violated at line 33 of `/home/conda/feedstock_root/build_artifacts/root_1562589567833/work/root-source/tree/dataframe/src/RSlotStack.cxx'
Conditions for the error to occur:
Multithreading is enabled
Multiple ROOT files are passed to RDataFrame
Some of the ROOT files may have bad clustering / cluster size (not entirely sure this is a necessary condition for triggering the error)
A minimal reproducer is:
ROOT::EnableImplicitMT();
// Repeat the Snapshot many times: the crash does not happen on every run.
for (UInt_t i = 0; i < 100; i++) {
    std::vector<std::string> input_files;
    input_files.emplace_back("test1.root");
    input_files.emplace_back("test2.root");
    ROOT::RDataFrame("MonoH_Nominal", input_files).Snapshot("test", "test.root");
}
The clustering information for the ROOT files concerned is:
For test1.root
Cluster Range #  Entry Start  Last Entry  Size
0                0            105994      7976
For test2.root
Cluster Range #  Entry Start  Last Entry  Size
0                0            2883        0
Hi @AlkaidC,
thank you for the clear reproducer, we’ll take a look at it as soon as possible.
As you are running on lxplus, could you check whether you see this problem with a nightly build of the ROOT master branch? Here are instructions to set up the nightlies from cvmfs.
Cheers,
Enrico
P.S.
Just to be clear: users should never see the error message you posted, regardless of how messed up the input files are. This is an RDataFrame bug.
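For readers unfamiliar with the message, here is a minimal sketch of the invariant behind it, assuming the slot stack hands out one processing-slot index per concurrently running task. The class name SlotStack and its methods are hypothetical and this is not ROOT's RSlotStack implementation; it also throws where ROOT aborts, so that the failure can be observed in a test. It does not attempt to reproduce the actual cause of the bug.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// Illustrative only: a hypothetical stand-in, not ROOT's RSlotStack.
class SlotStack {
public:
   explicit SlotStack(unsigned nSlots) : fSize(nSlots) {
      // Fill in reverse so that slot 0 is handed out first.
      for (unsigned i = nSlots; i > 0; --i)
         fStack.push_back(i - 1);
   }

   // Called when a task starts: take a free slot index.
   unsigned Pop() {
      std::lock_guard<std::mutex> lock(fMutex);
      // More tasks in flight than slots: the invariant is violated.
      if (fStack.empty())
         throw std::logic_error("Trying to pop a slot from an empty stack!");
      const unsigned slot = fStack.back();
      fStack.pop_back();
      return slot;
   }

   // Called when a task ends: give the slot index back.
   void Push(unsigned slot) {
      std::lock_guard<std::mutex> lock(fMutex);
      assert(slot < fSize && "slot index out of range");
      fStack.push_back(slot);
   }

private:
   const unsigned fSize;         // total number of slots
   std::vector<unsigned> fStack; // currently free slot indices
   std::mutex fMutex;            // protects fStack
};
```

In this sketch the stack can only run empty if more tasks hold a slot at the same time than there are slots, which is why the message points to a scheduling problem rather than to bad input data.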
Hi @AlkaidC,
thank you for double-checking! This is clearly a bug in RDataFrame. The cause is clear (although a bit technical, see my write-up in the jira ticket) but the fix requires a bit of work. Please ping me or @Axel if this is a blocking issue for you, it might help move it up the to-do list.
Hi @AlkaidC,
I just merged a patch that should resolve the crash in your case and largely mitigate the problem in RDF.
It would be great if tomorrow/next week you could try to reproduce the crash with a nightly build of ROOT again. Hopefully you will not see any issue anymore.
@AlkaidC You probably have a different TBB enabled with $LD_LIBRARY_PATH than the one from LCG. Can you please paste here the output of echo $LD_LIBRARY_PATH | tr ':' '\n'?
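The same check can be sketched in C++, in case it is handier to run it from the program that crashes. The helper names SplitPathList and FindTbbEntries are hypothetical, and matching on the substring "tbb" is only a heuristic assumption about how the directories are named.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated path list into its entries, skipping empty ones.
std::vector<std::string> SplitPathList(const std::string &paths) {
   std::vector<std::string> entries;
   std::istringstream stream(paths);
   std::string entry;
   while (std::getline(stream, entry, ':'))
      if (!entry.empty())
         entries.push_back(entry);
   return entries;
}

// Hypothetical heuristic: keep the entries whose path mentions "tbb".
std::vector<std::string> FindTbbEntries(const std::string &paths) {
   std::vector<std::string> hits;
   for (const auto &entry : SplitPathList(paths))
      if (entry.find("tbb") != std::string::npos)
         hits.push_back(entry);
   return hits;
}
```

To use it on the real environment, pass the result of std::getenv("LD_LIBRARY_PATH") from <cstdlib>, after checking that it is not a null pointer. More than one hit, or a hit outside the LCG installation, would be consistent with a second TBB being picked up.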
About the CVMFS nightlies, turns out there is a bit of an issue – sorry about that. One workaround to make them work is to first source the last LCG release, then the nightly, and it works: