I was trying to add an integration test case for our bamboo package with the new ROOT 6.28, but the tests fail with a segmentation fault.
Bamboo uses RDataFrame to fill histograms and skims. The segfault happens after the event loop has run to completion and the results have been retrieved and written to a file.
Strangely, no stack trace appears, which makes it quite difficult to debug. All I see is:
*** Break *** segmentation violation
after which execution just hangs forever (the CI/CD tests don’t even fail, they just time out!)…
The tests in question worked perfectly fine up to now, e.g. with ROOT 6.26.04.
I don’t see anything obvious in the 6.28 release notes that would require changes on our side… Any pointers as to what might be going on would be appreciated!
As it’s a problem related to this “bamboo” software, you might ask its developers/maintainers. But as it uses RDataFrame, @eguiraud might have some ideas about it.
Actually, after setting the newly available CLING_DEBUG=1 environment variable, I got more information:
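For CI, one way to get this output reliably and keep a post-crash hang from stalling the job is to run the test as a child process with CLING_DEBUG=1 in its environment and a hard timeout. This is a minimal sketch, assuming the test can be launched as a separate command; the helper name run_with_cling_debug and the default timeout are hypothetical.

```python
import os
import subprocess
import sys


def run_with_cling_debug(cmd, timeout_s=600):
    """Run cmd with CLING_DEBUG=1 set; return (returncode, combined output).

    A run that exceeds timeout_s (e.g. hanging after a crash) is reported
    as returncode None instead of blocking forever.
    """
    # Copy the current environment and add the debug variable for the child only.
    env = dict(os.environ, CLING_DEBUG="1")
    try:
        proc = subprocess.run(
            cmd,
            env=env,
            timeout=timeout_s,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr so crash messages are kept
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child on timeout; keep whatever it printed.
        # The partial output may be bytes even with text=True, so decode it.
        out = exc.output or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return None, out
    return proc.returncode, proc.stdout
```

Usage would be something like run_with_cling_debug([sys.executable, "test_integration.py"]), where the script name is only a placeholder; the CI step can then fail explicitly when the return code is None or nonzero. Note that any helper processes the child started (e.g. a debugger) are not killed by this sketch.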
cppyy.ll.SegmentationViolation: Template method resolution failed:
static unique_ptr<rdfhelpers::PrintProgress,default_delete<rdfhelpers::PrintProgress> > rdfhelpers::PrintProgress::addToNode(ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> df, int printFreq, int nThreads = 1) =>
SegmentationViolation: segfault in C++; program state was reset
static unique_ptr<rdfhelpers::PrintProgress,default_delete<rdfhelpers::PrintProgress> > rdfhelpers::PrintProgress::addToNode(ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>* df, int printFreq, int nThreads = 1) =>
SegmentationViolation: segfault in C++; program state was reset
static unique_ptr<rdfhelpers::PrintProgress,default_delete<rdfhelpers::PrintProgress> > rdfhelpers::PrintProgress::addToNode(ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> df, int printFreq, int nThreads = 1) =>
SegmentationViolation: segfault in C++; program state was reset
While it’s still not clear to me why this broke all of a sudden, at least I have something to go on…
EDIT: after removing the PrintProgress::addToNode call, I get:
cppyy.ll.SegmentationViolation: TH1D& ROOT::RDF::RResultPtr<TH1D>::operator*() =>
SegmentationViolation: segfault in C++; program state was reset
which is a segfault from the event loop itself…
Separately, is it expected that after the segfault the crashed program doesn’t exit but hangs forever waiting for a stack trace?
You could try running a debug build of v6.28.02 with the environment variable CLING_DEBUG=1 set and see whether you get a more helpful stack trace. Other than that, I would need a way to reproduce this in order to debug it.
Bamboo can write out the generated RDF analysis to a C++ file, compile it, and run it: when doing that, everything works fine. So the issue must come from something that is instantiated through PyROOT or by the jitting…
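For reference, compiling such a standalone C++ file against ROOT typically uses the root-config helper, which prints the compiler and linker flags of the installed ROOT. This is a minimal sketch, not bamboo’s own build step; the function names and file names are hypothetical, and the flags are passed in as a string so the command can be inspected without ROOT installed.

```python
import shlex
import subprocess


def build_command(source, executable, root_flags, cxx="g++"):
    """Return the argv list that compiles `source` into `executable` against ROOT.

    `root_flags` is the text printed by `root-config --cflags --libs`.
    Libraries are placed after the source file so the linker resolves them.
    """
    return [cxx, "-O2", "-o", executable, source, *shlex.split(root_flags)]


def root_config_flags():
    # Requires ROOT's root-config executable on PATH.
    return subprocess.check_output(["root-config", "--cflags", "--libs"], text=True)
```

With ROOT available, subprocess.check_call(build_command("analysis.cxx", "analysis", root_config_flags())) builds the executable, which can then be run directly to compare against the PyROOT/jitted path.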
That’s often caused by gdb getting stuck somehow: on a crash, we attach gdb to the running process and have it print the backtrace of all threads.
When it hangs, you can run ps -feH to see how ROOT starts gdb; you should then be able to run that same command by hand and see why gdb’s “bt” (i.e. backtrace) doesn’t work…
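To pick the gdb invocation out of the ps -feH listing, a small filter over its output is enough. This is a minimal sketch, assuming procps-style ps -f columns (UID PID PPID C STIME TTY TIME CMD); the helper names are hypothetical, and it simply matches any command line containing “gdb”.

```python
import subprocess


def gdb_processes(ps_output):
    """Return (pid, ppid, cmd) for rows of `ps -feH` output whose command mentions gdb."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the "UID PID PPID ..." header
        # CMD is the 8th column and may contain spaces, so split at most 7 times;
        # this also drops the indentation that -H adds in front of child processes.
        fields = line.split(None, 7)
        if len(fields) == 8 and "gdb" in fields[7]:
            rows.append((int(fields[1]), int(fields[2]), fields[7]))
    return rows


def current_gdb_processes():
    # -e: all processes, -f: full command lines, -H: children nested under parents.
    out = subprocess.check_output(["ps", "-feH"], text=True)
    return gdb_processes(out)
```

The PPID column ties each gdb row back to the hanging process, and the CMD string is the command line to rerun by hand.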