Crash with implicit multi-threading (MT) and RDataFrame


ROOT Version: sft-nightlies.cern.ch/lcg/views/dev3/latest/x86_64-centos7-gcc62-opt
Platform: CentOS Linux release 7.6.1810 (Core)
Compiler: gcc version 6.2.0 (GCC)


Hi,
I’m processing data with an RDataFrame-based framework using PyROOT. I’m sourcing the latest nightly ROOT build because I need some features that are not in the latest stable release. When I enable implicit multi-threading with:
ROOT.ROOT.EnableImplicitMT(24)
it crashes with the backtrace in the attached log file. The crash happens when Write() is called on a histogram. I’ve checked that this histogram exists, but I noticed that calling GetName() also makes the framework crash. Without MT, my framework works fine. LOGCrash_slc7_bis.txt (369.3 KB)
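For context, here is a stripped-down sketch of what the framework does; the tree, branch and file names below are placeholders, not the real ones:

import ROOT

# Placeholder names: "events", "pt", "sample.root", "out.root"
ROOT.ROOT.EnableImplicitMT(24)
df = ROOT.ROOT.RDataFrame("events", "sample.root")
h = df.Histo1D("pt")                       # lazy result, nothing runs yet

out = ROOT.TFile.Open("out.root", "RECREATE")
h.GetValue().Write()                       # crashes here; even h.GetValue().GetName() crashes
out.Close()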

Hi,
Are you mixing Python threading with ROOT threading?
Note that the stacktrace contains 64 threads, not 24 – is it the same crash?

The crash happens before the event loop is started, during setup of the per-thread TTreeReaders.
You see it when you access the histogram because RDF triggers the event loop upon first access to one of the results.
Is it possible that one of the TTrees in your TChain is either empty or has fewer branches than the first tree in the chain?
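If you want to check that quickly, independently of RDF, something along these lines should work (the file list and tree name are placeholders):

import ROOT

# Compare the branch list and entry count of every file in the chain
# against the first one; replace the file names and tree name with yours.
files = ["run1.root", "run2.root", "run3.root"]
reference = None
for name in files:
    f = ROOT.TFile.Open(name)
    tree = f.Get("events")
    branches = {b.GetName() for b in tree.GetListOfBranches()}
    print(name, "entries:", tree.GetEntries(), "branches:", len(branches))
    if reference is None:
        reference = branches
    elif branches != reference:
        print("  -> mismatch with respect to the first file:",
              reference.symmetric_difference(branches))
    f.Close()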

Lastly: can you try running with a -dbg build instead of a -opt build? It should give more complete stacktraces – currently the relevant part of the stacktrace is:

#6  0x00007f15cf003f11 in ROOT::Internal::RDF::RColumnValue<int>::MakeProxy(TTreeReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/sft-nightlies.cern.c
#7  0x00007f15cc18bb73 in ?? ()                                                                                         
#8  0x0000000000000001 in ?? ()                                                                                         
#9  0x00007f15e2840689 in ?? ()                                                                                         
#10 0x00007f15c9673440 in ?? ()                                                                                         
#11 0x00007f15c9673440 in ?? ()                                                                                         
#12 0x00007f15c96734a0 in ?? ()                                                                                         
#13 0x00007f15c96734a4 in ?? ()                                                                                         
#14 0x0000000000000001 in ?? ()                                                                                         
#15 0x00007f15cc195020 in ?? ()                                                                                         
#16 0x00007f15cf005770 in ?? () from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev3/Tue/ROOT/HEAD/x86_64-centos7-gcc62-opt/lib/libROOTDataFrame.so
#17 0x00000000a6546350 in ?? ()                                                                                         
#18 0x00007f1598006f00 in ?? ()                                                                                         
#19 0x00007f15cf003eb0 in ?? () from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev3/Tue/ROOT/HEAD/x86_64-centos7-gcc62-opt/lib/libROOTDataFrame.so
#20 0x0000000095a8d5d0 in ?? ()                                                                                         
#21 0x00007f1598006f00 in ?? ()                                                                                         
#22 0x0000000000000000 in ?? () 

Cheers,
Enrico

Hi,
Thank you for your reply. I noticed the 64 threads too, and I don’t have an explanation (yes, it is the same crash). I tried the -dbg build and attach the log [***]. I also tried with a single-tree input, to exclude a branch-number issue; that result is attached in the “_1tree” log. LOGCrash_slc7_dbg_1tree.txt (558.7 KB)
LOGCrash_slc7_dbg.txt (722.1 KB)

Cheers,
Valerio
[***] Actually, when sourcing the -dbg build there is no ROOT package, so I first sourced the -opt view and then the -dbg one. Is that enough?

Hi Valerio,
those stacktraces don’t look right :sweat_smile:
We need to run the reproducer on a build with debug symbols.

Please open a bug report on Jira, attaching a minimal reproducer and, possibly, some data (even just a few entries) that we can run it on. That would help greatly. Without a reproducer or a clear stacktrace pointing at the problem, there is not much we can do.
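As an illustration, the reproducer can be a single self-contained script of this kind, together with a tiny input file; the names below are placeholders, and a small file can even be generated on the fly with Snapshot as sketched here:

import ROOT

# 1) Generate a tiny input file that can be attached to the report (placeholder names).
ROOT.ROOT.RDataFrame(100).Define("pt", "gRandom->Uniform(0, 10)") \
                         .Snapshot("events", "small_sample.root")

# 2) The reproducer itself: enable MT and run the smallest analysis that still crashes.
ROOT.ROOT.EnableImplicitMT(24)
df = ROOT.ROOT.RDataFrame("events", "small_sample.root")
h = df.Histo1D("pt")

out = ROOT.TFile.Open("out.root", "RECREATE")
h.GetValue().Write()   # triggers the event loop; the crash should show up here
out.Close()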

Cheers,
Enrico

Hi Enrico,
While writing the reproducer for the bug report we discovered that it was an issue in our framework only, and we have solved it. It was connected with the enabling and disabling of MT, which probably produced the anomalous number of threads.
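Roughly (this is a simplified sketch, not our actual code), the framework was toggling implicit MT like this:

import ROOT

ROOT.ROOT.EnableImplicitMT(24)     # 24 threads requested here...
# ... build one part of the analysis ...
ROOT.ROOT.DisableImplicitMT()
# ... some serial bookkeeping ...
ROOT.ROOT.EnableImplicitMT()       # ...but re-enabled later without an argument,
                                   # i.e. with the default pool size (all cores)
print(ROOT.ROOT.GetImplicitMTPoolSize())

which could explain why the stacktrace showed 64 threads instead of 24 on that node.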
Thank you very much for your help.
Cheers,
Valerio

