RDF breaks with chain friends

ROOT Version: git v6-24-00-patches @5af1fa4d3d
Platform: fedora 34

Successfully adding some chain friends to the chain to be processed (even if they are not used)
breaks the RDF machinery, see the stack trace below [1].
In my code, the section that generates the crash is marked; with it commented out, everything works without problems.

Any idea about this?
Thanks a lot!
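Since the original script is not shown, here is a hypothetical minimal sketch of the pattern described above (PyROOT; the tree names, file lists, and the `make_df` helper are all placeholders, not the poster's actual code). The import is guarded so the sketch can be loaded even where ROOT is not installed:

```python
# Hypothetical reconstruction of the reported setup: a main TChain with a
# friend TChain attached, processed through RDataFrame. All names are
# placeholders; this is a sketch, not the original script.
try:
    import ROOT
except ImportError:  # allow the sketch to load without PyROOT
    ROOT = None

def make_df(main_files, friend_files):
    """Build an RDataFrame over a chain that has a friend chain attached."""
    if ROOT is None:
        raise RuntimeError("PyROOT is required to run this sketch")
    main = ROOT.TChain("events")           # placeholder tree name
    for path in main_files:
        main.Add(path)
    friend = ROOT.TChain("friend_events")  # placeholder friend tree name
    for path in friend_files:
        friend.Add(path)
    main.AddFriend(friend)                 # the step reported to trigger the crash
    return ROOT.RDataFrame(main)
```

With `ROOT.EnableImplicitMT()` active, the event loop goes through `TTreeProcessorMT`, which is where the stack trace below ends up.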

[1]

*** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
rax 0xfffffffffffffe00   rbx 0x0000000000000000      rcx 0x00007f8c4c82db0f
rdx 0x0000000000000000   rsi 0x00007ffd2ca54488      rdi 0x000000000000f83f
rbp 0x00007ffd2ca54488   rsp 0x00007ffd2ca54450       r8 0x0000000000000000
r9 0x0000000000000000   r10 0x0000000000000000      r11 0x0000000000000293
r12 0x00007ffd2ca544f0   r13 0x0000000000000001      r14 0x00007ffd2ca55820
r15 0x000055e7e4c10370   rip 0x00007f8c4c82db0f   eflags [ CF AF SF IF ]   
cs 0x00000033            ss 0x0000002b               ds 0x00000000        
es 0x00000000            fs 0x00000000               gs 0x00000000        

Thread 2 (Thread 0x7f8c31af1640 (LWP 63548) "python3"):
#0  0x00007f8c4c755a8a in __futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x7f8c4cc62f6c <_PyRuntime+428>, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7f8c31af05f0, private=private@entry=0, cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:74
#1  0x00007f8c4c755aef in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f8c4cc62f6c <_PyRuntime+428>, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7f8c31af05f0, private=private@entry=0) at ../sysdeps/nptl/futex-internal.c:123
#2  0x00007f8c4c74f5c4 in __pthread_cond_wait_common (abstime=0x7f8c31af05f0, clockid=1, mutex=0x7f8c4cc62f70 <_PyRuntime+432>, cond=0x7f8c4cc62f40 <_PyRuntime+384>) at pthread_cond_wait.c:504
#3  __pthread_cond_timedwait (cond=0x7f8c4cc62f40 <_PyRuntime+384>, mutex=0x7f8c4cc62f70 <_PyRuntime+432>, abstime=0x7f8c31af05f0) at pthread_cond_wait.c:637
#4  0x00007f8c4ca28f59 in take_gil () at /lib64/libpython3.9.so.1.0
#5  0x00007f8c4ca59356 in PyEval_RestoreThread () at /lib64/libpython3.9.so.1.0
#6  0x00007f8c4cb21877 in time_sleep () at /lib64/libpython3.9.so.1.0
#7  0x00007f8c4ca4636b in cfunction_vectorcall_O () at /lib64/libpython3.9.so.1.0
#8  0x00007f8c4ca3f05e in _PyEval_EvalFrameDefault () at /lib64/libpython3.9.so.1.0
#9  0x00007f8c4ca3900d in _PyEval_EvalCode () at /lib64/libpython3.9.so.1.0
#10 0x00007f8c4ca46cee in _PyFunction_Vectorcall () at /lib64/libpython3.9.so.1.0
#11 0x00007f8c4ca3d16e in _PyEval_EvalFrameDefault () at /lib64/libpython3.9.so.1.0
#12 0x00007f8c4ca46fe3 in function_code_fastcall () at /lib64/libpython3.9.so.1.0
#13 0x00007f8c4ca3a5eb in _PyEval_EvalFrameDefault () at /lib64/libpython3.9.so.1.0
#14 0x00007f8c4ca46fe3 in function_code_fastcall () at /lib64/libpython3.9.so.1.0
#15 0x00007f8c4ca3a5eb in _PyEval_EvalFrameDefault () at /lib64/libpython3.9.so.1.0
#16 0x00007f8c4ca46fe3 in function_code_fastcall () at /lib64/libpython3.9.so.1.0
#17 0x00007f8c4ca4f442 in method_vectorcall () at /lib64/libpython3.9.so.1.0
#18 0x00007f8c4cafee0a in t_bootstrap () at /lib64/libpython3.9.so.1.0
#19 0x00007f8c4cafed78 in pythread_wrapper () at /lib64/libpython3.9.so.1.0
#20 0x00007f8c4c749299 in start_thread (arg=0x7f8c31af1640) at pthread_create.c:481
#21 0x00007f8c4c861353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f8c4c5eb740 (LWP 63526) "python3"):
#0  0x00007f8c4c82db0f in __GI___wait4 (pid=63551, stat_loc=stat_loc@entry=0x7ffd2ca54488, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
#1  0x00007f8c4c82da8b in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc@entry=0x7ffd2ca54488, options=options@entry=0) at waitpid.c:38
#2  0x00007f8c4c7ab09b in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3  0x00007f8c4ba051f4 in TUnixSystem::Exec(char const*) (shellcmd=<optimized out>, this=0x55e7e4c10370) at /home/physics-tools/root_git/core/unix/src/TUnixSystem.cxx:2120
#4  TUnixSystem::StackTrace() (this=0x55e7e4c10370) at /home/physics-tools/root_git/core/unix/src/TUnixSystem.cxx:2411
#5  0x00007f8c4bda9233 in (anonymous namespace)::do_trace (sig=1) at /home/physics-tools/root_git/bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx:182
#6  (anonymous namespace)::TExceptionHandlerImp::HandleException(Int_t) (this=<optimized out>, sig=1) at /home/physics-tools/root_git/bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx:195
#7  0x00007f8c4ba02659 in TUnixSystem::DispatchSignals(ESignals) (this=0x55e7e4c10370, sig=kSigSegmentationViolation) at /home/physics-tools/root_git/core/unix/src/TUnixSystem.cxx:3644
#8  <signal handler called> () at ../sysdeps/unix/sysv/linux/sigaction.c
#9  (anonymous namespace)::GetFriendEntries (friendFileNames=std::vector of length 3, capacity 4 = {...}, friendNames=std::vector of length 3, capacity 4 = {...}) at /home/physics-tools/root_git/tree/treeplayer/src/TTreeProcessorMT.cxx:239
#10 ROOT::TTreeProcessorMT::Process(std::function<void (TTreeReader&)>) (this=this@entry=0x55e7ef54b8a0, func=...) at /home/physics-tools/root_git/tree/treeplayer/src/TTreeProcessorMT.cxx:604
#11 0x00007f8c30bb0232 in ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT() (this=0x55e7ee4ebb60) at /home/physics-tools/root_git/tree/dataframe/src/RLoopManager.cxx:423
#12 0x00007f8c30bb1545 in ROOT::Detail::RDF::RLoopManager::Run() (this=0x55e7ee4ebb60) at /home/physics-tools/root_git/tree/dataframe/src/RLoopManager.cxx:697
#13 0x00007f8c2040901f in  ()
#14 0x0000000000000000 in  ()
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#9  (anonymous namespace)::GetFriendEntries (friendFileNames=std::vector of length 3, capacity 4 = {...}, friendNames=std::vector of length 3, capacity 4 = {...}) at /home/physics-tools/root_git/tree/treeplayer/src/TTreeProcessorMT.cxx:239
#10 ROOT::TTreeProcessorMT::Process(std::function<void (TTreeReader&)>) (this=this@entry=0x55e7ef54b8a0, func=...) at /home/physics-tools/root_git/tree/treeplayer/src/TTreeProcessorMT.cxx:604
#11 0x00007f8c30bb0232 in ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT() (this=0x55e7ee4ebb60) at /home/physics-tools/root_git/tree/dataframe/src/RLoopManager.cxx:423
#12 0x00007f8c30bb1545 in ROOT::Detail::RDF::RLoopManager::Run() (this=0x55e7ee4ebb60) at /home/physics-tools/root_git/tree/dataframe/src/RLoopManager.cxx:697
#13 0x00007f8c2040901f in  ()
#14 0x0000000000000000 in  ()
===========================================================


Traceback (most recent call last):
  File "/home.hdd/adrian/work/AO2Dproto/./ao2d_process.py", line 126, in <module>
    h_pt.Draw()
cppyy.ll.SegmentationViolation: TH1D& ROOT::RDF::RResultPtr<TH1D>::operator*() =>
SegmentationViolation: segfault in C++; program state was reset

Hi @adrian_sev ,
sorry about that and thank you for reporting. Needless to say, we should not crash like that :confused: I added proper error handling in this PR.

The problem seems to be that one of the friend trees cannot be retrieved from one of the files – but I can’t tell whether the issue is something like a mistyped tree name/file name or rather something else in RDataFrame’s internals. We would need a reproducer to investigate further on our side.

Cheers,
Enrico

hmm, right!! Trying with only one file in the list seems to work. I stumbled on the assumption that if I can open a TFile then its content is valid. As I found in my other thread, for each TTree I can read the first and last entry; THEN, if all trees are valid, I can add a selection of trees into the chains that are to be added as friends :slight_smile:
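The validation described above could be sketched roughly like this (PyROOT, hedged: `friend_trees_ok` and the tree name are hypothetical; the import guard is only there so the sketch loads without ROOT installed):

```python
# Sketch of the "read first and last entry before chaining" check described
# above. Not the poster's actual code; names are placeholders.
try:
    import ROOT
except ImportError:  # degrade gracefully where PyROOT is absent
    ROOT = None

def friend_trees_ok(paths, tree_name):
    """Return True only if every file holds `tree_name` and its first and
    last entries are readable; otherwise the file should not be chained."""
    if ROOT is None:
        return False
    for path in paths:
        f = ROOT.TFile.Open(path)
        if not f or f.IsZombie():      # missing or corrupt file
            return False
        t = f.Get(tree_name)
        if not t:                      # key absent: Get returns a null proxy
            f.Close()
            return False
        n = t.GetEntries()
        # GetEntry returns the number of bytes read; <= 0 signals a problem
        if n == 0 or t.GetEntry(0) <= 0 or t.GetEntry(n - 1) <= 0:
            f.Close()
            return False
        f.Close()
    return True
```

Only files that pass such a check would then be added to the friend chains.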

In this case I think you should see that retrieving one of those TTrees from their TFile returns a nullptr – that would be the most obvious explanation for the crash you see (which my changes above, now merged in master, will turn into an exception with a friendlier error message).

Cheers,
Enrico

erm, so this is in Python so I do not have any pointer :slight_smile: but I assume that if I can do GetEntry(i) without an exception (because I would expect GetTree to throw an exception if the tree is invalid, similar to calling GetEntry on a nullptr), then the TTree itself is ok.
Looking at your commit, I was wondering if the same check(s) for tree validity could be added to TChain::AddFile, to verify the TFile and the TTree (throwing exceptions if they are not valid … maybe enabled by flags). At the moment I check whether the AddFile result is != 1, but that does not work: the tree is added to the chain anyway.
Thanks a lot!

so, I found that I can use AddFile("pattern", -1) to connect the tree, and I do receive a warning like:
Warning in <TChain::AddFile>: Adding tree with no entries from file: <the file>
The problem is that AddFile still returns 1 even though the tree has no entries and the warning is issued, so the return value cannot be used to detect the problem.
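Since the return value of `TChain::AddFile` cannot be relied on here, one workaround is to validate the file yourself before adding it. A hedged sketch (the `add_if_valid` helper and tree name are hypothetical, and the import guard just lets the sketch load without ROOT):

```python
# Sketch of a manual pre-check before TChain::AddFile, since AddFile
# returns 1 even for an empty tree. Names are placeholders.
try:
    import ROOT
except ImportError:  # degrade gracefully where PyROOT is absent
    ROOT = None

def add_if_valid(chain, path, tree_name):
    """Open `path` and check `tree_name` before adding the file to `chain`.

    Returns True only if the file was actually added."""
    if ROOT is None:
        return False
    f = ROOT.TFile.Open(path)
    if not f or f.IsZombie():          # unreadable or corrupt file
        return False
    t = f.Get(tree_name)
    ok = bool(t) and t.GetEntries() > 0  # null proxy or empty tree -> reject
    f.Close()
    if ok:
        chain.AddFile(path)
    return ok
```

This moves the validity decision into user code, which is roughly what the check added to `GetFriendEntries` in the PR above does on the library side.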

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.