And after the slices are registered you should see the following error:
Error in <TBranchElement::SetAddress>: STL container with fStreamerType: 500
Warning in <TTree::CopyEntries>: The export branch and the import branch do not have the same streamer type. (The branch name is m_vector.)
Nonetheless, all events should be processed and the output files will be saved in output/.
Thanks!
EDIT: Just tested with the newest release 6.20.02, same issue.
Hi,
sorry, I had to look at other issues on Thursday and Friday.
I’ll try to jump back on this as soon as possible. In the meantime, what does /usr/bin/time say about the program’s memory usage (compared to the maximum amount of RAM available on the lxplus machines)? Do you also get these errors if you run on a different machine than lxplus, e.g. on a personal computer?
Hi, 8426276k maxresident means the program is using ~8GB of RAM, which might be too much for lxplus. That’s a different problem from the error messages:
Error in <TBranchElement::SetAddress>: STL container with fStreamerType: 500
Warning in <TTree::CopyEntries>: The export branch and the import branch do not have the same streamer type. (The branch name is m_vector.)
I tried your reproducer and I also get these error messages; I will have to investigate. This is independent of the large memory usage, which might get your job killed on lxplus.
I don’t know if I’ll manage to work on this today. If not, definitely tomorrow.
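As a side note on the memory figure above: the peak resident memory can also be read from inside the process, which may be handy when /usr/bin/time is not convenient. The following is only a minimal sketch using the POSIX getrusage call; the helper names peakResidentSetSize and kilobytesToGiB are made up for illustration, and the conversion assumes that the "k" in "maxresident" means 1024 bytes.

```cpp
#include <cassert>
#include <sys/resource.h>

// Peak resident set size of the current process, as reported by getrusage().
// Units differ by platform: kilobytes on Linux, bytes on macOS.
long peakResidentSetSize() {
    struct rusage usage;
    const int status = getrusage(RUSAGE_SELF, &usage);
    assert(status == 0);  // getrusage only fails for invalid arguments
    return usage.ru_maxrss;
}

// Convert a kilobyte count such as "8426276k maxresident" to GiB,
// assuming 1 k = 1024 bytes.
double kilobytesToGiB(long kilobytes) {
    return static_cast<double>(kilobytes) / (1024.0 * 1024.0);
}
```

With these helpers, kilobytesToGiB(8426276) comes out at slightly above 8 GiB, consistent with the ~8GB quoted above.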
Okay, thanks! I just ran locally on macOS Catalina and I get the same problems. With only a few slices (standard reproducer) I get exactly the same output, but again the program runs through fine. If I run over all slices (eta=5.0), then after some time I additionally get
root.exe(34406,0x70000ddc3000) malloc: *** error for object 0x7fc650d91530: pointer being freed was not allocated
root.exe(34406,0x70000ddc3000) malloc: *** set a breakpoint in malloc_error_break to debug
and the program stops its execution. I attached the sample files for the root process executed with some slices and with all slices. Hope that helps!
I ran on both files and, I think, on all slices (for (float eta = 0; eta < 5.0; eta += 0.05)).
It took 14 minutes with a single thread, it required 7.5GB of RAM (lots of TTrees opened at the same time, I guess), and it did not print any error message.
Did the machine where you got the Bus error have at least 8 GB of RAM? (An easy fix if increasing the number of slices causes memory problems: do just 10-15 slices at a time while I try to figure out what’s wrong.)
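The "10-15 slices at a time" workaround could be organised along these lines. This is only a sketch in plain C++ with no ROOT calls; the names EtaSlice, makeSlices and makeBatches are hypothetical and not taken from the reproducer. Slice edges are computed from an integer index rather than by repeatedly adding 0.05 to a float, which avoids accumulated rounding error in the edges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One eta slice, described by its lower and upper edge.
struct EtaSlice {
    double low;
    double high;
};

// Build nSlices slices of equal width starting at etaMin.
// Each edge is computed from the integer index i, not by accumulation.
std::vector<EtaSlice> makeSlices(double etaMin, double width, std::size_t nSlices) {
    std::vector<EtaSlice> slices;
    slices.reserve(nSlices);
    for (std::size_t i = 0; i < nSlices; ++i) {
        slices.push_back({etaMin + i * width, etaMin + (i + 1) * width});
    }
    return slices;
}

// Split the slices into consecutive batches of at most batchSize slices.
std::vector<std::vector<EtaSlice>> makeBatches(const std::vector<EtaSlice>& slices,
                                               std::size_t batchSize) {
    assert(batchSize > 0);
    std::vector<std::vector<EtaSlice>> batches;
    for (std::size_t begin = 0; begin < slices.size(); begin += batchSize) {
        const std::size_t end = std::min(begin + batchSize, slices.size());
        batches.emplace_back(slices.begin() + begin, slices.begin() + end);
    }
    return batches;
}
```

Assuming 100 slices of width 0.05 (eta from 0 to 5), makeBatches(makeSlices(0.0, 0.05, 100), 15) yields seven batches; each batch would then be processed in its own pass over the input, so that fewer output trees are open at the same time.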
Any news on this? Without MT I am able to use it, but it is rather cumbersome as I have very large files which take a long time to process, and in addition for some slices I get
SysError in <TFile::Flush>: error flushing file
Let me know if you have found a solution.
Edit: And it also gets killed if I run over too much data.
I cannot reproduce the Bus error, the “error flushing file” message, or the job getting killed when running over a lot of data on my machine, so I would attribute them to hitting some quota limit on lxplus or on the machine you run on (or I need help to reproduce them). A workaround might be to run on a few slices at a time rather than on all slices at once, see my last post.
I have been looking into the errors:
Error in <TBranchElement::SetAddress>: STL container with fStreamerType: 500
Warning in <TTree::CopyEntries>: The export branch and the import branch do not have the same streamer type. (The branch name is m_vector.)
which I can reproduce (and which, occasionally, also result in a segfault). The problem is with the branches of type FCS_matchedcellvector, which the multi-threaded Snapshot does not deal with correctly. I don’t have a workaround other than turning off IMT for now, but we are actively investigating.
Also: is this Snapshot-based solution any better than your original code, in the end?
Just to let you know that even with one thread the slicing sometimes crashes (mostly shortly before the slicing is complete), and sometimes I even get a stack trace:
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007f46090d241c in waitpid () from /lib64/libc.so.6
#1 0x00007f460904ff12 in do_system () from /lib64/libc.so.6
#2 0x00007f4609cc0533 in TUnixSystem::StackTrace() () from /cvmfs/sft.cern.ch/lcg/releases/LCG_96b/ROOT/6.18.04/x86_64-centos7-gcc8-opt/lib/libCore.so
#3 0x00007f4609cc2d84 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft.cern.ch/lcg/releases/LCG_96b/ROOT/6.18.04/x86_64-centos7-gcc8-opt/lib/libCore.so
#4 <signal handler called>
#5 ~TTreeReaderArrayBase (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#6 ~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#7 TTreeReaderArray<float>::~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#8 0x00007f460a582bbb in ?? ()
#9 0x00007ffeca0b7618 in ?? ()
#10 0x0000000010478300 in ?? ()
#11 0x0000000010478300 in ?? ()
#12 0x0000000010ae4c08 in ?? ()
#13 0x00007ffeca0b7680 in ?? ()
#14 0x00007f460a58b143 in ?? ()
#15 0x000000001af03e50 in ?? ()
#16 0x00007f460a582b90 in ?? ()
#17 0x00007ffeca0b7670 in ?? ()
#18 0x0000000010ae4c08 in ?? ()
#19 0x0000000010ae4c08 in ?? ()
#20 0x00007f460a59e2e0 in ?? ()
#21 0x0000000010478300 in ?? ()
#22 0x0000000010ae4c08 in ?? ()
#23 0x00007ffeca0b76c0 in ?? ()
#24 0x00007f460a534f71 in ?? ()
#25 0x00007ffeca0b76d0 in ?? ()
#26 0x0000000000000000 in ?? ()
===========================================================
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 ~TTreeReaderArrayBase (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#6 ~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#7 TTreeReaderArray<float>::~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#8 0x00007f460a582bbb in ?? ()
#9 0x00007ffeca0b7618 in ?? ()
#10 0x0000000010478300 in ?? ()
#11 0x0000000010478300 in ?? ()
#12 0x0000000010ae4c08 in ?? ()
#13 0x00007ffeca0b7680 in ?? ()
#14 0x00007f460a58b143 in ?? ()
#15 0x000000001af03e50 in ?? ()
#16 0x00007f460a582b90 in ?? ()
#17 0x00007ffeca0b7670 in ?? ()
#18 0x0000000010ae4c08 in ?? ()
#19 0x0000000010ae4c08 in ?? ()
#20 0x00007f460a59e2e0 in ?? ()
#21 0x0000000010478300 in ?? ()
#22 0x0000000010ae4c08 in ?? ()
#23 0x00007ffeca0b76c0 in ?? ()
#24 0x00007f460a534f71 in ?? ()
#25 0x00007ffeca0b76d0 in ?? ()
#26 0x0000000000000000 in ?? ()
===========================================================
Bus error (core dumped)
I guess I cannot produce a reproducer in this case, as it sometimes happens and sometimes does not, but I thought this might be useful for you.
A reproducer that crashes just some of the time is still a reproducer! Or, better, a full recipe: it does not happen on my workstation as far as I can tell.
Hi @mark1,
this is just to let you know that thanks to @pcanal the issue with the fStreamerType errors when running a Snapshot on multiple threads has been resolved. The fix is in master and it will be part of the upcoming ROOT release 6.22.
Feel free to open a fresh thread in case you encounter further issues.