Most efficient way to slice TTree in one variable

Hi @eguiraud,

here is the most minimal reproducer I could come up with:

https://cernbox.cern.ch/index.php/s/l3toLo4cbH59nxI

You can run it with:

  lsetup "root 6.18.04-x86_64-centos7-gcc8-opt"
  root -l
  .L FCS_Cell.h+
  .x sliceTreeInEta.C 

And after the slices are registered you should see the following error:

Error in <TBranchElement::SetAddress>: STL container with fStreamerType: 500
Warning in <TTree::CopyEntries>: The export branch and the import branch do not have the same streamer type. (The branch name is m_vector.)

Nonetheless, all events should be processed and the output files will be saved in output/

Thanks!

EDIT: Just tested with the newest release 6.20.02, same issue.

Hi @eguiraud,

any news on this? :slight_smile:

Hi,
sorry, I had to look at other issues on Thursday and Friday :smile:

I’ll try to jump back on this as soon as possible. In the meanwhile, what does /usr/bin/time say about the program’s memory usage (compared to max amount of RAM available on the lxplus machines)? Do you also get these errors if you run on a different machine than lxplus, e.g. on a personal computer?

Cheers,
Enrico

Hi @eguiraud,

the output of /usr/bin/time for the reproducer is (not sure how to interpret this):

  795.99 user 156.90 system 10:43.40 elapsed 148% CPU (0 avgtext+0 avgdata 8426276 maxresident)k

  800704 inputs+8 outputs (3336 major+2880290 minor) pagefaults 0 swaps

I will try running it on my local machine tomorrow

Hi,
8426276k maxresident means the program is using ~8 GB of RAM, which might be too much for lxplus. That’s a different problem from the error messages:

Error in <TBranchElement::SetAddress>: STL container with fStreamerType: 500
Warning in <TTree::CopyEntries>: The export branch and the import branch do not have the same streamer type. (The branch name is m_vector.)

I tried your reproducer and I also get these error messages; I will have to investigate. This is independent of the large memory usage, which might get your job killed on lxplus.

I don’t know if I’ll manage to work on this today. If not, definitely tomorrow.

Cheers,
Enrico

Hi @eguiraud,

okay, thanks! I just ran locally on macOS Catalina and I get the same problems. With only a few slices (standard reproducer) I get exactly the same output, but again the program runs through fine. If I run over all slices (eta = 5.0), then after some time I additionally get

root.exe(34406,0x70000ddc3000) malloc: *** error for object 0x7fc650d91530: pointer being freed was not allocated
root.exe(34406,0x70000ddc3000) malloc: *** set a breakpoint in malloc_error_break to debug 

and the program stops its execution. I attached the sample files for the root process executed with some slices and with all slices. Hope that helps!

Some slices: https://cernbox.cern.ch/index.php/s/S0ewFXmpVH4tVCY

All slices: https://cernbox.cern.ch/index.php/s/ZOMm0bFKyl3LAIY

Hi,
what’s the difference between these new files and the original you shared?

I can reproduce the issue, but I don’t have a fix yet. Work in progress.

Cheers,
Enrico

Hi @eguiraud,

these files are just debug outputs from the mac root.exe process. I thought they might be useful to you.

In my tests, the single-thread version works correctly; can you confirm (i.e. can you just comment out the call to ROOT::EnableImplicitMT())?

The fact that I need multi-threading (and at least 10 slices) to reproduce the problem complicates debugging a bit. Work in progress…
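In code, the workaround looks roughly like this. This is only a sketch, not your actual macro: the tree name `FCS_ParametrizationInput`, the file names and the eta column `TTC_entrance_eta` below are hypothetical placeholders that you would need to adapt to your reproducer:

```cpp
// Sketch of the single-thread slicing workaround. All names (tree, files,
// eta column) are assumptions, not taken from the original reproducer.
#include <ROOT/RDataFrame.hxx>
#include <cmath>
#include <string>

void sliceTreeInEta_st()
{
   // Deliberately NOT calling ROOT::EnableImplicitMT(): the multi-thread
   // Snapshot triggers the "fStreamerType: 500" errors for the
   // FCS_matchedcellvector branches.
   ROOT::RDataFrame df("FCS_ParametrizationInput", "input.root");

   for (float lo = 0.f; lo < 5.0f; lo += 0.05f) {
      const float hi = lo + 0.05f;
      // Select the events falling in this eta slice and write them out.
      df.Filter([lo, hi](float eta) { return std::abs(eta) >= lo && std::abs(eta) < hi; },
                {"TTC_entrance_eta"})
        .Snapshot("FCS_ParametrizationInput",
                  "output/slice_eta_" + std::to_string(lo) + ".root");
   }
}
```

With ROOT::EnableImplicitMT() commented out, each Snapshot runs on a single thread and the streamer-type mismatch does not occur.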


Hi @eguiraud,
I tried this already in the past and I got

Maybe you are not seeing this because you are not running over all slices/files?

I ran on both files and, I think, on all slices (for (float eta = 0; eta < 5.0; eta += 0.05)).
It took 14 minutes with a single thread, it required 7.5GB of RAM (lots of TTrees opened at the same time, I guess), and it did not print any error message.

Did the machine where you got the Bus error have at least 8 GB of RAM? (easy fix if increasing the number of slices causes memory problems: do just 10-15 slices at a time…while I try to figure out what’s wrong).
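To make the batching concrete, here is a small sketch (plain C++; the function name and batch size are invented) of how the full eta range could be split into batches of slices, to be processed one batch at a time so only a few output trees are open at once:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split the range [0, 5.0) into 100 eta slices of width 0.05 and group them
// into batches of at most `batchSize` slices. Using an integer loop index
// avoids the float-accumulation drift of `for (float eta = 0; ...; eta += 0.05)`.
std::vector<std::vector<std::pair<float, float>>> makeEtaBatches(std::size_t batchSize)
{
   std::vector<std::vector<std::pair<float, float>>> batches;
   std::vector<std::pair<float, float>> current;
   for (int i = 0; i < 100; ++i) {
      const float lo = i * 0.05f;
      current.emplace_back(lo, lo + 0.05f); // one [lo, lo+0.05) eta slice
      if (current.size() == batchSize) {
         batches.push_back(std::move(current));
         current.clear();
      }
   }
   if (!current.empty())
      batches.push_back(std::move(current)); // last, possibly partial, batch
   return batches;
}
```

Each batch can then be run through the slicing macro separately, keeping the resident memory bounded regardless of the total number of slices.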


Hi @eguiraud,

any news on this? Without MT I am able to use it, but it is rather cumbersome as I have very large files which take a long time to process, and in addition for some slices I get

   SysError in <TFile::Flush>: error flushing file 

Let me know if you found a solution :slight_smile:

Edit: It also gets killed if I run over too much data.

The Bus error, the error flushing file, and the job getting killed when running over too much data I cannot reproduce on my machine, so I would attribute them to hitting some quota limits on lxplus or on the machine you run on (or I need help to reproduce them). A workaround might be to run on a few slices at a time rather than all slices at once, see my last post.

I have been looking into the errors:

Error in <TBranchElement::SetAddress>: STL container with fStreamerType: 500
Warning in <TTree::CopyEntries>: The export branch and the import branch do not have the same streamer type. (The branch name is m_vector.)

which I can reproduce (and which, occasionally, also result in a segfault). The problem is with the branches of type FCS_matchedcellvector, which multi-thread Snapshot does not deal with correctly. I don’t have a workaround other than turning off IMT for now, but we are actively investigating.

Also: is this Snapshot-based solution any better than your original code, in the end?


Hi @eguiraud,

okay, I guess I will have to deal with these restrictions for now.

Yes, this is definitely much better as the original copyTree method is much slower.

Cheers

The streamer type problem is now https://sft.its.cern.ch/jira/browse/ROOT-10648

Cheers,
Enrico


Hi @eguiraud,

just to let you know that even with one thread the slicing sometimes crashes (mostly shortly before the slicing is complete), and sometimes I even get a stack trace:

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f46090d241c in waitpid () from /lib64/libc.so.6
#1  0x00007f460904ff12 in do_system () from /lib64/libc.so.6
#2  0x00007f4609cc0533 in TUnixSystem::StackTrace() () from /cvmfs/sft.cern.ch/lcg/releases/LCG_96b/ROOT/6.18.04/x86_64-centos7-gcc8-opt/lib/libCore.so
#3  0x00007f4609cc2d84 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft.cern.ch/lcg/releases/LCG_96b/ROOT/6.18.04/x86_64-centos7-gcc8-opt/lib/libCore.so
#4  <signal handler called>
#5  ~TTreeReaderArrayBase (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#6  ~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#7  TTreeReaderArray<float>::~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#8  0x00007f460a582bbb in ?? ()
#9  0x00007ffeca0b7618 in ?? ()
#10 0x0000000010478300 in ?? ()
#11 0x0000000010478300 in ?? ()
#12 0x0000000010ae4c08 in ?? ()
#13 0x00007ffeca0b7680 in ?? ()
#14 0x00007f460a58b143 in ?? ()
#15 0x000000001af03e50 in ?? ()
#16 0x00007f460a582b90 in ?? ()
#17 0x00007ffeca0b7670 in ?? ()
#18 0x0000000010ae4c08 in ?? ()
#19 0x0000000010ae4c08 in ?? ()
#20 0x00007f460a59e2e0 in ?? ()
#21 0x0000000010478300 in ?? ()
#22 0x0000000010ae4c08 in ?? ()
#23 0x00007ffeca0b76c0 in ?? ()
#24 0x00007f460a534f71 in ?? ()
#25 0x00007ffeca0b76d0 in ?? ()
#26 0x0000000000000000 in ?? ()
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  ~TTreeReaderArrayBase (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#6  ~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#7  TTreeReaderArray<float>::~TTreeReaderArray (this=0x10478300, __in_chrg=<optimized out>) at /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.04-c767d/x86_64-centos7-gcc8-opt/include/TTreeReaderArray.h:75
#8  0x00007f460a582bbb in ?? ()
#9  0x00007ffeca0b7618 in ?? ()
#10 0x0000000010478300 in ?? ()
#11 0x0000000010478300 in ?? ()
#12 0x0000000010ae4c08 in ?? ()
#13 0x00007ffeca0b7680 in ?? ()
#14 0x00007f460a58b143 in ?? ()
#15 0x000000001af03e50 in ?? ()
#16 0x00007f460a582b90 in ?? ()
#17 0x00007ffeca0b7670 in ?? ()
#18 0x0000000010ae4c08 in ?? ()
#19 0x0000000010ae4c08 in ?? ()
#20 0x00007f460a59e2e0 in ?? ()
#21 0x0000000010478300 in ?? ()
#22 0x0000000010ae4c08 in ?? ()
#23 0x00007ffeca0b76c0 in ?? ()
#24 0x00007f460a534f71 in ?? ()
#25 0x00007ffeca0b76d0 in ?? ()
#26 0x0000000000000000 in ?? ()
===========================================================


Bus error (core dumped)

I guess I cannot produce a reproducer in this case, since it sometimes happens and sometimes it does not, but I thought this might be useful for you.

A reproducer that crashes only some of the time is still a reproducer! Or rather, a full recipe…it does not happen on my workstation as far as I can tell.

Cheers,
Enrico

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Hi @mark1,
this is just to let you know that thanks to @pcanal the issue with the fStreamerType errors when running a Snapshot on multiple threads has been resolved. The fix is in master and it will be part of the upcoming ROOT release 6.22.

Feel free to open a fresh thread in case you encounter any further issues.

Cheers,
Enrico
