PROOF Crashes while Writing Histogram


_ROOT Version:_ 6.19/01
_Platform:_ linuxx8664gcc
_Compiler:_ gcc version 8.3.1 20190223 (Red Hat 8.3.1-2) (GCC)


Dear ROOT Experts,

I am facing a problem while writing a histogram that is generated by processing a root file using TSelector and PROOF. The details of the problem are given below.

I have several root files which contain a tree called “canSort”. The tree is fairly simple, with only 5 branches. I am trying to sort the data into a 3-dimensional THnSparse. Since the total size of all root files is about 18 GB, I decided to process the data in parallel using TSelector and PROOF. To start with, I am using only one root file. I do the following:

root [0] TChain* fChain = new TChain("canSort");
root [1] fChain->AddFile("sixT2MeV_I.root");
root [2] fChain->Process("CShiftSort.C+");

This works perfectly and generates the desired root file without any problem. BUT when I try to process (on a workstation with 24 cores and 32 GB RAM) the same file using PROOF as follows:

root [0] TChain* fChain = new TChain("canSort");
root [1] fChain->AddFile("sixT2MeV_1.root");
root [2] TProof::Open("","workers=4");
 +++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)                 
Setting up worker servers: OK (4 workers)                 
PROOF set to parallel mode (4 workers)
root [3] fChain->SetProof();
root [4] fChain->Process("CShiftSort.C+");

it shows the following:

Info in <TProofLite::SetQueryRunning>: starting query: 1
Info in <TProofQueryResult::SetRunning>: nwrks: 4
Looking up for exact location of files: OK (1 files)                 
Looking up for exact location of files: OK (1 files)                 
Info in <TPacketizer::TPacketizer>: Initial number of workers: 4
Validating files: OK (1 files)                 
Lite-0: merging output objects ... / (1 workers still sending)   
 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f82bb08959b in waitpid () from /lib64/libc.so.6
#1  0x00007f82bb00583f in do_system () from /lib64/libc.so.6
#2  0x00007f82bb7e1a63 in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x8307c0) at /opt/root-6.20/core/unix/src/TUnixSystem.cxx:2106
#3  TUnixSystem::StackTrace (this=0x8307c0) at /opt/root-6.20/core/unix/src/TUnixSystem.cxx:2396
#4  0x00007f82bb7e3b44 in TUnixSystem::DispatchSignals (this=0x8307c0, sig=kSigSegmentationViolation) at /opt/root-6.20/core/unix/src/TUnixSystem.cxx:3627
#5  <signal handler called>
#6  0x00007f82aece48bc in CShiftSort::Terminate() () from /home/ajay/11B208Pb_Jul2016/ROOT_Sort/ED_3DSparse_CS/CShiftSort_C.so
#7  0x00007f82b011597a in TProofPlayerLite::Finalize (this=0x20034d0, force=<optimized out>, sync=<optimized out>) at /opt/root-6.20/proof/proofplayer/src/TProofPlayerLite.cxx:407
#8  0x00007f82b01171f9 in TProofPlayerLite::Process (this=0x20034d0, dset=<optimized out>, selector_file=<optimized out>, option=<optimized out>, nentries=<optimized out>, first=0) at /opt/root-6.20/proof/proofplayer/src/TProofPlayerLite.cxx:312
#9  0x00007f82b08568f9 in TProofLite::Process (this=0x1ef4720, dset=0x230e6a0, selector=0x7f82bb4ed000 "CShiftSort.C+", option=<optimized out>, nentries=9223372036854775807, first=0) at /opt/rootV620/include/TString.h:295
#10 0x00007f82bb4ee0ae in ?? ()
#11 0x00000000023dce40 in ?? ()
#12 0x7be82e2cb6528600 in ?? ()
#13 0x0000000000000000 in ?? ()
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x00007f82aece48bc in CShiftSort::Terminate() () from /home/ajay/11B208Pb_Jul2016/ROOT_Sort/ED_3DSparse_CS/CShiftSort_C.so
#7  0x00007f82b011597a in TProofPlayerLite::Finalize (this=0x20034d0, force=<optimized out>, sync=<optimized out>) at /opt/root-6.20/proof/proofplayer/src/TProofPlayerLite.cxx:407
#8  0x00007f82b01171f9 in TProofPlayerLite::Process (this=0x20034d0, dset=<optimized out>, selector_file=<optimized out>, option=<optimized out>, nentries=<optimized out>, first=0) at /opt/root-6.20/proof/proofplayer/src/TProofPlayerLite.cxx:312
#9  0x00007f82b08568f9 in TProofLite::Process (this=0x1ef4720, dset=0x230e6a0, selector=0x7f82bb4ed000 "CShiftSort.C+", option=<optimized out>, nentries=9223372036854775807, first=0) at /opt/rootV620/include/TString.h:295
#10 0x00007f82bb4ee0ae in ?? ()
#11 0x00000000023dce40 in ?? ()
#12 0x7be82e2cb6528600 in ?? ()
#13 0x0000000000000000 in ?? ()
===========================================================

Root > 
root [5] 

I have also tried different numbers of workers, but the problem remains.

I have observed that if I comment out sparse3D->Write(); then the above error disappears! BUT then I don’t have what I want. I thought maybe there was some problem filling the THnSparse, so I tried to generate a 2D histogram instead. But that also failed!

I am attaching the files which I have been using to achieve this.

Can you please help me solve the problem? Any help is highly appreciated.

Thanking you.

With best regards,

Ajay

CShiftSort.C (6.1 KB)
CShiftSort.h (2.7 KB)
sixT2MeV_I.root.gz

I think @ganis should know.

Dear Ajay,

Only some basic objects are retrieved automatically from the output list; I am not sure sparse histograms are among them. So I would suggest trying to add
sparse3D = (THnSparseF *) fOutput->Get("sparse3D");
in Terminate before using it.

That said, PROOF is in legacy mode. Since you are using PROOF-Lite, you have at least a couple of possible replacements to exploit the local cores of your machine. Have a look under tutorials/multicore for examples, in particular the Executor family or even RDataFrame.

G Ganis

After adding what you have suggested, I get the following error:

././CShiftSort.C:177:36: error: no member named 'Get' in 'TSelectorList'
        sparse3D = (THnSparseF *)fOutput->Get("sparse3D");
                                 ~~~~~~~  ^
Error in <ACLiC>: Dictionary generation failed!

Of course, the name of the method to use is FindObject, not Get:
sparse3D = (THnSparseF *) fOutput->FindObject("sparse3D");

GG
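
The reason the lookup is needed can be illustrated with a toy model in plain C++. This is only a sketch, not ROOT code: ToyHist, ToyOutputList, ToySelector and RunToyProofSession are hypothetical stand-ins. It assumes the usual PROOF split in which SlaveBegin and Process run on the workers while Terminate runs on the client, so a pointer set in SlaveBegin is never set in the client's selector object. In a serial TChain::Process all steps run on one object, which would explain why the serial run worked.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a histogram; NOT a ROOT class.
struct ToyHist {
    std::string name;
    long entries = 0;
};

// Toy stand-in for the selector's output list (fOutput in ROOT), keyed by name.
using ToyOutputList = std::map<std::string, std::shared_ptr<ToyHist>>;

// Toy stand-in for a TSelector-like class with a histogram data member.
struct ToySelector {
    ToyHist* hist = nullptr;  // plays the role of the sparse3D data member
    ToyOutputList output;     // plays the role of fOutput

    // Runs on a worker only: creates the object and registers it by name.
    void SlaveBegin() {
        auto h = std::make_shared<ToyHist>();
        h->name = "sparse3D";
        output[h->name] = h;
        hist = h.get();
    }

    // Runs on a worker for every entry.
    void Process() {
        assert(hist != nullptr);
        ++hist->entries;
    }

    // Runs on the client: looks the object up by name, because this
    // instance never executed SlaveBegin(). Returns -1 if it is missing.
    long Terminate() {
        auto it = output.find("sparse3D");
        if (it == output.end()) return -1;
        hist = it->second.get();
        return hist->entries;
    }
};

// Simulates a parallel run: a worker instance fills the histogram, then the
// output list is handed to a separate client instance that calls Terminate().
long RunToyProofSession(int nEntries) {
    ToySelector worker;
    worker.SlaveBegin();
    for (int i = 0; i < nEntries; ++i) worker.Process();

    ToySelector client;             // client.hist is still nullptr here
    client.output = worker.output;  // results arrive on the client by name
    return client.Terminate();
}
```

In the real selector the equivalent step is the fOutput->FindObject("sparse3D") call above, ideally followed by a null check before calling Write().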


Thank you!

This modification to the code works, i.e. I no longer get *** Break *** segmentation violation.

However, I am now facing another issue: the "Lite-0: merging output objects ..." step takes ages to finish.

Do you see a much increased memory usage?

Merging sparse histograms often fills a large fraction of bins - making them non-sparse.
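
A toy illustration of that effect in plain C++, under the assumption that a sparse histogram can be modelled as a map from linear bin index to content (SparseHist, MergeSparse and FilledFraction are hypothetical names, not ROOT API). The merged result holds the union of the bins filled by each worker, so its occupancy can be much higher than that of any single partial histogram.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Toy sparse histogram: only filled bins are stored (linear bin index -> content).
using SparseHist = std::map<std::uint64_t, double>;

// Merges partial histograms by summing contents bin by bin.
// The result contains every bin that is filled in at least one input.
SparseHist MergeSparse(const std::vector<SparseHist>& parts) {
    SparseHist merged;
    for (const auto& part : parts) {
        for (const auto& bin : part) {
            merged[bin.first] += bin.second;
        }
    }
    return merged;
}

// Fraction of all bins that are actually stored in the sparse histogram.
double FilledFraction(const SparseHist& h, std::uint64_t totalBins) {
    if (totalBins == 0) return 0.0;
    return static_cast<double>(h.size()) / static_cast<double>(totalBins);
}
```

Comparing FilledFraction for each worker's partial result with the value for the merged one gives a rough idea of how far the merged histogram is from being sparse.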

I will have to check.
