Loop mysteriously crashing

Dear ROOTers,

I see strange behaviour when running a simple toy program:
iktp.tu-dresden.de/~prudent/Dive … rStatUnc.C
You can run it directly by copying it and executing

root -l -b
toyForStatUnc( "B0omegaD0star_D0starToD0pi0_D0ToKPi" )

It includes a loop over toys.
When running over 100 toys, for instance, it runs smoothly until toy #22 and then crashes with a segmentation fault, apparently because of one of these lines:

data_fL0 = pdf_fL0->generate( RooArgList(_cosThetaDstar, _cosThetaOmega), rfL0, 0 );
data_fL1 = pdf_fL1.generate( RooArgList(_cosThetaDstar, _cosThetaOmega), rfL1, 0 );

I am running out of ideas as to why :-S

Any idea is welcome !

Regards,
Xavier

Hi,

Can you try compiling your script via ACLiC, i.e. .L toyForStatUnc.C+ ?

Philippe

Hello,

I added some includes to my macro
iktp.tu-dresden.de/~prudent/Dive … rStatUnc.C

the output is
iktp.tu-dresden.de/~prudent/Dive … hACLIC/out

It now crashes at step 32, still without any obvious reason :-S

Xavier

PS: If you compare with the previous version
iktp.tu-dresden.de/~prudent/Dive … rStatUnc.C
I had to comment out the PlotOn arguments, as they made the compilation fail with messages like
'XErrorSize' was not declared in this scope
even though I used:
using namespace RooFit;

Here are the error messages that were dumped.
It looks like a Delete() is attempted on the RooNDKeysPdf within fitTo…
Is that normal?

*** Break *** bus error
(no debugging symbols found)
Using host libthread_db library “/lib64/tls/libthread_db.so.1”.
Attaching to program: /proc/9602/exe, process 9602
(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
[New Thread 182913307584 (LWP 9602)]
(no debugging symbols found)…done.
(no debugging symbols found)…done.
0x000000302508f504 in waitpid () from /lib64/tls/libc.so.6
#1 0x0000003025039a1f in do_system () from /lib64/tls/libc.so.6
#2 0x0000002a9579df6d in TUnixSystem::StackTrace () from /usr/local/root/pro/root/lib/libCore.so
#3 0x0000002a9579acea in TUnixSystem::DispatchSignals () from /usr/local/root/pro/root/lib/libCore.so
#4
#5 0x0000002a983aab50 in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#6 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#7 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#8 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#9 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#10 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#11 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#12 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#13 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#14 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#15 0x0000002a983aab5c in std::_Rb_tree<int, std::pair<int const, bool>, std::_Select1st<std::pair<int const, bool> >, std::less, std::allocator<std::pair<int const, bool> > >::_M_erase ()
from /usr/local/root/pro/root/lib/libRooFit.so
#16 0x0000002a9839bffc in RooNDKeysPdf::~RooNDKeysPdf$delete () from /usr/local/root/pro/root/lib/libRooFit.so
#17 0x0000002a97db2199 in RooAbsCollection::safeDeleteList () from /usr/local/root/pro/root/lib/libRooFitCore.so
#18 0x0000002a97db23dc in RooAbsCollection::~RooAbsCollection$base () from /usr/local/root/pro/root/lib/libRooFitCore.so
#19 0x0000002a97de8d31 in RooArgSet::~RooArgSet$delete () from /usr/local/root/pro/root/lib/libRooFitCore.so
#20 0x0000002a97dbc133 in RooAbsOptTestStatistic::~RooAbsOptTestStatistic$base () from /usr/local/root/pro/root/lib/libRooFitCore.so
#21 0x0000002a97e55a21 in RooNLLVar::~RooNLLVar () from /usr/local/root/pro/root/lib/libRooFitCore.so
#22 0x0000002a97dbe613 in RooAbsPdf::fitTo () from /usr/local/root/pro/root/lib/libRooFitCore.so
#23 0x0000002a97dbe27a in RooAbsPdf::fitTo () from /usr/local/root/pro/root/lib/libRooFitCore.so
#24 0x0000002a98c3d527 in toyForStatUnc () from /afs/in2p3.fr/home/p/prudent/public/releases/ana31/workdir/analysis_D0h0_NonCP/polarisation_D0starOmega/fitLongFraction/./toyForStatUnc_C.so
#25 0x0000002a98c3da6e in G__filejkI18B__0_2902 () from /afs/in2p3.fr/home/p/prudent/public/releases/ana31/workdir/analysis_D0h0_NonCP/polarisation_D0starOmega/fitLongFraction/./toyForStatUnc_C.so
#26 0x0000002a95f2d315 in G__call_cppfunc () from /usr/local/root/pro/root/lib/libCint.so
#27 0x0000002a95f142be in G__interpret_func () from /usr/local/root/pro/root/lib/libCint.so
#28 0x0000002a95f02dc7 in G__getfunction () from /usr/local/root/pro/root/lib/libCint.so
#29 0x0000002a95ed90c7 in G__getitem () from /usr/local/root/pro/root/lib/libCint.so
#30 0x0000002a95edd46a in G__getexpr () from /usr/local/root/pro/root/lib/libCint.so
#31 0x0000002a95f59b95 in G__exec_statement () from /usr/local/root/pro/root/lib/libCint.so
#32 0x0000002a95ec68e4 in G__exec_tempfile_core () from /usr/local/root/pro/root/lib/libCint.so
#33 0x0000002a95ec7d3e in G__exec_tempfile_fp () from /usr/local/root/pro/root/lib/libCint.so
#34 0x0000002a95f665b1 in G__process_cmd () from /usr/local/root/pro/root/lib/libCint.so
#35 0x0000002a9576c97f in TCint::ProcessLine () from /usr/local/root/pro/root/lib/libCore.so
#36 0x0000002a956d9b85 in TApplication::ProcessLine () from /usr/local/root/pro/root/lib/libCore.so
#37 0x0000002a96683973 in TRint::HandleTermInput () from /usr/local/root/pro/root/lib/libRint.so
#38 0x0000002a96682127 in TTermInputHandler::Notify () from /usr/local/root/pro/root/lib/libRint.so
#39 0x0000002a9668415d in TTermInputHandler::ReadNotify () from /usr/local/root/pro/root/lib/libRint.so
#40 0x0000002a957971c3 in TUnixSystem::CheckDescriptors () from /usr/local/root/pro/root/lib/libCore.so
#41 0x0000002a9579b329 in TUnixSystem::DispatchOneEvent () from /usr/local/root/pro/root/lib/libCore.so
#42 0x0000002a957307b5 in TSystem::InnerLoop () from /usr/local/root/pro/root/lib/libCore.so
#43 0x0000002a9573058e in TSystem::Run () from /usr/local/root/pro/root/lib/libCore.so
#44 0x0000002a956d9c5f in TApplication::Run () from /usr/local/root/pro/root/lib/libCore.so
#45 0x0000002a9668287e in TRint::Run () from /usr/local/root/pro/root/lib/libRint.so
#46 0x000000000040106d in main ()
Root >

Hi Xavier,

So it looks like you crash in the RooNDKeysPdf destructor. I have recently applied some fixes in this class that relate to memory management. Are you running a recent ROOT version? If not can you try 5.26 to see if that fixes this problem?

Wouter

Hello Wouter,

In fact I just found out how to fix the problem (but not why…):

In my macro, I built my pdf in the following way (it is a 2D fit):

  • build the two 1D pdfs for signal and background
  • build the product pdf1 x pdf2 for signal and background
  • add the two products pdf: SB = signal + background

One of the 1D pdfs is a pointer to a RooNDKeysPdf, and the pdf SB is also a pointer.
If I declare SB as a plain RooAddPdf object instead, it is fine and my toys run smoothly until the very end.

BTW I had also noticed that when I quit ROOT I systematically got a segmentation violation associated with the SB pdf. This problem also vanished once SB was no longer a pointer.

Do you think it may come from the way the RooNDKeysPdf object is deleted?

Cheers,
Xavier

Sorry for the double message…

I used 5.22 (highest version I could find at ccali),

Xavier

Hi Xavier,

It looks indeed likely that your problem is related to RooNDKeysPdf.
Since 5.22 I have applied a couple of memory-management related fixes
to this class. So if you would e.g. switch to ROOT 5.26 the problem
will probably go away.

Wouter
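
A note for readers on the pointer-versus-object observation above: the sketch below does not reproduce the RooNDKeysPdf problem and uses no RooFit at all. It only illustrates, in plain C++, how the lifetime of an object differs depending on how it is declared. FakePdf and the three run_toys_* functions are hypothetical names invented for this illustration, and it assumes the pdf is created once per toy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a pdf object; it only counts live instances.
struct FakePdf {
    static int live;
    FakePdf() { ++live; }
    ~FakePdf() { --live; }
    FakePdf(const FakePdf&) = delete;
    FakePdf& operator=(const FakePdf&) = delete;
};
int FakePdf::live = 0;

// Raw pointer with no matching delete: one object stays alive per toy.
// Returns how many objects created here are still alive afterwards.
int run_toys_raw_pointer(int n_toys) {
    assert(n_toys >= 0);
    const int before = FakePdf::live;
    for (int i = 0; i < n_toys; ++i) {
        FakePdf* sb = new FakePdf();  // deliberately never deleted
        (void)sb;
    }
    return FakePdf::live - before;
}

// Plain object: destroyed automatically at the end of each iteration.
int run_toys_automatic(int n_toys) {
    assert(n_toys >= 0);
    const int before = FakePdf::live;
    for (int i = 0; i < n_toys; ++i) {
        FakePdf sb;
        (void)sb;
    }
    return FakePdf::live - before;
}

// Heap object held by unique_ptr: also destroyed exactly once per iteration.
int run_toys_unique_ptr(int n_toys) {
    assert(n_toys >= 0);
    const int before = FakePdf::live;
    for (int i = 0; i < n_toys; ++i) {
        std::unique_ptr<FakePdf> sb(new FakePdf());
    }
    return FakePdf::live - before;
}
```

With a plain object the destructor runs at a well-defined point and exactly once, whereas with a raw pointer the deletion depends on whoever ends up owning the object. Whether that is what made the difference here is not established by the thread.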

Hi,

It looks like I have a similar error output.

My code runs without crashing on my computer, but when I try to run it on
the lab's machines it crashes at some point in the loop.

If you are interested, the code is quoted below:

#include <TF1.h>
#include <TF2.h>
#include <TH1D.h>
#include <TH2D.h>
#include <cmath>
#include <iostream>
#include <algorithm>
#include <ctime>
#include <TFile.h>
#include <sys/time.h>

#include <kin_funcs.h>
#include <cross_sections.h>

using namespace std;

int main()
{
  const int n_Q2 = 100;
  const int n_s = 100;
  const int n_t = 5;
  
  const double deg2rad = 1.74532925199432955e-02;

  double ma = 0.;        //incoming photon mass (GeV)
  double mb = 0.938;    //target mass (GeV)
  double m1 = sqrt(4.6);        //Timelike photon mass (GeV)
  double m2 = 0.938;    //recoil proton mass (GeV)

  const int n_tbins = 4;
  const int n_phi_bins = 12;
  const int n_th_bins = 8;
  double t_edges_[n_tbins + 1] = {-0.1, -0.2, -0.35, -0.55, -0.8}; // This is an arbitrary choice of t bins
  double phi_edges_[n_phi_bins + 1] = {0., 30., 60., 90., 120., 150., 180., 210., 240., 270., 300., 330., 360.};
  double th_edges_[n_th_bins + 1] = {0., 22.5, 45., 67.5, 90., 112.5, 135., 157.5, 180.};

  const double Eg_min = 2;
  const double Eg_max = 5.76;

  const double s_min = mb*(mb + 2*Eg_min);
  const double s_max = mb*(mb + 2*Eg_max);
  const double ds = (s_max - s_min)/double(n_s);

  const double Q2_min = 1.05;
  
  TFile *file_out = new TFile("BH_estimate.root", "Recreate");

//  TF2 *f_BH_crs = new TF2("f_BH_crs", BH_cros_section, 0, 360, 0, 180, 4);
  TF2 f_BH_crs("f_BH_crs", BH_cros_section, 0, 360, 0, 180, 4);

  f_BH_crs.SetParameters(-0.29, 1.25, 5.45, -1);
  f_BH_crs.Draw("colz");
  TH2D *h_BH_crs = (TH2D*)f_BH_crs.GetHistogram()->Clone("h_BH_crs");

  TH1D *h_delt_t = new TH1D("h_delt_t", "", 1000, 0, 1000);

  timespec ts;
  int counter = 0;
  double s = s_min;
  for( int is = 0; is < n_s; is++ )
    {
      double Q2_max = (sqrt(s) - mb)*(sqrt(s) - mb);
      double dQ2 = (Q2_max - Q2_min)/double(n_Q2);
      
      double Q2 = Q2_min;
      for( int iQ2 = 0; iQ2 < n_Q2; iQ2++ )
        {
          for(int t_bin = 0; t_bin < n_tbins; t_bin++)
            {         
              double t_min = min(T_min( ma, mb, sqrt(Q2), m2, s), t_edges_[t_bin]);
              double t_max = max(T_max( ma, mb, sqrt(Q2), m2, s), t_edges_[t_bin + 1]);
              
              double dt = (t_max - t_min)/double(n_t);
              double t = t_min;
              
              for( int it = 0; it < n_t; it++ )
                {
                  double Eg = (s - mb*mb)/(2*mb);
                  f_BH_crs.SetParameters(t, Q2, Eg, -2);
                  
                 // cout<<"s, Q2, t_min, t_max, t => "<<s<<"\t"<<Q2<<"\t"<<t_min<<"\t"<<t_max<<"\t"<<t<<endl;

                  for( int phi_bin = 10; phi_bin < 11; phi_bin++ )
                    {
                      for( int th_bin = 5; th_bin < 6; th_bin++ )
                        {
                          //clock_t t_before = clock();
                          clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
                          // Combine seconds and nanoseconds: tv_nsec alone wraps around every second
                          Double_t time_before = 1e9*Double_t(ts.tv_sec) + Double_t(ts.tv_nsec);
                          
                          //cout<<"timespec time before = "<<ts.tv_nsec/1e9<<endl;

                          f_BH_crs.Integral(phi_edges_[phi_bin], phi_edges_[phi_bin + 1], 
                                             th_edges_[th_bin], th_edges_[th_bin + 1]);
                          clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
                          Double_t time_after = 1e9*Double_t(ts.tv_sec) + Double_t(ts.tv_nsec);

                          Double_t delt_t = time_after - time_before;
                          
                          if( delt_t > 1e5 )
                            {
                              cout<<"counter = "<<counter<<"  delt_t = "<<delt_t<<endl;
                            }

                          counter = counter + 1;

                          //cout<<"elapsed time = "<<delt_t<<endl;
                          
                          h_delt_t->Fill(delt_t/1000.);

                        }
                    }
                  t = t + dt;
                }
            
              
            }
          
          Q2 = Q2 + dQ2;
        }

      s = s + ds;
    }
  cout<<"counter ="<<counter<<endl;

  h_delt_t->Write();
  h_BH_crs->Write();
  file_out->Close();
}

Without the call to TF2::Integral(), i.e. without

  f_BH_crs.Integral(phi_edges_[phi_bin], phi_edges_[phi_bin + 1], 
                                             th_edges_[th_bin], th_edges_[th_bin + 1]);

it runs normally without crashing.
On the lab's computers I have tried different ROOT versions (5.34, 5.32, 5.26), but I always get the same
error.

On my computer I have version 5.34.
The only difference that I think may be relevant is that I am using 32-bit Ubuntu, while the lab system is
64-bit CentOS.

I would appreciate any clue on this.
Rafayel

Hi,

Can you please post your files so that your problem can be reproduced?

Lorenzo

Thank you Lorenzo,

The attached kin_funcs.cc and BH_crs_section.cc define the functions that I use in the
main code.
Here are the commands that I used to build the libraries.

=====commands to make libraries=====
g++ -c -fPIC BH_crs_section.cc `root-config --libs` `root-config --cflags` -I/usr/local/include/work -o BH_crs_section.o
g++ -shared -Wl,-soname,libBH_crs.so -o libBH_crs.so.1.0.1 BH_crs_section.o
g++ -c -fPIC kin_funcs.cc `root-config --libs` `root-config --cflags` -I/usr/local/include/work -o kin_funcs.o
g++ -shared -Wl,-soname,libkin_funcs.so -o libkin_funcs.so.1.0.1 kin_funcs.o

Then I compiled the main code with the following command:

g++ BH_estimate.cc -o  BH_estimate.exe `root-config --cflags` `root-config --libs` -I/usr/local/include/work -L/usr/local/lib/work -lBH_crs -lkin_funcs -lrt

The main code is quoted in the previous message.

Rafo
cross_sections.h (248 Bytes)
kin_funcs.h (249 Bytes)
kin_funcs.cc (682 Bytes)
BH_crs_section.cc (2.46 KB)

I found the problem.

It was my fault, sorry for that.

In the code, t_min and t_max were not properly defined, and for
these values f_BH_crs.Eval(phi, th) returned NaN.

The fact that it didn't crash on my computer confused me.

Rafo
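
For anyone landing here with the same symptom, a cheap guard is to check the kinematic range and a probe value of the integrand before calling the integrator. The following is only a minimal sketch in plain C++, with no ROOT dependency. The helper names valid_range and finite_at are hypothetical and not part of the code posted above.

```cpp
#include <cmath>

// True when [lo, hi] is a usable interval: both ends finite and lo < hi.
bool valid_range(double lo, double hi) {
    return std::isfinite(lo) && std::isfinite(hi) && lo < hi;
}

// True when the two-argument callable f returns a finite value at (x, y).
// This catches both NaN and +/-infinity at the probe point.
template <typename F>
bool finite_at(F f, double x, double y) {
    return std::isfinite(f(x, y));
}
```

In a loop like the one posted earlier, one could call valid_range(t_min, t_max) before setting the parameters, and probe the function at the centre of the (phi, theta) bin with finite_at before integrating, skipping or reporting the bin when either check fails. A single probe point does not prove that the integrand is finite everywhere in the bin, so this is a sanity check rather than a guarantee.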