Bug in TGraph / RecursiveRemove

Hi,

I found a strange behaviour (with segfault) on quitting ROOT. The segfault appears when quitting after adding an additional TF1 to the ListOfFunctions of a TGraph object. Example code is:

void graph_bug(int mode=0)
{
  TF1 *f1 = new TF1("f1","gaus(0)+pol1(3)",0,10);
  TF1 *f2 = new TF1("f2","gaus(0)+pol2(3)",0,10);
  f1->SetParameters(1,3,0.5,2);
  f2->SetParameters(1,5,0.5,1,0.05,-0.01);
  f2->SetLineColor(4);
  f2->SetLineStyle(7);
  
  TH1F *h = new TH1F("h","h",100,0,10); 
  h->FillRandom("f2",10000);
  TGraphErrors *g = new TGraphErrors(100);
  for (int i=1;i<=100; ++i) {
    g->SetPoint(i-1, h->GetBinCenter(i),h->GetBinContent(i)); 
    g->SetPointError(i-1, 0, h->GetBinError(i));
  } 
  
  if (mode==0) {
    g->Fit("f2");
    g->Fit("f1","+");
    g->Draw("AP");
    
    g->GetListOfFunctions()->Add(f1);
    g->GetListOfFunctions()->Add(f2);
  }
  else {
    h->Fit("f2");
    h->Fit("f1","+");
    h->Draw();
    
    h->GetListOfFunctions()->Add(f1);
    h->GetListOfFunctions()->Add(f2);
  }
}

When running root -l -q 'graph_bug.C(1)' (i.e. with histogram), everything looks fine, but root -l -q graph_bug.C(0) gives:


 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f3e790f1bd3 in __GI___wait4 (pid=658515, stat_loc=stat_loc
entry=0x7ffd343e6368, options=options
entry=0, usage=usage
entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
#1  0x00007f3e790f1b97 in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc
entry=0x7ffd343e6368, options=options
entry=0) at ./posix/waitpid.c:38
#2  0x00007f3e7906a2fb in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:171
#3  0x00007f3e798fe93f in TUnixSystem::StackTrace() () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#4  0x00007f3e798fe2f4 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#5  <signal handler called>
#6  0x0000000000000000 in ?? ()
#7  0x00007f3e7985625c in TList::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#8  0x00007f3e5e0365d9 in TGraph::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libHist.so
#9  0x00007f3e79856538 in TList::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#10 0x00007f3e5d9b3ba8 in TPad::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libGpad.so
#11 0x00007f3e79856538 in TList::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#12 0x00007f3e7984a8cc in THashList::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#13 0x00007f3e797efd62 in TROOT::RecursiveRemove(TObject*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#14 0x00007f3e797d5278 in TNamed::~TNamed() () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#15 0x00007f3e5e068e49 in TH1D::~TH1D() () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libHist.so
#16 0x00007f3e5dfb203d in TF1::~TF1() () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libHist.so
#17 0x00007f3e5dfb23e9 in TF1::~TF1() () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libHist.so
#18 0x00007f3e79853988 in TList::Delete(char const*) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#19 0x00007f3e797f2105 in TROOT::EndOfProcessCleanups() () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#20 0x00007f3e798f78df in TUnixSystem::Exit(int, bool) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#21 0x00007f3e7979e9ef in TApplication::Terminate(int) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libCore.so
#22 0x00007f3e79b0a1e3 in TRint::Run(bool) () from /cvmfs/eel.gsi.de/debian12-x86_64/root/636-00/lib/libRint.so
#23 0x000055954280d1cf in main ()
===========================================================

Looks like the TGraph wants to delete the same object(s) twice leading to this segmentation fault. Interestingly, when commenting out one of the ...Add(...) statements for TGraph, I don’t get a crash. And this despite adding in both cases a TF1 that should already be in the ListOfFunctions.

Can somebody help me to understand this behaviour, or is this a bug? Background is that I’m writing a fitting tool that can deal with both histograms and graphs, and for including the background distribution, I’d like to add the TF1 of the background to the list of functions of the corresponding object. In principle it works, but I get this crash on quitting ROOT when using a TGraph.

Best regards and thanks, Klaus


ROOT Version: 6.36.00
Platform: Debian 12
Compiler: Not Provided


May be I missed something, but why do you need to add these two functions in the list of functions of the graph as they are already there after the Fit ? I simplified your example to:

void graph_bug()
{
   TF1 *f1 = new TF1("f1","gaus(0)+pol1(3)",0,10);
   TF1 *f2 = new TF1("f2","gaus(0)+pol2(3)",0,10);
   f1->SetParameters(1,3,0.5,2);
   f2->SetParameters(1,5,0.5,1,0.05,-0.01);


   TH1F *h = new TH1F("h","h",100,0,10);
   h->FillRandom("f2",10000);
   TGraphErrors *g = new TGraphErrors(100);
   for (int i=1;i<=100; ++i) {
      g->SetPoint(i-1, h->GetBinCenter(i),h->GetBinContent(i));
      g->SetPointError(i-1, 0, h->GetBinError(i));
   }

   g->Fit("f2");
   g->Fit("f1","+");

//    g->GetListOfFunctions()->Add(f1);
//    g->GetListOfFunctions()->Add(f2);
   g->GetListOfFunctions()->ls();
   g->Draw("AP");
}

I get:

Processing graph_bug.C...
****************************************
Minimizer is Minuit2 / Migrad
Chi2                      =      94.2926
NDf                       =           94
Edm                       =  2.89654e-06
NCalls                    =         1399
p0                        =      102.617   +/-   5.53783     
p1                        =      4.99968   +/-   0.0289038   
p2                        =     0.520189   +/-   0.0310969   
p3                        =      95.7058   +/-   3.03065     
p4                        =      2.96087   +/-   1.62556     
p5                        =    -0.745013   +/-   0.156272    
****************************************
Minimizer is Minuit2 / Migrad
Chi2                      =      115.872
NDf                       =           95
Edm                       =  7.38504e-08
NCalls                    =          651
p0                        =      107.715   +/-   5.232       
p1                        =      4.99123   +/-   0.0300038   
p2                        =     0.593995   +/-   0.0326688   
p3                        =      105.954   +/-   2.10331     
p4                        =     -4.63027   +/-   0.31189     
OBJ: TList	TList	Doubly linked list : 0
 OBJ: TF1	f2	gaus(0)+pol2(3) : 0 at: 0x7fa3d977bb20
 OBJ: TF1	f1	gaus(0)+pol1(3) : 0 at: 0x7fa3d97908e0
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1

As you se the two functions are in the list depiste the fact I did not add them explicitly.

Thanks for taking a look. That example is not my use case, but the shortest case that generates this faulty behaviour. My actual programm is so complex that I can’t post it here. Nevertheless I think it is a bug, since TH1F can tidy up without crash…

May be @pcanal might have some idea about it.

Here a more reasonable example: I fit a function with signal+background to a histogram or graph and add the background function to the list of functions, so that when drawing the object, both, the full fit function and the background part are shown, and in particular saved when saving the object.

void graph_bug2(int mode=0)
{
  TF1 *f1 = new TF1("f1","gaus(0)+pol1(3)",0,10);
  TF1 *f2 = new TF1("f2","pol2(0)",0,10);
  f1->SetParameters(1,3,0.5,2);
  f2->SetLineColor(4);
  f2->SetLineStyle(7);
  
  TH1F *h = new TH1F("h","h",100,0,10); 
  h->FillRandom("f1",10000);
  TGraphErrors *g = new TGraphErrors(100);
  for (int i=1;i<=100; ++i) {
    g->SetPoint(i-1, h->GetBinCenter(i),h->GetBinContent(i)); 
    g->SetPointError(i-1, 0, h->GetBinError(i));
  } 
  
  if (mode==0) {
    g->Fit("f1");
    g->GetListOfFunctions()->Clear();
    f2->SetParameters(f1->GetParameters()+3);
    g->GetListOfFunctions()->Add(f1);
    g->GetListOfFunctions()->Add(f2);
    g->Draw();
  }
  else {
    h->Fit("f1");
    h->GetListOfFunctions()->Clear();
    f2->SetParameters(f1->GetParameters()+3);
    h->GetListOfFunctions()->Add(f1);
    h->GetListOfFunctions()->Add(f2);
    h->Draw();
  }
}

Here the same effect: root -l -q 'graph_bug2.C(0)' crashes, while root -l -q 'graph_bug2.C(1)' runs smoothly. Fun fact: When keeping the function from the original fit, i.e.

...
if (mode==0) {
    g->Fit("f1");
    f2->SetParameters(f1->GetParameters()+3);
    g->GetListOfFunctions()->Add(f2);
    g->Draw();
}
...

it runs without crashing.

Unfortunately this does not solve my particular problem, because my simultaneous fitting tool needs to propagate the fitted parameters to each fit object individually, so that I get a construction like above, where the individual object wasn’t actually fitted, but its function reconstructed from the simultaneous fit result.

@couet @pcanal Here’s the simplest “reproducer.cxx” (add two functions to a graph and then draw it in the traditional way):

{
  // root -b -l -q reproducer.cxx
  // root --web=off -l -q reproducer.cxx
  TF1 *f1 = new TF1("f1", "pol1", 0., 1.);
  f1->SetParameters(0.1, 1.);
  TF1 *f2 = new TF1("f2", "pol1", 0., 1.);
  f2->SetParameters(-0.1, 1.);
  TGraph *g = new TGraph(2);
  g->SetPoint(0, 0., 0.); g->SetPoint(1, 1., 1.);
  g->GetListOfFunctions()->Add(f1);
  g->GetListOfFunctions()->Add(f2);
  g->Draw("A*");
}

Another test: When adding two functions via Fit, the problems does not occure:

    g->Fit("f1");
    g->Fit("f2","+");

Hi,

In the code distilled by Wile, the problem is that the TF1 functions are being shared by the TGraph and the global list of functions (gROOT->GetListOfFunctions())

However the mechanism to properly handles this co-ownership is not activated for the TF1 objects.

The crash happens during the tear-down clean up. At the end of the process gROOT delete the list of functions and thus f1 and f2 but does not inform the TGraph that are are deleted. Consequently when some other code triggers a scan through the TGraph there is a crash.

Since the TGraph is already indirectly connected the the ListOfCleanups, all that is needed to activate the auto cleanup for the TF1 is:

    f1->SetBit(kMustCleanup);
    f2->SetBit(kMustCleanup);

However the more stable and streamlined way (used by the Fitting code) is to tell the function who their co Parent/Owner is:

    f1->SetParent(g);
    f2->SetParent(g);

Cheers,
Philippe.