SIGSEGV with TF1 in compiled C++ program

davide84 · April 26, 2017, 8:48am

Dear Rooters,

I’m asking your help to understand a crash in my program and comment on my attempt to fix it.

This is the symptom:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007feeb4d22940 in TList::FindObject(TObject const*) const () from /usr/lib/x86_64-linux-gnu/libCore.so.5.34
(gdb) bt
#0  0x00007feeb4d22940 in TList::FindObject(TObject const*) const () from /usr/lib/x86_64-linux-gnu/libCore.so.5.34
#1  0x00007feeb4d268a8 in TObjArray::Delete(char const*) () from /usr/lib/x86_64-linux-gnu/libCore.so.5.34
#2  0x00007feeb459c148 in TFormula::ClearFormula(char const*) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#3  0x00007feeb459c38e in TFormula::Compile(char const*) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#4  0x00007feeb459d7b9 in TFormula::TFormula(char const*, char const*) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#5  0x00007feeb4575efe in TF1::TF1(char const*, char const*, double, double) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#6  0x00000000005b27bb in calcPoissonThreshold(counts=counts@entry=30.930034918824752, far=2.8571428571428574e-05)

And this is what the code looked like:

double calcPoissonThreshold(double counts, double far)
{
  double threshold = 0.0;
  if (counts < 0 || far <= 0.0 || far >= 1.0) {
    // log some warning
  } else if (counts >= 0) {
    const double distr_max = 2*counts + 200; // experimentally optimized
    TF1 f(
        "f",
        Form("ROOT::Math::poisson_cdf_c(x,%f)",TMath::Ceil(counts)),
        0,
        distr_max
      );
    f.SetNpx(1000);
    threshold = f.GetX(far) + 1;
  }
  return threshold;
}

This function is meant to be called many times with different parameters, but sequentially; there is no multithreading.
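For comparison, a threshold of this kind can also be computed without constructing a TF1 at all, which avoids ROOT's function list entirely. The following is only a minimal sketch under stated assumptions: the helper name poissonTailThreshold is hypothetical, it returns the smallest integer n with P(X > n) <= far for X ~ Poisson(mu), and it is not guaranteed to reproduce the value that TF1::GetX returns for the step-shaped function above.

```cpp
#include <cmath>

// Hypothetical helper, not part of ROOT.
// Returns the smallest integer n such that P(X > n) <= far for X ~ Poisson(mu),
// or -1 if the inputs are out of range.
// Limitation: std::exp(-mu) underflows for mu above roughly 700, so this
// simple recurrence is only meant for moderate means.
int poissonTailThreshold(double mu, double far)
{
  if (mu < 0.0 || far <= 0.0 || far >= 1.0) {
    return -1;
  }
  // Same search range as distr_max in the original function.
  const int n_max = static_cast<int>(2.0 * mu + 200.0);
  double pmf = std::exp(-mu); // P(X = 0)
  double cdf = pmf;           // P(X <= 0)
  int n = 0;
  while (1.0 - cdf > far && n < n_max) {
    ++n;
    pmf *= mu / n;            // P(X = n) from P(X = n - 1)
    cdf += pmf;               // P(X <= n)
  }
  return n;
}
```

It could be called as poissonTailThreshold(TMath::Ceil(counts), far), with the final offset (the "+ 1" above) chosen to match the intended convention.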

My first idea was to use a better name than “f”, because it may cause conflicts. But as I said, the calls are sequential, and I didn’t get any “Replacing existing object …” warning message on stdout, which is typical when a name is reused in the wrong way.
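If unique names are wanted anyway, one way to generate them is a small counter-based helper. This is a sketch only: makeUniqueName is a hypothetical helper, not a ROOT function, and it does not by itself make ROOT's internal lists thread safe.

```cpp
#include <atomic>
#include <string>

// Hypothetical helper, not part of ROOT.
// Returns "prefix_0", "prefix_1", ... on successive calls. The atomic counter
// keeps the names distinct even when several threads call it at once.
std::string makeUniqueName(const std::string &prefix)
{
  static std::atomic<unsigned long> counter{0};
  return prefix + "_" + std::to_string(counter.fetch_add(1));
}
```

The result would then be passed as the first constructor argument, for example makeUniqueName("poisson").c_str() instead of "f".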

Then I dug into the documentation. I am using ROOT 5.34/14, the version shipped with Ubuntu 14.04. If I understand correctly, every time a TF1 is created a TFormula is created as well, which is the “real” object doing the magic. See the last lines of the constructor here: https://root.cern.ch/root/html534/src/TFormula.cxx.html#eJktvE. The TFormula is added to an internal list of objects, and any existing object with the same name is removed first.
The segfault may be related to ROOT attempting to delete the same object twice, once in the TFormula cleanup and once in the TF1 destructor; the two may be competing for some reason.
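The register-by-name behaviour described above can be pictured with a toy model. This is not ROOT's implementation: NamedFunction and its registry are hypothetical stand-ins, written only to show that every constructor and destructor touches one shared list, so any concurrent use needs a lock around that list.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Toy stand-in for a globally registered, named object. NOT ROOT code.
class NamedFunction {
public:
  explicit NamedFunction(const std::string &name) : fName(name)
  {
    std::lock_guard<std::mutex> lock(registryMutex());
    std::vector<NamedFunction*> &reg = registry();
    // Drop any entry that already uses this name, then register this object.
    reg.erase(std::remove_if(reg.begin(), reg.end(),
                [this](const NamedFunction *f) { return f->fName == fName; }),
              reg.end());
    reg.push_back(this);
  }
  ~NamedFunction()
  {
    // Unregister on destruction; a no-op if a newer object replaced the entry.
    std::lock_guard<std::mutex> lock(registryMutex());
    std::vector<NamedFunction*> &reg = registry();
    reg.erase(std::remove(reg.begin(), reg.end(), this), reg.end());
  }
  NamedFunction(const NamedFunction&) = delete;
  NamedFunction &operator=(const NamedFunction&) = delete;

  static std::size_t registeredCount()
  {
    std::lock_guard<std::mutex> lock(registryMutex());
    return registry().size();
  }

private:
  static std::vector<NamedFunction*> &registry()
  {
    static std::vector<NamedFunction*> r;
    return r;
  }
  static std::mutex &registryMutex()
  {
    static std::mutex m;
    return m;
  }
  std::string fName;
};
```

Without the two lock_guard lines, two threads constructing or destroying such objects at the same time would modify the vector concurrently, which is undefined behaviour and can show up as a crash inside the list code.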

I then found this bug report https://sft.its.cern.ch/jira/browse/ROOT-8089 which does not explain my issue but gave me ideas for possible fixes. I wrote this:

#include <TROOT.h>         // gROOT
#include <TVirtualMutex.h> // R__LOCKGUARD2

// ...

class UnregisteredTF1 {
public:
  UnregisteredTF1(const char *name, const char *formula, Double_t xmin, Double_t xmax)
  {
    R__LOCKGUARD2(gROOTMutex);
    mF1 = new TF1(name, formula, xmin, xmax);
    TFormula *form_obj = (TFormula*)gROOT->GetListOfFunctions()->FindObject(name);
    if (form_obj) {
      // only touch form_obj when the lookup succeeded (avoids a null dereference)
      gROOT->GetListOfFunctions()->Remove(form_obj);
      form_obj->SetBit(TFormula::kNotGlobal, 1);
    }
  }
  ~UnregisteredTF1()
  {
    mF1->Delete();
  }
  void     SetNpx(Int_t npx) { mF1->SetNpx(npx);    }
  Double_t GetX(Double_t y)  { return mF1->GetX(y); }
  Double_t Eval(Double_t x)  { return mF1->Eval(x); }
private:
  TF1 *mF1;
};

EDIT: the segfault now happens in another part of my program, with the same backtrace (TFormula etc.), but this time TF1 is used this way:

      // fit a gaussian to the peak
      TF1 fitfun("fit","gaus",fitXMin,fitXMax);
      fitfun.SetParameters(1, mean);
      h_deconv.Fit(&fitfun, "QRNO", "", fitXMin, fitXMax);
      // get parameters from "fitfun", then never use it again until end of scope

I think I should better understand what I am doing…

Could the TFormula behavior be the explanation for my crash?
Is it wrong to use TF1 as a stack object rather than through a pointer?
Should I generate a unique name for my functions?

Thank you in advance!

Danilo · April 26, 2017, 2:34pm

Hi,

a lot of work has been carried out to make ROOT thread safe and to support the expression of parallelism. All these new features are part of ROOT6 and will not be ported to ROOT5. Is it possible for you to move to ROOT6?
A first step would then consist of enabling thread safety with ROOT::EnableThreadSafety() (https://root.cern.ch/doc/v608/namespaceROOT.html#a3332c2f629881ab608768fa6846f440e)

Cheers,
D

davide84 · April 26, 2017, 2:51pm

Hi,

Unfortunately I cannot move to ROOT6, it does not even compile on my OS (Ubuntu 14.04) because I keep getting an error about CMake minimum version. I plan to do it ASAP but anyways it cannot be done at the moment.

Danilo · April 26, 2017, 7:14pm

Davide,

Ubuntu 14.04 is one of our testing platforms. Is it a problem to upgrade CMake?

Cheers,
D

davide84 · April 27, 2017, 10:27am

Hi,

at the moment that is not really an option; we are constrained to stick to the packaged versions. I could update a single machine for testing, but then I would have to test many other things before rolling it out.

So, you think it may actually be a parallelization issue…?

Danilo · April 27, 2017, 10:31am

Hi,

This is what the stack trace seems to suggest. Perhaps it is worth trying to invoke “TThread::Init()” (https://root.cern.ch/doc/master/classTThread.html#acac75234a72945b1a94eae0da2f46690). Some thread safety is there in ROOT5 too and needs to be activated with the aforementioned command.

Cheers,
D

davide84 · May 2, 2017, 1:17pm

Hi, the last suggestion seems to have fixed the problem… well, at least I am not experiencing any more crashes… thanks!

system · May 16, 2017, 1:18pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.