Dear Rooters,
I’m asking for your help to understand a crash in my program, and for comments on my attempt to fix it.
This is the symptom:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007feeb4d22940 in TList::FindObject(TObject const*) const () from /usr/lib/x86_64-linux-gnu/libCore.so.5.34
(gdb) bt
#0 0x00007feeb4d22940 in TList::FindObject(TObject const*) const () from /usr/lib/x86_64-linux-gnu/libCore.so.5.34
#1 0x00007feeb4d268a8 in TObjArray::Delete(char const*) () from /usr/lib/x86_64-linux-gnu/libCore.so.5.34
#2 0x00007feeb459c148 in TFormula::ClearFormula(char const*) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#3 0x00007feeb459c38e in TFormula::Compile(char const*) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#4 0x00007feeb459d7b9 in TFormula::TFormula(char const*, char const*) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#5 0x00007feeb4575efe in TF1::TF1(char const*, char const*, double, double) () from /usr/lib/x86_64-linux-gnu/libHist.so.5.34
#6 0x00000000005b27bb in calcPoissonThreshold(counts=counts@entry=30.930034918824752, far=2.8571428571428574e-05)
And this is what the code looked like:
double calcPoissonThreshold(double counts, double far)
{
    double threshold = 0.0;
    if (counts < 0 || far <= 0.0 || far >= 1.0) {
        // log some warning
    } else if (counts >= 0) {
        const double distr_max = 2*counts + 200; // experimentally optimized
        TF1 f(
            "f",
            Form("ROOT::Math::poisson_cdf_c(x,%f)",TMath::Ceil(counts)),
            0,
            distr_max
        );
        f.SetNpx(1000);
        threshold = f.GetX(far) + 1;
    }
    return threshold;
}
This function is supposed to be called multiple times with different parameters, but always sequentially; there is no multithreading.
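As a cross-check that does not involve TF1 at all, the same kind of threshold could be sketched in plain C++ as below. This is only a sketch under assumptions: `poissonTailThreshold` is a hypothetical name (not a ROOT function), it returns the smallest integer n with P(X > n) <= far for a Poisson mean mu, and it is not guaranteed to agree exactly with `f.GetX(far) + 1` from the function above. The search bound 2*mu + 200 is borrowed from `distr_max`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical cross-check, not ROOT API: smallest integer n such that
// P(X > n) <= far for X ~ Poisson(mu). Only meant for moderate mu:
// exp(-mu) underflows to 0 for mu above roughly 745.
inline unsigned int poissonTailThreshold(double mu, double far)
{
    assert(mu >= 0.0 && far > 0.0 && far < 1.0);
    // Same search bound as distr_max in calcPoissonThreshold (assumption).
    const unsigned int max_n = static_cast<unsigned int>(2.0 * mu + 200.0);
    double term = std::exp(-mu); // P(X = 0)
    double cdf = term;           // P(X <= 0)
    unsigned int n = 0;
    while (1.0 - cdf > far && n < max_n) {
        ++n;
        term *= mu / n; // P(X = n) from P(X = n - 1)
        cdf += term;    // P(X <= n)
    }
    return n;
}
```

Comparing its result with the TF1-based value for a few inputs would at least show whether the TF1 is needed for this computation at all.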
My first idea was to use a better name than “f”, because it may cause conflicts. But, as I said, the calls are sequential, and I didn’t get any “Replacing existing object …” warning message on stdout, which is typical when you recycle a name in the wrong way.
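For reference, the unique-name idea could look like the sketch below. `makeUniqueName` is a hypothetical helper, not part of ROOT, and it assumes a C++11 compiler (for `std::atomic` and `std::to_string`). It appends a process-wide counter to a prefix, so no two calls return the same name.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper (not part of ROOT): returns "<prefix>_<n>", where n
// comes from a process-wide counter, so every call yields a different name.
inline std::string makeUniqueName(const std::string &prefix)
{
    assert(!prefix.empty());
    static std::atomic<unsigned long> counter{0};
    return prefix + "_" + std::to_string(counter++);
}
```

The result would then be passed as the first constructor argument, e.g. `makeUniqueName("poisson_tail").c_str()` in place of `"f"`. I have not verified whether this alone avoids the crash.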
Then I dug into the documentation. I am using ROOT 5.34/14, the version shipped with Ubuntu 14.04. If I understand correctly, every time a TF1 is created a TFormula is created as well, which is the “real” object doing the magic. See the last lines of the constructor here: https://root.cern.ch/root/html534/src/TFormula.cxx.html#eJktvE. The TFormula is added to an internal list of objects, and any existing object with the same name is removed first.
The segfault may be related to ROOT attempting to delete the same object twice, once in the TFormula cleanup and once in the TF1 destructor; the two may be competing for some reason.
I then found this bug report, https://sft.its.cern.ch/jira/browse/ROOT-8089, which does not explain my issue but gave me ideas about possible fixes. I wrote this:
#include <TF1.h> // TF1, TFormula
#include <TROOT.h> // gROOT
#include <TVirtualMutex.h> // R__LOCKGUARD2
// ...
class UnregisteredTF1 {
public:
    UnregisteredTF1(const char *name, const char *formula, Double_t xmin, Double_t xmax)
    {
        R__LOCKGUARD2(gROOTMutex);
        mF1 = new TF1(name, formula, xmin, xmax);
        // take the function back out of the global list right after creation
        TFormula *form_obj = (TFormula*)gROOT->GetListOfFunctions()->FindObject(name);
        if (form_obj) {
            gROOT->GetListOfFunctions()->Remove(form_obj);
            // only touch form_obj when the lookup actually found something
            form_obj->SetBit(TFormula::kNotGlobal, 1);
        }
    }
    ~UnregisteredTF1()
    {
        mF1->Delete();
    }
    void SetNpx(Int_t npx) { mF1->SetNpx(npx); }
    Double_t GetX(Double_t y) { return mF1->GetX(y); }
    Double_t Eval(Double_t x) { return mF1->Eval(x); }
private:
    TF1 *mF1;
};
EDIT: the segfault is now happening in another part of my program, with the same backtrace (TFormula etc.), but this time the TF1 is used this way:
// fit a gaussian to the peak
TF1 fitfun("fit","gaus",fitXMin,fitXMax);
fitfun.SetParameters(1, mean);
h_deconv.Fit(&fitfun, "QRNO", "", fitXMin, fitXMax);
// get parameters from "fitfun", then never use it again until end of scope
I think I need to better understand what I am doing…
Could the TFormula behavior be the explanation for my crash?
Is it wrong to use the TF1 as a local (stack) object rather than through a pointer?
Should I generate a unique name for my functions?
Thank you in advance!