Likelihood for evaluating goodness of fit

ody_p · March 2, 2018, 3:49pm

Hi all.

At my work place we have the problem of trying to fit a function to a histogram. Instead of trying just one functional form, we try several different ones and eventually try to determine which one gives the best fit.

The fitting is done by calling the TH1F::Fit method.

We use the “L” option for the fit, i.e. a log-likelihood fit. After the fit, we would like to get the likelihood from the FitResult and compute a metric that indicates goodness of fit. Rather than using the likelihood itself, we were thinking of a metric that penalizes a large number of parameters, such as the AIC (Akaike information criterion). We are using the FitResult::MinFcnValue method to get the likelihood (following the suggestions in this thread).
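
For reference, the AIC of a model with k free parameters and maximized likelihood L̂ is given below, together with its small-sample correction AICc for N data points. Writing it in terms of the minimized negative log-likelihood (NLL) makes it directly usable with a minimizer's output; the model with the lowest value is preferred. Taking N to be the number of bins in a binned fit is an assumption; with only 10 bins and up to 6 parameters the correction term is large.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L} = 2k + 2\,\mathrm{NLL}_{\min},
\qquad \mathrm{NLL} \equiv -\ln L

\mathrm{AIC_c} = \mathrm{AIC} + \frac{2k(k+1)}{N - k - 1}
```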

However, we don’t understand what the value returned by MinFcnValue represents. If it were the log-likelihood, we would expect it to increase as the fit improves. Our very basic example (see below) shows that the highest MinFcnValue belongs to the worst fit, so it evidently does not return the log-likelihood but some related quantity. Can you explain what the returned value is and how we can get the actual log-likelihood?
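
A likelihood fit maximizes L, which a minimizer does by minimizing −ln L, so the sign convention can be checked by hand. The sketch below is plain C++ with a hypothetical helper name (PoissonNll, not a ROOT function); it computes the binned Poisson NLL up to the data-only ln(n!) constant.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Binned Poisson negative log-likelihood (NLL) of observed counts n_i given
// expected counts mu_i. The ln(n_i!) term depends only on the data, so it is
// dropped: it is the same for every model fitted to the same histogram.
// Assumes both vectors have the same size and mu_i > 0 wherever n_i > 0.
double PoissonNll(const std::vector<double>& observed,
                  const std::vector<double>& expected)
{
    double nll = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        const double n = observed[i];
        const double mu = expected[i];
        nll += mu;                             // from the e^{-mu} factor
        if (n > 0.0) nll -= n * std::log(mu);  // from the mu^n factor
    }
    return nll;
}
```

With the test histogram's bin contents {1, 2, 4, 10, 15, 12, 9, 6, 3, 0}, PoissonNll(data, data) is smaller than PoissonNll(data, flat) for a flat prediction of 6.2 counts per bin, because expected == observed minimizes every term: a better fit yields a smaller NLL.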

Below is a self-contained example of how we are trying to get the likelihood.

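#include <iostream>
#include <memory>
#include <string>
#include <vector>

// MSVC-specific: silence warning C4800 (forcing value to bool) from the ROOT headers
#pragma warning (push)
#pragma warning (disable : 4800)
#include "TCanvas.h"
#include "TH1F.h"
#include "TFile.h"
#include "TF1.h"
#include "TMath.h"
#include "TFitResult.h"

#pragma warning (pop)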

// Create a histogram with artificial data
std::unique_ptr<TH1F> CreateTestHistogram()
{
    std::unique_ptr<TH1F> h1(new TH1F("testHistogram", " ", 10, 0.0, 1.0));

    h1->SetBinContent(1, 1.0);
    h1->SetBinContent(2, 2.0);
    h1->SetBinContent(3, 4.0);
    h1->SetBinContent(4, 10.0);
    h1->SetBinContent(5, 15.0);
    h1->SetBinContent(6, 12.0);
    h1->SetBinContent(7, 9.0);
    h1->SetBinContent(8, 6.0);
    h1->SetBinContent(9, 3.0);
    h1->SetBinContent(10, 0.0);

    // Also save the input histogram to a file for later inspection
    TFile histogram_file("testHistogram.root", "update", "meas histo");
    histogram_file.WriteTObject(h1.get());
    return h1;
}

// Different functional forms to try fitting: exp() of polynomials of degree 1 to 5 in (x - 0.5)
const std::vector<std::string> formulas{
    std::string("exp([0] + [1]*(x-0.5))"),
    std::string("exp([0] + [1]*(x-0.5) + [2] * (x - 0.5)**2)"),
    std::string("exp([0] + [1]*(x-0.5) + [2] * (x - 0.5)**2 + [3] * (x - 0.5)**3)"),
    std::string("exp([0] + [1]*(x-0.5) + [2] * (x - 0.5)**2 + [3] * (x - 0.5)**3 + [4] * (x - 0.5)**4)"),
    std::string("exp([0] + [1]*(x-0.5) + [2] * (x - 0.5)**2 + [3] * (x - 0.5)**3 + [4] * (x - 0.5)**4 + [5] * (x - 0.5)**5)"),
};

int main(void)
{
    auto histogram = CreateTestHistogram();

    TCanvas c2("c2", "c2222", 700, 500);
    for (size_t i = 0; i < formulas.size(); i++)
    {
        std::string filename = std::string("maxLikelihoodFitTest") + std::to_string(i + 1) + ".root";
        TFile histoFit_file(filename.c_str(), "update", "histo fit");
        TF1* f1 = new TF1("fitFunc", formulas[i].c_str(), 0.0, 2.5);
        // "L": log-likelihood fit; "S": return the full TFitResult
        TFitResultPtr result = histogram->Fit(f1, "L S");
        std::cout << "result->IsValid(): " << result->IsValid() << "\n";
        std::cout << "result->MinFcnValue(): " << result->MinFcnValue() << "\n"; // try to retrieve the likelihood
        histogram->SetTitle(filename.c_str());
        histoFit_file.WriteTObject(histogram.get());
    }

    return 0;
}
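
To turn the fit loop into a model comparison, the per-formula results can be collected and ranked by AIC. This is a plain C++ sketch with hypothetical names (FitSummary, Aic, BestByAic). It assumes the minimized value is the negative log-likelihood, and that the number of free parameters can be read from the fit result, e.g. via FitResult::NFreeParameters() (an assumption to verify against the ROOT version in use).

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// One entry per fitted model: the minimized negative log-likelihood and the
// number of free parameters k. In the fit loop these would come from
// result->MinFcnValue() and result->NFreeParameters() (assumed here).
struct FitSummary {
    double nll;
    int nFreePars;
};

// Akaike information criterion: AIC = 2k - 2 ln(Lmax) = 2k + 2 * NLLmin.
double Aic(const FitSummary& fit)
{
    return 2.0 * fit.nFreePars + 2.0 * fit.nll;
}

// Index of the model with the smallest AIC (the preferred one).
// Returns fits.size() when the input is empty.
std::size_t BestByAic(const std::vector<FitSummary>& fits)
{
    std::size_t best = fits.size();
    double bestAic = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < fits.size(); ++i) {
        const double aic = Aic(fits[i]);
        if (aic < bestAic) {
            bestAic = aic;
            best = i;
        }
    }
    return best;
}
```

Inside the loop one might push back FitSummary{result->MinFcnValue(), static_cast<int>(result->NFreeParameters())} and call BestByAic afterwards. Any constant offset in the NLL (such as a dropped ln n! term) is the same for every formula fitted to the same histogram, so it shifts all AIC values equally and does not change the ranking.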

Dilicus · March 4, 2018, 12:03pm

Hi,
MinFcnValue returns minus the value of the log-likelihood (i.e. the negative log-likelihood), so a better fit gives a smaller value.
Multiplying it by 2 gives you the chi2, but from a statistical point of view this is valid only for histograms with large statistics.

Cheers,
Stefano
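
The remark that 2 × NLL behaves like a χ² is exact for the likelihood-ratio form of Baker and Cousins, in which the NLL of the "saturated" model (expected = observed) is subtracted. Whether a given ROOT version's MinFcnValue already includes that offset is an assumption to check against the TH1::Fit documentation; if it does not, the offset depends only on the data and does not affect AIC differences. BakerCousinsChi2 is a hypothetical helper, not a ROOT function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Baker-Cousins likelihood-ratio chi-square for binned Poisson data:
//   chi2 = 2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ],
// i.e. 2 * (NLL of the model - NLL of the saturated model mu_i = n_i).
// Empty bins (n_i = 0) contribute 2 * mu_i.
// Assumes both vectors have the same size and mu_i > 0 wherever n_i > 0.
double BakerCousinsChi2(const std::vector<double>& observed,
                        const std::vector<double>& expected)
{
    double chi2 = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        const double n = observed[i];
        const double mu = expected[i];
        chi2 += mu - n;
        if (n > 0.0) chi2 += n * std::log(n / mu);
    }
    return 2.0 * chi2;
}
```

With ndf = (number of bins) − (free parameters), the result can be compared against χ² quantiles, but only when bin counts are large; several bins of the test histogram hold fewer than 5 entries, which is exactly the low-statistics case the reply warns about.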
