Unbinned Log Likelihood fit -- not working well

SuperCiocia · March 31, 2015, 2:28am

I’m importing data from a class, filling three distinct histograms. Then, I’m fitting functions to these histograms such that they share some of the parameters, but not all*. So I have a combined triple fit over the three histograms.
Initially I was performing a chi square fit, and it worked.
Then, I attempted a Log Likelihood fit, and things started to go wrong. At least one of the three fits does not look good, one of parameters is way off and it makes the whole combined triple fit diverge. There are no error messages though.

I am including a macro of the code. Instead of the actual data, I am filling the histograms at random so that the code can be run on its own with .x LLtrial.C+
In the code, I am including a way to revert to the chi square fit: just comment out the bits that I have indicated between lines 96 and 119 – the chi square fit works.

[code]#include “Fit/Fitter.h”
#include “Fit/BinData.h”
#include “Fit/UnBinData.h”
#include “Fit/LogLikelihoodFCN.h”
#include “Fit/Chi2FCN.h”
#include “Fit/FitResult.h”
#include “TH1.h”
#include “TStyle.h”
#include “TCanvas.h”
#include “TMath.h”
#include “TAxis.h”
#include “TList.h”
#include “Math/WrappedMultiTF1.h”
#include “HFitInterface.h”
#include “THStack.h”
#include “TLegend.h”
#include “TGraph.h”
#include “TROOT.h”
#include

// declarations of functions used in the fits (defined after main functions)
Double_t background(Double_t*, Double_t*);
Double_t crystal(Double_t*,Double_t*);
Double_t norm_ratiofit(Double_t*, Double_t*);
Double_t ratiofit(Double_t*, Double_t*);
Double_t norm_noBGratiofit(Double_t*, Double_t*);
Double_t noBGratiofit(Double_t*, Double_t*);

// need these to normalise the fit functions later:
TF1 *normfn = new TF1(“normfn”,ratiofit,4100,6800,11);
TF1 *noBG_normfn = new TF1(“noBG_normfn”,noBGratiofit,5100,5400,8);
Double_t Bsnorm, Bdnorm, Cnorm;

// Global calculates the quantity to minimise (e.g. chi square) for each of the three histograms
struct Global{
Global(ROOT::Math::IMultiGenFunction & f1, ROOT::Math::IMultiGenFunction & f2, ROOT::Math::IMultiGenFunction & f3) :
f_1(&f1), f_2(&f2), f_3(&f3) {} // f_sth is the quantity to minimise, like chi square. Here it should be the likelihood, and should be maximised…?

// parameter vectors for the fit to each histogram
double operator() (const double *r) const {
double ps[12] = {r[0],r[1],r[2],r[3],r[4],r[4]+86.77,r[18],r[19],r[20],r[5],r[21],1}; // the last 1 is just being initialised here. A few lines below give it its correct value
double pd[12] = {r[6],r[7],r[8],r[9],r[10],r[10]+86.77,r[18],r[19],r[20],r[11],r[21],1};
double pc[9] = {r[12],r[13],r[13]+86.77,r[14],r[16],r[17],r[15],r[21],1};
normfn->SetParameters(ps);
Bsnorm = normfn->Integral(4100,6800);
ps[11] = Bsnorm;
normfn->SetParameters(pd);
Bdnorm = normfn->Integral(4100,6800);
pd[11] = Bdnorm;
noBG_normfn->SetParameters(pc);
Cnorm = noBG_normfn->Integral(5100,5400);
pc[8] = Cnorm;
return ((*f_1)(ps) + (*f_2)(pd) + (*f_3)(pc));
}
const ROOT::Math::IMultiGenFunction * f_1;
const ROOT::Math::IMultiGenFunction * f_2;
const ROOT::Math::IMultiGenFunction * f_3;
};

void LLtrial(){

TCanvas *c1 = new TCanvas(“c1”,“Fit”);
c1->Divide(1,3);

// defining fits to fill histograms with

TF1 *refitBs = new TF1(“refitBs”,ratiofit,4100,6800,11);
Double_t parBs[12] = {13.3681,-0.00226713,0.954221,50.406,5278.05,5278.05+86.77,96.3405,1.25787,1.75736,2.01963,0.145621,1};
refitBs->SetParameters(parBs);

TF1 *refitBd = new TF1(“refitBd”,ratiofit,4100,6800,11);
Double_t parBd[12] = {13.7442,-0.00219231,0.193871,78.3992,5293.38,5293.38+86.77,96.3405,1.25787,1.75736,0.58826,0.145621,1};
refitBd->SetParameters(parBd);

TF1 *refitC = new TF1(“refitC”,noBGratiofit,5100,5400,8);
Double_t parC[9] = {9.24112,5269.11,5269.11+86.77,26.9198,0.620423,2.94225,1,0.145621,1};
refitC->SetParameters(parC);

// filling random histograms
TH1F *sel = new TH1F(“sel”, “Bs random filled”, 150, 4100, 6800);
sel->FillRandom(“refitBs”,1929);
TH1F *hist = new TH1F(“hist”, “B0 random filled”, 150, 4100, 6800);
hist->FillRandom(“refitBd”,3465);
TH1F *Converted = new TH1F(“Converted”, “converted T5 random filled”, 60, 5100, 5400);
Converted->FillRandom(“refitC”,110);

// starting combined Log likelihood fit between the two histograms

ROOT::Math::WrappedMultiTF1 wBs(*refitBs,1);
ROOT::Math::WrappedMultiTF1 wBd(*refitBd,1);
ROOT::Math::WrappedMultiTF1 wC(*refitC,1);

ROOT::Fit::DataOptions opt;

ROOT::Fit::DataRange rangeBs;
rangeBs.SetRange(4100,6800);
ROOT::Fit::UnBinData dataBs(opt,rangeBs,sel->GetEntries()); // comment out if using chi2
// ROOT::Fit::BinData dataBs(opt,rangeBs); // comment out if using LL
// ROOT::Fit::FillData(dataBs, sel); // comment out if using LL

ROOT::Fit::DataRange rangeBd;
rangeBd.SetRange(4100,6800);
ROOT::Fit::UnBinData dataBd(opt,rangeBd,hist->GetEntries()); // comment out if using chi2
// ROOT::Fit::BinData dataBd(opt,rangeBd); // comment out if using LL
// ROOT::Fit::FillData(dataBd, hist); // comment out if using LL

ROOT::Fit::DataRange rangeC;
rangeC.SetRange(5100,5400);
ROOT::Fit::UnBinData dataC(opt,rangeC,Converted->GetEntries()); // comment out if using chi2
// ROOT::Fit::BinData dataC(opt,rangeC); // comment out if using LL
// ROOT::Fit::FillData(dataC, Converted); // comment out if using LL

// ROOT::Fit::Chi2Function LL_Bs(dataBs, wBs); // comment out if using LL
// ROOT::Fit::Chi2Function LL_Bd(dataBd, wBd); // comment out if using LL
// ROOT::Fit::Chi2Function LL_C(dataC, wC); // comment out if using LL

ROOT::Fit::LogLikelihoodFunction LL_Bs(dataBs, wBs); // comment out if using chi2
ROOT::Fit::LogLikelihoodFunction LL_Bd(dataBd, wBd); // comment out if using chi2
ROOT::Fit::LogLikelihoodFunction LL_C(dataC, wC); // comment out if using chi2

Global globalLL(LL_Bs, LL_Bd, LL_C); // Log Likelihood or chi2

ROOT::Fit::Fitter fitter;

// number of parameters in total (for both fits) are 22, some are shared though: initialising them here
Double_t par0[22] = {13.3681,-0.00226713,0.954221,50.406,5278.05,2.01963,13.7442,-0.00219231,0.193871,78.3992,5293,0.58826,9.24112,5269.11,26.9198,1,0.620423,2.94225,96.3405,1.25787,1.75736,0.145621};

// create before the parameter settings in order to fix or set range on them + give them a name
fitter.Config().SetParamsSettings(22,par0);
fitter.Config().ParSettings(5).Fix();
fitter.Config().ParSettings(11).Fix();
//fitter.Config().ParSettings(12).SetLimits(0,1000);
fitter.Config().ParSettings(13).SetLimits(5259,5299);
fitter.Config().ParSettings(15).Fix();
fitter.Config().ParSettings(16).Fix();
fitter.Config().ParSettings(17).Fix();
fitter.Config().ParSettings(19).Fix();
fitter.Config().ParSettings(20).Fix();
fitter.Config().ParSettings(0).SetName("Bs Expo Amp ");
fitter.Config().ParSettings(1).SetName("Bs Expo Coef ");
fitter.Config().ParSettings(2).SetName("Bs Constant ");
fitter.Config().ParSettings(3).SetName("Bs Norm ");
fitter.Config().ParSettings(4).SetName("Bs Bd Mean ");
fitter.Config().ParSettings(5).SetName("Bs k ");
fitter.Config().ParSettings(6).SetName("Bd Expo Amp ");
fitter.Config().ParSettings(7).SetName("Bd Expo Coef ");
fitter.Config().ParSettings(8).SetName("Bd Constant ");
fitter.Config().ParSettings(9).SetName("Bd Norm ");
fitter.Config().ParSettings(10).SetName("Bd Bd Mean ");
fitter.Config().ParSettings(11).SetName("Bd k ");
fitter.Config().ParSettings(12).SetName("C Norm ");
fitter.Config().ParSettings(13).SetName("C Bd Mean ");
fitter.Config().ParSettings(14).SetName("C Sigma ");
fitter.Config().ParSettings(15).SetName("C k ");
fitter.Config().ParSettings(16).SetName("C alpha ");
fitter.Config().ParSettings(17).SetName("C n ");
fitter.Config().ParSettings(18).SetName(“Bs Bd Sigma”);
fitter.Config().ParSettings(19).SetName("Bs Bd Alpha ");
fitter.Config().ParSettings(20).SetName("Bs Bd n ");
fitter.Config().ParSettings(21).SetName("r ");

fitter.Config().MinimizerOptions().SetErrorDef(0.5);
fitter.Config().MinimizerOptions().SetPrintLevel(3);

fitter.FitFCN(22,globalLL,par0,dataBs.Size()+dataBd.Size(),false); // performs the fit
ROOT::Fit::FitResult result = fitter.Result();
result.Print(std::cout, true);
cout<<"Min FCN value: "<<2result.MinFcnValue()<<endl;
cout<<"Min FCN/DoF value: "<<2result.MinFcnValue()/double(result.Ndf())<<endl; //Ndf = number of degrees of freedom, MinFcnValue = I think it’s something to do with the goodness of the fit, it was the chi square back when we were not using Log Likelihood

Double_t r[22];
for(Int_t i=0;i<22;i++)r[i]=result.Parameter(i); // getting the parameters from the fit

// want to draw the individual components of the total fit (crystal ball + exponential) for display purposes and also to see what went wrong
// so here we are setting the parameters for the two total fits, and later we are drawing them along with their components
double Bspar[11] = {r[0],r[1],r[2],r[3],r[4],r[4]+86.77,r[18],r[19],r[20],r[5],r[21]};
double Bdpar[11] = {r[6],r[7],r[8],r[9],r[10],r[10]+86.77,r[18],r[19],r[20],r[11],r[21]};
double Cpar[8] = {r[12],r[13],r[13]+86.77,r[14],r[16],r[17],r[15],r[21]};

c1->cd(1);
sel->Draw();
TF1 *DratioBs = new TF1(“DratioBs”,ratiofit,4100,6800,11);
DratioBs->SetParameters(&Bspar[0]);
DratioBs->SetLineColor(kMagenta);
DratioBs->Draw(“same”);
TF1 *BsBackG = new TF1(“BsBackG”,background,4100,6800,3);
Double_t BsBG[3] = {r[0],r[1],r[2]};
BsBackG->SetParameters(&BsBG[0]);
BsBackG->SetLineColor(kGreen);
BsBackG->Draw(“same”);
TF1 *BsDisplay1 = new TF1(“BsDisplay1”,crystal,4100,6800,5);
Double_t BsD1[5] = {r[3],r[4],r[18],r[19],r[20]};
BsDisplay1->SetParameters(&BsD1[0]);
BsDisplay1->SetLineColor(kRed);
BsDisplay1->Draw(“same”);
TF1 *BsDisplay2 = new TF1(“BsDisplay2”,crystal,4100,6800,5);
Double_t BsD2[5] = {r[3]*r[5]*r[21],r[4]+86.77,r[18],r[19],r[20]};
BsDisplay2->SetParameters(&BsD2[0]);
BsDisplay2->SetLineColor(kBlue);
BsDisplay2->Draw(“same”);

c1->cd(2);
hist->Draw();
TF1 *DratioBd = new TF1(“DratioBd”,ratiofit,4100,6800,11);
DratioBd->SetParameters(&Bdpar[0]);
DratioBd->SetLineColor(kMagenta);
DratioBd->Draw(“same”);
TF1 *BdBackG = new TF1(“BdBackG”,background,4100,6800,3);
Double_t BdBG[3] = {r[6],r[7],r[8]};
BdBackG->SetParameters(&BdBG[0]);
BdBackG->SetLineColor(kGreen);
BdBackG->Draw(“same”);
TF1 *BdDisplay1 = new TF1(“BdDisplay1”,crystal,4100,6800,5);
Double_t BdD1[5] = {r[9],r[10],r[18],r[19],r[20]};
BdDisplay1->SetParameters(&BdD1[0]);
BdDisplay1->SetLineColor(kRed);
BdDisplay1->Draw(“same”);
TF1 *BdDisplay2 = new TF1(“BdDisplay2”,crystal,4100,6800,5);
Double_t BdD2[5] = {r[9]*r[11]*r[21],r[10]+86.77,r[18],r[19],r[20]};
BdDisplay2->SetParameters(&BdD2[0]);
BdDisplay2->SetLineColor(kBlue);
BdDisplay2->Draw(“same”);

c1->cd(3);
Converted->Draw();
TF1 *DratioC = new TF1(“DratioC”,noBGratiofit,5100,5400,8);
DratioC->SetParameters(&Cpar[0]);
DratioC->SetLineColor(kMagenta);
DratioC->Draw(“same”);
TF1 *CDisplay1 = new TF1(“CDisplay1”,crystal,5100,5400,5);
Double_t CD1[5] = {r[12],r[13],r[14],r[16],r[17]};
CDisplay1->SetParameters(&CD1[0]);
CDisplay1->SetLineColor(kRed);
CDisplay1->Draw(“same”);
TF1 *CDisplay2 = new TF1(“CDisplay2”,crystal,5100,5400,5);
Double_t CD2[5] = {r[12]*r[15]*r[21],r[13]+86.77,r[14],r[16],r[17]};
CDisplay2->SetParameters(&CD2[0]);
CDisplay2->SetLineColor(kBlue);
CDisplay2->Draw(“same”);

}

//Fit using expo+const background and two crystal functions
//Par: expo{0 amplitude, 1 expo coef}, 2 const, 3 signal norm, 4 Bd mean, 5 Bs mean (dummy), 6 signal sigma, 7 alpha, 8 n, 9 k (fixed), 10 r}
// k is the enhancement/diminishment efficiency
// r is the suppression ratio
//Par2: crystal1 {0 Bd norm (=1), 1 Bd mean, 2 Bd sigma, 3 alpha, 4 n},
// crystal2 {5 Bs norm (=1), 6 Bs mean (Bd mean+86.77), 7 Bd sigma, 8 alpha, 9 n}
Double_t ratiofit(Double_t *x, Double_t par){
Double_t par2[10];
par2[0]=1; par2[5]=1;
par2[1]=par[4]; par2[6]=par[4]+86.77;
for (Int_t i=0;i<3;i++){
par2[i+2]=par[i+6];
par2[i+7]=par[i+6];
}
return background(x,par)+par[3](crystal(x,par2)+(par[9]*par[10]*crystal(x,&par2[5])));
}

//Normalised version of the ratiofit function.
//Par: 0 to 10 are part of the ratiofit, 11 is the norm constant for the current params
Double_t norm_ratiofit(Double_t *x, Double_t *par){
return ratiofit(x,par)/par[11];
}

//Par: 0 signal norm, 1 Bd mean, 2 Bs mean (dummy), 3 signal sigma, 4 alpha, 5 n, 6 k (fixed), 7 r
// k is the enhancement/diminishment of the Bs signal relative to the Bd signal due to tagging
// r is the ratio of Bs->Ky to Bd->Ky
//Par2: crystal1 {0 Bd norm (=1), 1 Bd mean, 2 Bd sigma, 3 alpha, 4 n},
// crystal2 {5 Bs norm (=1), 6 Bs mean (Bd mean+86.77), 7 Bs sigma, 8 alpha, 9 n}
Double_t noBGratiofit(Double_t *x, Double_t par){
Double_t par2[10];
par2[0]=1; par2[5]=1;
par2[1]=par[1]; par2[6]=par[1]+86.77;
for (Int_t i=0;i<3;i++){
par2[i+2]=par[i+3];
par2[i+7]=par[i+3];
}
return par[0](crystal(x,par2)+(par[6]*par[7]*crystal(x,&par2[5])));
}

//Normalised version of the no BG ratiofit function.
//Par: 0 to 7 are part of the ratiofit, 8 is the norm constant for the current parameters
Double_t norm_noBGratiofit(Double_t *x, Double_t *par){
return noBGratiofit(x,par)/par[8];
}

//Backgournd exponential with constant vertical shift.
//Par: expo{0 amplitude, 1 exponent coef}, 2 constant
Double_t background(Double_t *x, Double_t *par){
return (exp(par[0]+par[1]*x[0])+par[2]);
}

//Crystal Ball function for fitting signal.
//Par: 0 normalisation(N), 1 mean, 2 sigma, 3 alpha, 4 n
Double_t crystal(Double_t x, Double_t par){
Double_t s = fabs(par[2]);
Double_t t = (x[0]-par[1])/s;
if (par[3]<0) t =-t;
Double_t a = fabs((Double_t)par[3]);
Double_t n = par[4];
if (t>=-par[3]) return par[0]exp(-0.5tt);
else{
Double_t B = pow((n/a),n)exp(-0.5aa);
Double_t C = (n/a)-a;
return (par[0]*B)/pow((C-t),n);
}
}
[/code]

*(knowledge of what the fits do and what they are made of, along with which parameters are shared, should not be relevant to the problem but I can explain further if necessary).

moneta · April 1, 2015, 3:16pm

Hi,

Did you renormalise correctly the functions ?
I see also that you did not use the extdended options for the likelihood. I think you would need a global extended term for the full likelihood for the fluctuations of the total number of events. Otherwise the likelihood fit has one parameter less than the chi-square and then in your case it won’t work because you have one parameter more.

Cheers

Lorenzo

SuperCiocia · April 2, 2015, 1:44pm

Dear Lorenzo,

thanks for the answer.

As far as the normalisation of the functions is concerned:
In the constructor Global, we have two functions “normfn” and “noBG_normfn” that are used to compute the integral of the fit (fit functions = “ratiofit” and “noBGratiofit”) in the range of interest. The integral is then stored in the last parameter of “ps”, “pd” and “pc”, that are gonna be the parameters of the shared fits among the histograms. When performing the triple combined fit, we are calling the “norm_ratiofit” and “norm_noBGratiofit” functions, that are defined to be “ratiofit/last parameter” and “noBGratiofit/last parameter”, where the ‘last parameter’ is the integral of the fit function, as explained earlier.
Does it seem correct?

And as far the global extended term is concerned:
What do you mean, physically, by ‘fluctuation of total number of events’ and why would this only be relevant for the LL fit and not for the chi square? What would be the extra parameter?
Currently we are using the FitFCN method to perform the fit, which uses IMultiGenFunction that we already knew how to use from the chi square fits. Is there a simple way of incorporating the extended term of the full likelihood here? Or should we change the structure of the code completely?

moneta · April 2, 2015, 8:29pm

Hi,

As I said the a non-extended likelihood fit fits only the shape of the function and not its normalisation. You need to add an extra term in the likelihood , i.e. a Poisson(total number of event observed | total number of event expected) to fit the normalisation.
A chi2 fit uses functions and not pdf and therefore fits automatically the normalization. You can find about this in any book of statistics, my obvious preference is for the description of extended fits in chapter 2 of
amazon.com/Data-Analysis-Hig … 3527410589.

For adding the extended option, you could use the appropriate flag in the constructor of

project-mathlibs.web.cern.ch/pro … fa1183c260

But, since you are creating yourself a Global likelihood you could also add easily yourself the Poisson term in the implementation of the Global likelihood function.

Best Regards

Lorenzo