Hi All,
I am trying to do a compatibility study between some data and the associated Monte Carlo. I have already run a Kolmogorov test, and I would like to compare its result with a likelihood-based approach.
This is the approach I am using:
I have two histograms: one holds the data and the other the Monte Carlo prediction, both with errors. They contain event counts in a custom binning. The number of events is quite low (~500), and the errors on the data are quite large.
This is the code I used for the data:
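```cpp
// dataEvents is a TH1D that consists of 8 bins with event counts and errors.
double dataEventCount = dataEvents->Integral();
RooRealVar x("x", "x", 0, 200);
// Import the histogram (and its binning) into a binned RooFit dataset.
RooDataHist data("data", "data", x, dataEvents);
// Build a pdf from the data histogram itself (interpolation order 0).
RooHistPdf pdfData("pdf", "pdf", x, data, 0);
// Negative log-likelihood of the data under that pdf.
RooAbsReal* nllData = pdfData.createNLL(data);
```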
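To reason about what a single NLL number means, it can help to look at the plain binned multinomial form, -Σ n_i ln p_i, where p_i is the normalised shape of a reference histogram. The standalone C++ below is a minimal sketch, not RooFit code: `binnedNll` is a hypothetical helper, and RooFit's value will generally differ from it (for example, because RooHistPdf is evaluated as a density, bin widths can enter), so it is only meant for looking at trends.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Binned multinomial negative log-likelihood: -sum_i n_i * ln(p_i), where
// p_i = reference[i] / sum(reference) is the normalised reference shape.
// The multinomial coefficient (which depends only on the counts) is dropped,
// and no bin-width factors are included (hypothetical simplification).
double binnedNll(const std::vector<double>& counts,
                 const std::vector<double>& reference)
{
   assert(counts.size() == reference.size());
   double refTotal = 0.0;
   for (double r : reference) refTotal += r;

   double nll = 0.0;
   for (std::size_t i = 0; i < counts.size(); ++i) {
      if (counts[i] == 0.0) continue;          // 0 * ln(p) contributes nothing
      const double p = reference[i] / refTotal;
      if (p <= 0.0)                            // events where the reference predicts none
         return std::numeric_limits<double>::infinity();
      nll -= counts[i] * std::log(p);
   }
   return nll;
}
```

With `d = {10, 20, 30, 40}` and a flat reference `{25, 25, 25, 25}`: `binnedNll(d, d)` is smaller than `binnedNll(d, flat)` (for fixed counts the NLL is smallest when the reference is the data's own shape, which is the situation for `pdfData` above); doubling every count doubles the NLL, so it depends on the total number of events; and with the flat reference every 100-event dataset gets the same value, 100 ln 4, whatever its shape. That last point is why a bare NLL value is a weak compatibility measure on its own.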
For the MC I do something similar, but I generate toy datasets (each with the number of MC events taken from the histogram) and repeat this many times to build a distribution of NLL values, which I fill into a histogram "ts". The goal is to integrate this distribution from the nllData value to the end of the range to obtain a p-value.
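```cpp
// mcHist is a TH1D that consists of 8 bins with event counts and errors.
double mcEventCount = mcHist->Integral();
// RooHistPdf takes a RooDataHist, so import the MC histogram first.
RooDataHist mcData("mcData", "mcData", x, mcHist);
RooHistPdf pdfMc("pdf1", "pdf1", x, mcData, 0);
// Histogram "ts" collects the NLL value of each toy.
TH1D* ts = new TH1D("MC-likelihood-Dist", "MC-likelihood-Dist", 10000, 0, 5000);
Double_t tempNll = 0;
for (Int_t i = 0; i < 5000; i++)
{
   // Generate a binned toy dataset with mcEventCount events from the MC pdf.
   RooDataHist* toyMCDataGen = pdfMc.generateBinned(x, mcEventCount);
   // NLL of the toy under the same MC pdf it was generated from.
   RooAbsReal* nllMc = pdfMc.createNLL(*toyMCDataGen);
   tempNll = nllMc->getVal();
   if (TMath::Finite(tempNll))   // skip toys with a non-finite NLL
   {
      ts->Fill(tempNll);
   }
   // Both objects are owned by the caller; delete them to avoid a leak per toy.
   delete nllMc;
   delete toyMCDataGen;
}
```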
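The toy procedure can be sketched as standalone C++ (not RooFit) so that it can be checked in isolation. This is a minimal sketch under assumptions that differ from the code above and may not be what is wanted: the observed statistic is computed for the data under the MC shape (the same model the toys are drawn from), and each toy gets the data's total event count rather than the MC's. It also swaps the bare NLL for the likelihood ratio against the saturated model, a standard goodness-of-fit statistic. The names `saturatedRatio` and `toyPValue` are hypothetical, and the statistical uncertainty of the MC histogram itself is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Likelihood ratio of the MC shape against the saturated model (each bin's
// probability set to its own observed fraction), for a fixed total count N:
//   t = 2 * sum_i n_i * ln(n_i / (N * p_i)),   p_i = mc_i / sum(mc).
// t = 0 for a perfect match; larger t means worse agreement with the MC shape.
double saturatedRatio(const std::vector<double>& counts,
                      const std::vector<double>& mc)
{
   assert(counts.size() == mc.size());
   double nTotal = 0.0, mcTotal = 0.0;
   for (std::size_t i = 0; i < counts.size(); ++i) {
      nTotal += counts[i];
      mcTotal += mc[i];
   }
   double t = 0.0;
   for (std::size_t i = 0; i < counts.size(); ++i) {
      if (counts[i] == 0.0) continue;                 // n ln n -> 0 as n -> 0
      const double expected = nTotal * mc[i] / mcTotal;
      t += 2.0 * counts[i] * std::log(counts[i] / expected);
   }
   return t;
}

// Fraction of toys, drawn from the MC shape with the SAME total as the data,
// whose statistic is at least as large as the observed one.
double toyPValue(const std::vector<double>& data,
                 const std::vector<double>& mc,
                 int nToys, unsigned seed)
{
   assert(nToys > 0);
   const double observed = saturatedRatio(data, mc);

   double nData = 0.0;
   for (double n : data) nData += n;
   const long nEvents = std::lround(nData);

   std::mt19937 rng(seed);
   // Picks a bin with probability proportional to its MC content.
   std::discrete_distribution<int> pickBin(mc.begin(), mc.end());

   std::vector<double> toy(mc.size());
   int nExtreme = 0;
   for (int iToy = 0; iToy < nToys; ++iToy) {
      std::fill(toy.begin(), toy.end(), 0.0);
      for (long e = 0; e < nEvents; ++e) toy[pickBin(rng)] += 1.0;  // multinomial toy
      if (saturatedRatio(toy, mc) >= observed) ++nExtreme;
   }
   return static_cast<double>(nExtreme) / nToys;
}
```

Because the statistic is normalised to each dataset's own total, rescaling the MC histogram by a constant does not change it, and the observed value and the toys are computed on the same footing. For data that match the MC shape exactly the statistic is 0 and the p-value is 1.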
I have some questions regarding this approach:
- Is it valid to do a single-model likelihood test like this? Normally the likelihood is used in a likelihood-ratio test.
- I get an nllData value that is higher than every value in the MC NLL distribution (entirely outside its range), which suggests p = 0, but the Kolmogorov test tells me this is not the case. How should I be normalising the histograms?
- The data event count is not quite equal to the MC event count. This will affect the likelihood value, which should itself give an indication of compatibility. Is this statement correct?
- The area under the MC NLL distribution to the right of the data NLL value should give the p-value, assuming the data NLL value falls within the MC NLL distribution.
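On the normalisation and event-count questions, one way to see how a difference in totals enters is the Poisson form of the saturated likelihood ratio (often called the Baker-Cousins chi-square), 2 Σ [μ_i - n_i + n_i ln(n_i/μ_i)]. The sketch below assumes the MC bin contents μ_i are used directly as Poisson expectations; the helper names are hypothetical. Algebraically, this statistic equals the shape-only version (MC rescaled to the data total N) plus a term 2[M - N - N ln(M/N)] that depends only on N and the MC total M, is never negative, and vanishes when M = N.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Poisson likelihood ratio against the saturated model (Baker-Cousins form),
// using the MC bin contents mu_i directly as expected counts:
//   2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ]
// Empty data bins contribute 2 * mu_i.
double poissonSaturatedRatio(const std::vector<double>& n,
                             const std::vector<double>& mu)
{
   assert(n.size() == mu.size());
   double t = 0.0;
   for (std::size_t i = 0; i < n.size(); ++i) {
      t += 2.0 * (mu[i] - n[i]);
      if (n[i] > 0.0) t += 2.0 * n[i] * std::log(n[i] / mu[i]);
   }
   return t;
}

// Contribution from a mismatch between the data total N and the MC total M:
//   2 * (M - N - N * ln(M / N)), which is >= 0 and equals 0 only when M == N.
double normalisationTerm(double nData, double mMc)
{
   return 2.0 * (mMc - nData - nData * std::log(mMc / nData));
}
```

So whether the MC is rescaled to the data total decides whether the overall normalisation is part of the test: rescaling removes the `normalisationTerm` part and leaves a pure shape comparison. Which of the two is appropriate depends on whether the absolute MC normalisation is meant to be tested.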
Any clarification or help would be appreciated!
Regards,
Rob