Binned ML fit with HistFactory

apopov · May 1, 2018, 10:46am

Dear experts,

I’d like to perform a statistical analysis based on the usual binned likelihood with Poissonian event counts. HistFactory seems to be the right tool to create the workspace, so I’m giving it a try. I started with a maximum-likelihood fit in a very simple toy setup, but I’m getting unexpected results out of it. Could you please have a look?

My toy setup is as follows:

A single observable with three bins of varying width. Bin edges are at 0, 1, 3, 6.
A single background with an expectation of 1 event per unit of luminosity in each bin.
Signal predicts 0.1 events per unit of luminosity in the first bin and no events elsewhere.
There are no systematic uncertainties.
Data are (manually) constructed as an Asimov data set for a luminosity of 10 and signal strength 2.

Input histograms are constructed with the following Python script (in case somebody wants to play with the setup):

from array import array
import ROOT

outputFile = ROOT.TFile('templates.root', 'recreate')

binning = array('d', [0., 1., 3., 6.])
histSgn = ROOT.TH1D('Sgn', '', len(binning) - 1, binning)
histBkg = histSgn.Clone('Bkg')

histSgn.SetBinContent(1, 0.1)

for bin in range(1, histBkg.GetNbinsX() + 1):
    histBkg.SetBinContent(bin, 1.)

intLumi = 10.
r = 2.

histData = histSgn.Clone('Data')
histData.Scale(r)
histData.Add(histBkg)
histData.Scale(intLumi)

outputFile.Write()
outputFile.Close()

Below are the XML files for HistFactory:

<!-- toy.xml --->
<!DOCTYPE Combination SYSTEM "HistFactorySchema.dtd">
<Combination OutputFilePrefix="./results/toy">
    <Input>./toy_channel.xml</Input>
    <Measurement Name="measurement" Lumi="10" LumiRelErr="0.01" ExportOnly="True">
        <POI>r</POI>
        <ParamSetting Const="True">Lumi</ParamSetting>
    </Measurement>
</Combination>

<!-- toy_channel.xml -->
<!DOCTYPE Channel SYSTEM "HistFactorySchema.dtd">
<Channel Name="channel" InputFile="./templates.root">
    <Data HistoName="Data" />
    <Sample Name="Sgn" HistoName="Sgn" NormalizeByTheory="True">
        <NormFactor Name="r" Val="1." Low="0." High="10." />
    </Sample>
    <Sample Name="Bkg" HistoName="Bkg" NormalizeByTheory="True">
    </Sample>
</Channel>

I construct the workspace with hist2workspace toy.xml and then fit the data using the following script:

import ROOT

inputFile = ROOT.TFile('results/toy_combined_measurement_model.root')
workspace = inputFile.Get('combined')

pdf = workspace.obj('ModelConfig').GetPdf()
data = workspace.data('obsData')

pdf.fitTo(data, ROOT.RooFit.Minimizer('Minuit2'))

The expected fitted value of r should be 2, but instead the fit reports 10, which is the upper boundary of the allowed range for r. Interestingly, with the automatically generated Asimov data set, I fit r = 1, as expected.

I tried investigating this a bit further and inspected the two data sets stored in the workspace. These are the weights I read from them:

obs_x_channel	obsData	asimovData
0.5	12	5.5
2.0	10	10
4.5	10	15

The weights for the ‘observed’ data are what I would expect (bin contents of the corresponding histogram), but I’m puzzled by the weights for the Asimov data set. It looks like they have been multiplied by half of the corresponding bin width. I also found this old post, which, very alarmingly, suggests that HistFactory does not handle histograms with non-uniform binning, but there was no follow up.

Is what I’m trying to do supposed to work? Am I missing something?

apopov · May 1, 2018, 5:38pm

Indeed, this seems to be a problem with the non-uniform binning. At least when I use histograms with a uniform binning, the fit gives r = 2. I also found that this behaviour was rightly reported as a bug some five years ago, but there was no response.

I would appreciate if others could educate me on whether there are other problems with HistFactory I should be aware of. Or if there are alternatives short of building the workspace by hand.

system · May 15, 2018, 5:38pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.