Dear experts,
I’d like to perform a statistical analysis based on the usual binned likelihood with Poissonian event counts. HistFactory seems to be the right tool to create the workspace, so I’m giving it a try. I started with a maximum-likelihood fit in a very simple toy setup, but I’m getting unexpected results out of it. Could you please have a look?
My toy setup is as follows:
- A single observable with three bins of varying width. Bin edges are at 0, 1, 3, 6.
- A single background with an expectation of 1 event per unit of luminosity in each bin.
- Signal predicts 0.1 events per unit of luminosity in the first bin and no events elsewhere.
- There are no systematic uncertainties.
- Data are (manually) constructed as an Asimov data set for a luminosity of 10 and signal strength 2.
Input histograms are constructed with the following Python script (in case somebody wants to play with the setup):
from array import array
import ROOT
outputFile = ROOT.TFile('templates.root', 'recreate')
binning = array('d', [0., 1., 3., 6.])
histSgn = ROOT.TH1D('Sgn', '', len(binning) - 1, binning)
histBkg = histSgn.Clone('Bkg')
histSgn.SetBinContent(1, 0.1)
for bin in range(1, histBkg.GetNbinsX() + 1):
histBkg.SetBinContent(bin, 1.)
intLumi = 10.
r = 2.
histData = histSgn.Clone('Data')
histData.Scale(r)
histData.Add(histBkg)
histData.Scale(intLumi)
outputFile.Write()
outputFile.Close()
Below are the XML files for HistFactory:
<!-- toy.xml --->
<!DOCTYPE Combination SYSTEM "HistFactorySchema.dtd">
<Combination OutputFilePrefix="./results/toy">
<Input>./toy_channel.xml</Input>
<Measurement Name="measurement" Lumi="10" LumiRelErr="0.01" ExportOnly="True">
<POI>r</POI>
<ParamSetting Const="True">Lumi</ParamSetting>
</Measurement>
</Combination>
<!-- toy_channel.xml -->
<!DOCTYPE Channel SYSTEM "HistFactorySchema.dtd">
<Channel Name="channel" InputFile="./templates.root">
<Data HistoName="Data" />
<Sample Name="Sgn" HistoName="Sgn" NormalizeByTheory="True">
<NormFactor Name="r" Val="1." Low="0." High="10." />
</Sample>
<Sample Name="Bkg" HistoName="Bkg" NormalizeByTheory="True">
</Sample>
</Channel>
I construct the workspace with hist2workspace toy.xml
and then fit the data using the following script:
import ROOT
inputFile = ROOT.TFile('results/toy_combined_measurement_model.root')
workspace = inputFile.Get('combined')
pdf = workspace.obj('ModelConfig').GetPdf()
data = workspace.data('obsData')
pdf.fitTo(data, ROOT.RooFit.Minimizer('Minuit2'))
The expected fitted value of r should be 2, but instead the fit reports 10, which is the upper boundary of the allowed range for r. Interestingly, with the automatically generated Asimov data set, I fit r = 1, as expected.
I tried investigating this a bit further and inspected the two data sets stored in the workspace. These are the weights I read from them:
obs_x_channel | obsData | asimovData |
---|---|---|
0.5 | 12 | 5.5 |
2.0 | 10 | 10 |
4.5 | 10 | 15 |
The weights for the ‘observed’ data are what I would expect (bin contents of the corresponding histogram), but I’m puzzled by the weights for the Asimov data set. It looks like they have been multiplied by half of the corresponding bin width. I also found this old post, which, very alarmingly, suggests that HistFactory does not handle histograms with non-uniform binning, but there was no follow up.
Is what I’m trying to do supposed to work? Am I missing something?