Dear experts,
I am attempting to calculate a ratio of data from two processes which will take the following form (for example):
ratio(Zee/Zmm) = (data_CR2e - sum(non Zee)MC_CR2e) / (data_CR2m - sum(non Zmm)MC_CR2m)
where CR2e and CR2m represent regions enriched in Zee and Zmm, respectively
where I have various samples:
sampleList = ['Znunu', 'Wmunu', 'Wenu', 'Wtaunu', 'Zmumu', 'Zee', 'Ztautau', 'ttbar', 'singletop', 'diboson', 'NCB', 'multijet','Observed']
so e.g. the numerator here would be:
data in CR2e - (Znunu in CR2e + Wenu in CR2e + diboson in CR2e…etc.)
I want to calculate this ratio with errors which properly account for all correlations between the uncertainties from the different samples, so I decided to use the ‘GetPropagatedError’ function from Histfitter, in Utils.cxx (https://github.com/dguest/HistFitter/blob/master/src/Utils.cxx), which seems to be a copy of the function from RooAbsReal with the addition of a bug fix.
Unfortunately, I don’t think this is the right tool for my purposes, since I am passing in a combination of errors from the fitted MC and the data. My workflow is as follows:
==========================================================================
First I read in the workspace ‘w’
Then define the results before and after the fit, to get pre-fit and post-fit results, e.g:
resultBeforeFit = w.obj('RooExpandedFitResult_beforeFit')
I load a pre-fit or post-fit snapshot, e.g:
w.loadSnapshot('snapshot_paramsVals_RooExpandedFitResult_beforeFit')
Load the data set:
data_set = w.data('obsData')
Here is where I already start to be unsure if I am doing the right thing - I extract the data in the given region, then convert it to a RooRealVar with error = sqrt(n), in order for it to be compatible with the RooFormulaVar which I will perform later:
dataInRegionNum = data_set.reduce("channelCat==channelCat::CR2e_cuts")
dataInRegionDen = data_set.reduce("channelCat==channelCat::CR2m_cuts")
dataInRegionNum = RooRealVar("data_CR2e", "data_CR2e",dataInRegionNum.sumEntries())
dataInRegionNum.setError(math.sqrt(dataInRegionNum.getVal()))
dataInRegionDen = RooRealVar("data_CR2m", "data_CR2m", dataInRegionDen.sumEntries()
dataInRegionDen.setError(math.sqrt(dataInRegionDen.getVal()))
Then I obtain the sums of the MC contributions for numerator and denominator, by extracting the PDFs in each region and performing a RooAddition for each, e.g. for the numerator:
nonMCNum, nonMCDen = {}, {}
numlist = RooArgSet()
denlist = RooArgSet()
for sample in [ s for s in sampleList if s!= "Zee" ] :
nonMCNum[sample] = getPdfInRegions(w, sample, "CR2e")
if not nonMCNum[sample]:
print "Skipping", sample, "num"
continue
numlist.add(nonMCNum[sample])
numvaltmp = nonMCNum[sample].getVal()
print "added",sample,"num", numvaltmp
sumNum = RooAddition( "non_Zee_MC", "non_Zee_MC", RooArgList(numlist))
Finally, I calculate the ratio with a RooFormulaVar:
form = R.RooFormulaVar("dataRatio", "(@0-@1)/(@2-@3)", RooArgList(dataInRegionNum, sumNum, dataInRegionDen, sumDen) )
Looking at the ratio value, I get the expected result using:
val = form.getVal()
However, I then try to get the propagated error:
if prefit is True:
err = Util.GetPropagatedError(form, resultBeforeFit, doAsym)
else:
err = Util.GetPropagatedError(form, resultAfterFit, doAsym)
And the result is zero…
In another part of the code I perform a simple MC/MC ratio, where this functionality works fine, so I am suspicious of my inclusion of the data in this
================================================
Is it naive to think there is an existing functionality which will calculate such errors while accounting for quantities which exist outside of the fit?
Either way, I would appreciate some advice to push me in the right direction…