Obtaining propagated errors in a ratio including observed data

ellkay · September 21, 2020, 12:41pm

Dear experts,

I am attempting to calculate a ratio of data from two processes which will take the following form (for example):

ratio(Zee/Zmm) = (data_CR2e - sum(non Zee)MC_CR2e) / (data_CR2m - sum(non Zmm)MC_CR2m)
where CR2e and CR2m represent regions enriched in Zee and Zmm, respectively

where I have various samples:

sampleList = ['Znunu', 'Wmunu', 'Wenu', 'Wtaunu', 'Zmumu', 'Zee', 'Ztautau', 'ttbar', 'singletop', 'diboson', 'NCB', 'multijet','Observed']

so e.g. the numerator here would be:
data in CR2e - (Znunu in CR2e + Wenu in CR2e + diboson in CR2e…etc.)

I want to calculate this ratio with errors which properly account for all correlations between the uncertainties from the different samples, so I decided to use the ‘GetPropagatedError’ function from Histfitter, in Utils.cxx (https://github.com/dguest/HistFitter/blob/master/src/Utils.cxx), which seems to be a copy of the function from RooAbsReal with the addition of a bug fix.

Unfortunately, I don’t think this is the right tool for my purposes, since I am passing in a combination of errors from the fitted MC and the data. My workflow is as follows:

==========================================================================

First I read in the workspace ‘w’

Then define the results before and after the fit, to get pre-fit and post-fit results, e.g:

resultBeforeFit = w.obj('RooExpandedFitResult_beforeFit')

I load a pre-fit or post-fit snapshot, e.g:

w.loadSnapshot('snapshot_paramsVals_RooExpandedFitResult_beforeFit')

Load the data set:

data_set = w.data('obsData')

Here is where I already start to be unsure if I am doing the right thing - I extract the data in the given region, then convert it to a RooRealVar with error = sqrt(n), in order for it to be compatible with the RooFormulaVar which I will perform later:

dataInRegionNum = data_set.reduce("channelCat==channelCat::CR2e_cuts")
dataInRegionDen = data_set.reduce("channelCat==channelCat::CR2m_cuts")
dataInRegionNum = RooRealVar("data_CR2e", "data_CR2e",dataInRegionNum.sumEntries())  
dataInRegionNum.setError(math.sqrt(dataInRegionNum.getVal()))
dataInRegionDen = RooRealVar("data_CR2m", "data_CR2m", dataInRegionDen.sumEntries() 
dataInRegionDen.setError(math.sqrt(dataInRegionDen.getVal()))

Then I obtain the sums of the MC contributions for numerator and denominator, by extracting the PDFs in each region and performing a RooAddition for each, e.g. for the numerator:

nonMCNum, nonMCDen = {}, {}
numlist = RooArgSet()
denlist = RooArgSet()
for sample in [ s for s in sampleList if s!= "Zee" ] :
	nonMCNum[sample] = getPdfInRegions(w, sample, "CR2e")
    if not nonMCNum[sample]:
    	print "Skipping", sample, "num"
	    continue
    numlist.add(nonMCNum[sample])
	numvaltmp = nonMCNum[sample].getVal()
	print "added",sample,"num", numvaltmp
sumNum = RooAddition( "non_Zee_MC", "non_Zee_MC", RooArgList(numlist))

Finally, I calculate the ratio with a RooFormulaVar:

form  = R.RooFormulaVar("dataRatio", "(@0-@1)/(@2-@3)", RooArgList(dataInRegionNum, sumNum, dataInRegionDen, sumDen) )

Looking at the ratio value, I get the expected result using:

val = form.getVal()

However, I then try to get the propagated error:

if prefit is True:
	err = Util.GetPropagatedError(form, resultBeforeFit, doAsym)
else:
	err = Util.GetPropagatedError(form, resultAfterFit, doAsym)

And the result is zero…
In another part of the code I perform a simple MC/MC ratio, where this functionality works fine, so I am suspicious of my inclusion of the data in this

================================================
Is it naive to think there is an existing functionality which will calculate such errors while accounting for quantities which exist outside of the fit?
Either way, I would appreciate some advice to push me in the right direction…

StephanH · September 21, 2020, 1:21pm

Hello @ellkay,

Using the data in that computation might indeed be causing trouble. You seem to somehow abuse the data as variables, but data in RooFit are a fixed entity, immutable.
Note that what you did should be equivalent to:

N_a = data_set.sumEntries("channelCat==channelCat::CR2e_cuts")
N_b = data_set.sumEntries("channelCat==channelCat::CR2m_cuts")

form  = R.RooFormulaVar("dataRatio", "({0}-@0)/({1}-@1)".format(N_a, N_b), RooArgList(sumNum, sumD))

You see, the number of data events are just constants in the formula.

Now, what you want is error propagation using some post-fit parameters. I don’t see parameters, though. getPropagatedError() is meant to be used for models that have been fit to data, so a covariance matrix of the parameters can be used to estimate the errors. I guess in general you cannot write down a new model, and hope that the parameters have the same meaning (= covariance matrix) such that error propagation can be used.

What you need is a fit model, fit it to data, and then I guess you need to integrate the PDFs over the categories where you measure the number of data events, unless you have an extended PDF. In that case, the number of events in a category is a parameter of the fit, and you might be able to apply linear error propagation using that function.
Note that if that somehow succeeds, you still don’t have any data error. If the parameters don’t depend on the amount of data, you can add this by doing the “normal” error propagation (e.g. on paper). If they do, I would recommend to run a big bunch of toy fits, and measure the error empirically.

StephanH · September 21, 2020, 1:22pm

Would you know what the suspected bug in RooFit is? ROOT never received a bug report mentioning this function.

StephanH · September 21, 2020, 1:22pm

Maybe @moneta has an idea?

moneta · September 23, 2020, 9:02am

Hi ,
As Stephan mentioned, getPropagateError1 requires the parameter uncertainties and their correlation which are obtained after a fit. It does not make sense to call this function before a fit, as you are doing.

Furthermore, this is a crude approximation, that works only in case you can apply error propagation. This is often NOT the case when you have ratio’s of quantities.

If you need to estimate a ratio with correct uncertainties, bacuse it is a physical parameters, you either need to have a full model and use the ratio as fit parameter, or otherwise obtained again as a derived parameter, but use something else, like toys for estimating the uncertainty

Lorenzo

system · October 7, 2020, 9:02am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.