Numerical instability in HistFactory for large bin contents

Dear experts,

In a couple of ATLAS analyses we are using a particular setup based on RooStats and HistFactory to compute significances of potential excesses. In one of the analyses, we have for some time been having numerical problems with failing fits. Some investigation and comparison to an independent code led to the suspicion that the problems were caused by very large log-likelihood values (of order 10^7) associated with bins with large contents (under the background hypothesis, our bin contents range from order 10^5-10^6 down to essentially 0). This suspicion was further strengthened when the other analysis also started to include bins with larger contents and saw the same kind of behavior.

While we do have a working alternative in the aforementioned independent code, there is still interest in keeping the HistFactory setup going, so we kindly request your assistance in getting to the bottom of this and hopefully solving the problem. Not being an expert in RooStats, I may well have left out relevant information below, so please ask for any additional details as needed.


---- The code setup -----

The setup is based on HistFactory, and the central ingredient in setting up the workspace seems to be the lines

#include <RooStats/HistFactory/MakeModelAndMeasurementsFast.h>
RooWorkspace* ws = MakeModelAndMeasurementFast(meas);

Here, the measurement “meas” is created along the lines of

Measurement meas(name.data(), name.data());
meas.SetPOI("mu");     // parameter of interest
meas.SetLumi(lumi);

Channel channel_ee("channel_ee");
channel_ee.SetStatErrorConfig(relStatErrorThreshold, "Poisson");

// background sample taken from histogram "h_ee_bkg_dy" in the input file
Sample sample_ee_bkg_dy("sample_ee_bkg_dy", "h_ee_bkg_dy", inputFile);

// histogram-based (shape) systematic attached to the sample
HistoSys h_elecSys(sysString);
sample_ee_bkg_dy.AddHistoSys(h_elecSys);

channel_ee.AddSample(sample_ee_bkg_dy);
channel_ee.SetData("h_ee_data", inputFile);   // observed data histogram

meas.AddChannel(channel_ee);
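
The fits are then run on the resulting workspace in the usual RooStats way. A minimal sketch of what that step could look like (an illustration assuming the standard object names "ModelConfig" and "obsData" that HistFactory writes into the combined workspace, not our exact code):

#include <RooStats/ModelConfig.h>
#include <RooFitResult.h>

// "ws" is the workspace returned by MakeModelAndMeasurementFast above
RooStats::ModelConfig* mc = (RooStats::ModelConfig*) ws->obj("ModelConfig");
RooAbsData* data = ws->data("obsData");

// Fit the full model (including constraint terms) to the observed data
RooFitResult* result = mc->GetPdf()->fitTo(*data, RooFit::Save());
result->Print();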

----- Observations ------

Comparisons to the independent code clearly show that the log-likelihood resulting from this setup is not the obvious sum over Poisson log-likelihood values for each bin (which is used in the independent code). The order of magnitude suggests that the Poisson factorial normalization is missing, but further observations (see next paragraph) indicate that it may not be as simple as this.
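
To be explicit about what we mean by the obvious sum, here is a minimal sketch of the per-bin Poisson log-likelihood used by the independent code (our own illustration, not the actual code):

#include <cmath>
#include <vector>

// Sum of Poisson log-probabilities over bins, including the log(n!) term
// (std::lgamma(n + 1) = log(n!)); nu = expected content, n = observed content.
double binnedPoissonLogL(const std::vector<double>& expected,
                         const std::vector<double>& observed) {
  double logL = 0.0;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    const double nu = expected[i];
    const double n  = observed[i];
    logL += n * std::log(nu) - nu - std::lgamma(n + 1.0);
  }
  return logL;
}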

Having received the tip to look into the RooNLLVar::evaluatePartition function, we printed some debugging information and found the following surprising structure:

  1. The boolean “_binnedPdf” is false, even though the inputs we provide are all in histogram form.

  2. The code appears to “pretend” it is doing an unbinned fit: it loops over the “events” (which appear to be the bins of our histogram) and adds to the log-likelihood a “logPdf” value multiplied by the “event weight” (which appears to be the data bin content). We did not manage to fully understand what the interpretation of “logPdf” is in this context.

  3. Finally, a Poisson term representing the total normalization of the whole histogram is added to the log-likelihood.

It is not clear to us at present whether this calculation is meant to be equivalent to the obvious sum over Poisson log-likelihood values for all bins, or whether the statistical model/interpretation of the data is somehow different in this setup. (In the cases where the fits converge in RooStats, the results seem to be identical to those of the independent code using simple Poisson probabilities).
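
For what it is worth, a back-of-the-envelope rearrangement (our own sketch, not taken from the RooFit source) suggests the two formulations should indeed agree up to constant terms. Writing nu_i for the expected and n_i for the observed content of bin i, with nu_tot = sum_i nu_i and N = sum_i n_i, and ignoring constant bin-width factors, the “unbinned” bookkeeping of 2) plus the overall Poisson term of 3) gives

-loglikelihood = - sum_i n_i * log(nu_i / nu_tot) + [nu_tot - N * log(nu_tot)]
               = - sum_i n_i * log(nu_i) + nu_tot

which is the per-bin Poisson -loglikelihood = sum_i [nu_i - n_i * log(nu_i) + log(n_i!)] up to the data-dependent log(n_i!) constants, which would explain why the converged fits agree with the independent code.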

Furthermore, we would be interested to know how the numerical stability can be improved. It is indeed the case that the factorial normalization is missing from the Poisson term in 3), but adding it back in does not help, since the numbers from 2) are already large when the bin contents are large.

Best regards,
Magnar K. Bugge

Hi,

Do you have a concrete workspace showing this numerical instability problem? I could investigate it starting from that.
Your observations are correct: RooFit handles the Poisson binned log-likelihood fit in a similar way to the extended unbinned fit. The result is almost the same as that of a standard binned likelihood fit, apart from some small normalisation differences. It is known that these differences can cause a bias when the function is highly non-linear across a bin; unfortunately, RooFit cannot handle the integral of the function over the bin.
It is possible that this different formulation also causes numerical problems, so if you have an example workspace it will be easier for me to investigate.

Best Regards

Lorenzo

Dear Lorenzo,

Thanks a lot for offering to look into this! There were of course obvious confidentiality issues with providing a real example workspace, so the problem was first passed around within ATLAS, and a solution was found. I post it here for your information and to make this thread a useful reference.

In fact, it seems to be possible in this context to tell RooStats to use the “real” binned likelihood, and this was simply achieved by adding these lines after creating the workspace, here called “ws”:

// Flag every RooRealSumPdf in the workspace so that RooFit builds a genuine
// binned Poisson likelihood for it instead of the extended "unbinned" form
RooFIter iter = ws->components().fwdIterator();
RooAbsArg* arg;
while ((arg = iter.next())) {
  if (arg->IsA() == RooRealSumPdf::Class()) {
    arg->setAttribute("BinnedLikelihood");
    std::cout << "Activating binned likelihood attribute for " << arg->GetName() << std::endl;
  }
}

This has been validated for one test case, and RooStats is in this case able to reproduce the results of the independent code. Also, the log-likelihood values are found to be of an order of magnitude which fits with the proper binned Poisson likelihood.
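
As an aside, on recent ROOT 6 versions the same thing can be written without the RooFIter (a sketch, assuming a ROOT version in which RooAbsCollection supports range-based for loops):

// Flag every RooRealSumPdf in the workspace for a genuine binned likelihood
for (RooAbsArg* arg : ws->components()) {
  if (dynamic_cast<RooRealSumPdf*>(arg)) {
    arg->setAttribute("BinnedLikelihood");
    std::cout << "Activating binned likelihood attribute for " << arg->GetName() << std::endl;
  }
}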

Cheers,
Magnar

Dear Magnar,

Thank you for posting a solution to this problem. Let me understand it better: by forcing the use of a binned likelihood fit for the RooRealSumPdf resulting from HistFactory, are you avoiding the numerical problems observed in computing the log-likelihood function?

Best Regards

Lorenzo

Dear Lorenzo,

Yes, absolutely. At least this was the case for the test case from one of the searches. I am still waiting for feedback from the other search team, and we can still do more testing with another channel from my search, and also by combining the channels. In any case, there is no doubt that the stability is vastly improved: from almost all fits failing in the “unbinned” case to all of them converging to the same results as the independent code when the binned Poisson likelihood is employed.

I am not too surprised by this: Consider for example the contributions of systematic uncertainty constraint terms of order

loglikelihood = - theta^2 / 2 (theta of order unity at most)

to a log-likelihood which is already of the order 10^7. There could easily be 9 orders of magnitude between the different numbers that are added, and Minuit has to be sensitive to changes at that level in order to take into account all information.
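
A crude numerical illustration of this (a sketch with made-up numbers, not taken from our fits): with roughly 16 significant digits in a double, an NLL of order 10^7 cannot resolve absolute changes much below ~10^-9, so small variations of the O(1) constraint terms sit right at the edge of what the minimizer can see.

#include <iostream>
#include <limits>

int main() {
  const double bigNLL = 1e7;  // order of the Poisson part of the NLL
  const double ulp = bigNLL * std::numeric_limits<double>::epsilon();
  std::cout << "smallest resolvable change on top of 1e7: " << ulp << std::endl;  // ~2e-9

  const double tinyShift = 1e-10;  // a minute change in a -theta^2/2 constraint term
  std::cout << std::boolalpha
            << "1e7 + 1e-10 differs from 1e7: " << ((bigNLL + tinyShift) != bigNLL)
            << std::endl;  // prints false: the change is lost to rounding
  return 0;
}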

Given that it is possible to set the binned likelihood like this, I wonder whether it should not be the default?

Cheers,
Magnar