Large Log likelihood from RooMultiVarGaussian

Good day,

We are applying correlated constraints in our analysis using RooMultiVarGaussian. These constrained variables are efficiencies. Their values and the covariance matrix elements are less than 1. During minimisation, we get the message:

[#0] WARNING:Eval -- RooAbsPdf::getLogVal(constraintPDF) WARNING: large likelihood value: 1.38885e+17

Mathematically, I understand this comes from the determinant of the covariance matrix used in the normalisation, and there is nothing to worry about. However, I have a computational question. This normalisation from RooMultiVarGaussian is large and constant. Could it affect the minimisation through floating-point round-off error? Relative to this large constant normalisation, the change in the likelihood at each step might be small.

One idea in our analysis group is to write a class derived from RooMultiVarGaussian that redefines the virtual RooAbsPdf::getLogVal() function to omit the normalisation coming from the determinant. Is this mathematically and computationally safe when performing the minimisation?

I would like to know whether we should worry about this warning message and whether our suggestion to inherit from RooMultiVarGaussian and redefine getLogVal() would produce reliable results in our analysis.

Here is a minimal script producing this warning
testMultiVarGaussianSmallCovariance.C (4.7 KB)

And a similar script, but with the constrained values scaled by a factor of 10^4, which does not produce the warning: testMultiVarGaussianLargeCovariance.C (4.7 KB)

We’ll need our RooFit expert @StephanH back from his trip to answer your question. He should be back next week; please ping us if you didn’t get an answer by end of next week!
Axel.

Thanks for the update. Will do!

Hello @Da_Yu_Tou,

It depends on what kind of tests you want to run. If you use likelihood ratios, you can obviously leave out a constant from both likelihoods and get the same result. If you depend on the “proper” value of the likelihood, it doesn’t work.
The fit will converge to the same minimum if the factor that you leave out does not depend on the parameters which are accessible to the fitter.

I didn’t have enough time to go through the math in the MultiVarGaussian, but I’m indeed surprised by the large likelihood value. After all, the likelihood should be a probability given some data, and that number doesn’t look like one …
Did you check some basic things like whether the class expects a true covariance matrix (I mean sigma^2 on the diagonal) or whether it expects a matrix of correlation coefficients?
I also wouldn’t exclude that the integral is wrong. Could you try to evaluate the MultiVarGaussian both with and without normalisation? What’s the difference between

multiVarGauss.getVal();
multiVarGauss.getVal(RooArgSet(x)); // Or whatever the observables are in your case

Hi @StephanH. Thanks for looking at this.

It is supposed to use the covariance matrix (covMatrix) according to the doxygen page. Also, consider an uncorrelated multivariate Gaussian, i.e. one whose off-diagonal elements are zero. Inverting the covariance matrix gives a diagonal matrix with the reciprocals of the variances, and the density decomposes into independent univariate Gaussians (each of which constrains one variable independently).

Here is a script to test your suggestion : testNormalisation.C (2.9 KB)

To save you the trouble of running it:

With normalisation              : 1.40595e+17
Without normalisation           : 1

Test log with normalisation     : [#0] WARNING:Eval -- RooAbsPdf::getLogVal(constraintPDF) WARNING: large likelihood value: 1.40595e+17
                                  39.4847
Test log without normalisation  : 0

Ok, it’s indeed the normalisation step. Maybe @moneta has an idea?

Hi,

we had another look, and our bet is on the small constraints. As you know, the determinant is required to normalise, but your matrix is:

---------------------------------------------------------
   0 |  8.997e-10           0           0           0 
   1 |          0   1.758e-10           0           0 
   2 |          0           0    2.45e-10           0 
   3 |          0           0           0   8.378e-10 

That means that the determinant is about 3.E-37, and that’s used to normalise the constraint term. That’s indeed very small, but it’s not wrong.
One thing that will make the numbers nicer is to rescale the sigmas to larger values, so that you can use less violent constraint terms on the order of 1.E-3 or so.

Thanks for taking a closer look at this. I agree with your hunch that the small constraints are causing these warnings. However, the small constraints here are similar in magnitude to the efficiency constraints we use in our analysis. These efficiencies can be rescaled, but it is a hassle and potentially error-prone.

Instead, I tried a class derived from RooMultiVarGaussian which re-implements getLogVal():

    Double_t getLogVal(const RooArgSet *nset=0) const override { return std::log(getVal(nset)); }

It produces the same results as RooMultiVarGaussian, but the warning has disappeared. For some reason, the likelihoods are identical even with RooFit::Offset(kFALSE).

You can find the 2 scripts here:
testMultiVarGaussianSmallCovariance.C (4.7 KB) testNoNormSmallCovariance.C (5.4 KB)

We rely on the likelihood to fit our invariant masses and to perform likelihood profiling for the parameters of interest. Hence, I don’t foresee problems from removing a constant term from the likelihood. I would like your opinion on whether it is safe to use the class inherited from RooMultiVarGaussian, as shown in testNoNormSmallCovariance.C, to suppress this warning.

Hi @Da_Yu_Tou,

by overriding getLogVal in the way you did, you are actually computing the log of the correctly normalised probability. You are simply skipping the step that runs the checks which trigger the warning and the error handling. That’s why you see exactly the same likelihood as with vanilla RooFit.
In other words, you haven’t removed any constant from the log-likelihood. If things look numerically stable, you can proceed.

Offset(false) is actually the default. Only if the fitter doesn’t like what it sees should you add Offset(true) to make the numbers a bit more digestible for the fitter.
