Fitting using constraints

Hello,

I was using Minuit to minimize a function, but now I need to add a constraint to this minimization.

I tried adding the constraint using a lagrange multiplier of the type:

Chi2 - L*constraint = MinimizationFunction

but Minuit returns infinte. Is there a way to implement these constraints in a minimization?

Thank you,

Belen

Dear Belen,

Unfortunately, I won’t bring you any answer to this, but I was wondering if, eventually, you found a way how to solve this / explain this, because I am experiencing the same problem with a -2log(likelihood) fit at the moment and help would be very welcomed…

Thank you very much,
Geraldine

Hi,

If you implement a ‘hard’ constraint (e.g. requiring p = 2*q) you will not have much luck with Minuit is you add that through a L multiplier as numeric algorithms don’t deal well with such formulations.

In roofit, there are multiple options to deal with constraints in fits.
For soft constraints (e.g. a p.d.f. on a parameters) you can specify these p.d.f and have them applied in the minimization phase (see e.g.
root.cern.ch/root/html/tutorials … nts.C.html)

For hard constraints, you can use implement these by simply re-expressing one of the parameters in terms of others. You can do this
a posteriori, e.g.

// Make a workspace with a Gaussian p.d.f G(x,m,s)
RooWorkspace* w = new RooWorkspace("w",1) ;
w->factory(Gaussian::gx(x[-10,10],m[-10,10],s[1.0,0.1,3.0])) ;

// Make a new pdf Gc(x,m,3+m) ;
w->factory("EDIT::gx_c(gx,s=expr('3+m',m)) ;

The above substitution is nonsensical, but the idea is clear I hope.

Wouter

Hi,

[quote=“gconti”]Unfortunately, I won’t bring you any answer to this, but I was wondering if, eventually, you found a way how to solve this / explain this, because I am experiencing the same problem with a -2log(likelihood) fit at the moment and help would be very welcomed…
[/quote]

If you are minimizing -2 * ln (likelihood), you can add (for each constraint)

+1 * ((x - meanx)/ sigmax)^2

to your function to implement Gaussian constraints.

Note that factors of -2 (in front of the ln (likelihood)) and -1/2 from the Gaussian cancel so the factor is indeed +1.

Hope this helps,
Charles

Attached is likelihood function for binned template fit with Gaussian constraints.

Hi

Just for completeness. The way this is done in RooFit is that
you specify a constraint p.d.f. on a parameter, which gets
converted to a term in the -log(L), e.g. one would do

RooAbsPdf* mypdf ;
RooGaussian constr_p(“cp”,“cp”,p,RooConst(5.3),RooConst(0.1)) ;
RooProdPdf pdf_c(“pdf_c”,“pdf_c”,RooArgSet(*mypdf,constr_p) ;

pdf_c.fitTo(data,Constrain§) ; // Request inclusion of constraint(s) on p

This allows you to use any RooAbsPdf shape as constraint, although RooGaussian will be the most frequently used case, and will translate
to the term you show.

Another particularly useful non-trivial case are multivariate Gaussian
constraints with correlations, which can be used trivially in RooFit.
P.d.f. class RooMultiVarGaussian implements multi-variate Gaussians with correlations and you can e.g. use RooAbsPdf* RooFitResult::getHessePdf() to obtain a multi-variate Gaussian probability density function in the parameters of a fit.

Wouter

Dear Charles and Wouter,

Thank you both very much for your nice and quick answers. I have some more questions about this, if you allow me.

In my problem, I have a very complicate PDF with several parameters. If I let them all free, it is not possible to determine them all properly.

I noticed that one of the parameters was mostly responsible for the bad output distributions of the parameters. What I tried first was therefore to fix this parameter. The output distributions for all the other parameters (errors and pull included) were indeed ok afterwards.

As a second step, I tried to apply the very nice idea of a Gaussian constraint on this parameter, to let some freedom on it, and this worked really fine :slight_smile: ! In this case, I just have a problem of how to interpret the error and pull distributions that I obtain for the constrained parameter, Indeed, they do not seem to have any meaning anymore, the error mean not being equal anymore to the width of the parameter distribution and the width of the pull distribution being very small compared to 1. In other words, when using a Gaussian constraint on one parameter, how can we then give an error on this parameter (if this is possible to determine an error on it, what I am not sure about…) ? And about the other parameters, how can I determine how their errors have been affected by the Gaussian constraint (as they are +/- correlated with the constrained parameter) ?

As a third step, I would like to try to reparameterize my pdf, to see if I can determine all the parameters without any external information about the value of one parameter (what would be indeed the “ideal” case). For this, I decided to introduce a new parameter c, which represents a term in the pdf that is a function of two other parameters a and b (c=fct(a,b)). These two parameters a and b are still kept as parameters in the fit, as they also appear individually in other parts of the pdf. To be consistent with the initial expression of the pdf, I have to add a constraint term to link this new parameter c with a and b. Therefore, I have added a constraint function multiplied by a Lagrange multiplier as :

F=-2ln(Likelihood)+lambda*(g(a,b,c)=0)

I therefore tried to follow the example : roofit.sourceforge.net/docs/clas … n2.cc.html

At the moment, I haven’t managed to have any fit converging.

In the example above, alpha is fixed, while in my case, lambda is not known “a priori”, so I have to let it float in the fit. However, I am not sure about the value I should give as input (and which error size). I have tried 0 and -1 and input values. Could this be part of my problem and is there any way of determining a reasonable initial value for lambda ?

In the output of the fits, HESSE is also complaining that the second derivative of lambda is zero. This is indeed always the case for a Lagrange multiplier, so I don’t know what I can exactly do for this :

MINUIT WARNING IN HESSE
============== Second derivative zero for parameter8
MNHESS FAILS AND WILL RETURN DIAGONAL MATRIX.
FCN=-0.1 FROM HESSE STATUS=FAILED 21 CALLS 100 TOTAL
EDM=50.7279 STRATEGY= 2 ERROR MATRIX UNCERTAINTY 100.0 per cent

I was also wondering if you would have any other suggestions/ideas I could try for this Lagrange multiplier method ?

Thank you very much,
Best Greetings,

Géraldine

Dear all,

I was trying to use RooMultiVarGaussian with RooFit v.2.95, as suggested in one of the previous responses, but I get the following message :

"NameError : name ‘RooMultiVarGaussian’ is not defined. "

Is it not possible to use it with v2.95 ? Is there any similar function in the v2.95 RooFit version I could use ?

Thanks,
Géraldine

Hello,

in your fit with a penalty term you cannot have lambda float, otherwise yor minimum will be for lambda=-inf.
The same is for lagrangian multipliers, you cannot simply use Minuit and minimize the Lagrangian function. You would need to find the stationary point, where the derivatives are zero but not the minimum.

I would try top use a constraint minimization algorithm (unfortunately we don’t have one in ROOT, but we could add one) or a multi-dimensional root finding method for solving the derivative equation. There are some solvers in GSL:
gnu.org/software/gsl/manual/ … nding.html

Best Regards

Lorenzo

Hi Geraldine ,

(Sorry for the late reply – I just returned from vacation )

RooMultiVarGaussian does not exist in 2.95, but I think it does not
require any features added recently, so if you wish you should be
able to copy its source code from ROOT 5.24 and compile it
yourself with 2.95.

Wouter

Hi all,

I have a nice solution to this problem that works well,
at least for my case. I wanted to get a model prediction
with errors for a parameter that’s actually derived
from other floating parameters using a RooFormulaVar.
If the parameter of interest were a floating parameter, I would
make a profile likelihood plot and be done. That’s
not an option for a derived parameter, as far as I know.

What I did was to change the variable that Minuit minimizes
to include a constraint term that constrains the
derived parameter (x) to a given value (x0), where
the new Minuit minimimzation variable looks like this

  newMinVar =  NLL + w*(x-x0)*(x-x0)

The extra term effectively forces x to have the value
x0 while minimizing the NLL w.r.t. all floating parameters.
You may need to experiment a bit with the value for w.
I used a quadratic constraint rather than a linear one,
since it’s mathematically the same as a Gaussian constraint
in the likelihood and Minuit has an easier time with it.

To make the profile likelihood plot, I plot NLL (NOT
newMinVar) vs x0, where I scan x0 over a range of
interest. Actually, I use the value of x after
convergence, which is close to x0, but not exactly x0,
since NLL is computed with x, not x0.

I checked that this gives identical results to the
case where I recast the model parameter equations such
that my x parameter was a floating parameter. I have
a lot of parameters I want to check like this and
redoing the equations for each one is a pain and error-
prone. Adding the constraint term is a better solution
for my case.

I think this is probably only useful for making profile
likelihood plots of derived parameters. I would not use
the Minuit errors that come out of this, since they
are probably influenced by the somewhat arbitrary term
added to the Minuit minimization variable.

I attached an example source file that implements what
I describe above. It runs for me, but may not work
for you out of the box.

Owen

PS. If the parameter of interest is the total model
prediction of an observable which has a Poisson PDF
in the likelihood, I think you can get an unbiased model
prediction for the value of that observable by removing
the Poisson PDF for that observable from the likelihood
by using

  newMinVar = NLL + w*(x-x0)*(x-x0) + log(P)

where P is the Poisson PDF for the observable in question.
The last term above cancels the -log§ inside NLL. That’s
assuming your model can make a prediction without the
observable. Mine can. This option is included in my
example code.
roottalk_example.c (8.71 KB)