Non-parametric model

xxiang4 · September 24, 2019, 5:57pm

Dear Experts,

I’m doing a profile likelihood ratio test with RooFit. The full model is an extended likelihood function. It has one signal PDF and several background PDFs (RooHistPdf), as well as several constraint terms with nuisance parameters. The constraint functions are defined with built-in PDFs (e.g. RooGaussian). However, I am wondering whether it’s possible to use a RooHistPdf (non-parametric) to define a constraint instead. What is the correct syntax? Are there any examples I can follow?

Below is my current attempt to import the constraint as a RooHistPdf into the workspace:

RooRealVar nuis("nuis", "nuis", 0., 10.);
w->import(nuis);

// ....

// Get the TH1D from the ROOT file and construct the RooHistPdf
TFile* fConstraint = TFile::Open("test.root", "read");
TH1D* hConstraint = (TH1D*) fConstraint->Get("hh");
// RooDataHist imports a TH1 through a RooArgList of variables
RooDataHist ConstraintHist("ConstraintHist", "ConstraintHist", RooArgList(nuis), hConstraint);
RooHistPdf ConstraintPdf("ConstraintPdf", "ConstraintPdf", RooArgSet(nuis), ConstraintHist);
w->import(ConstraintPdf);

StephanH · October 7, 2019, 3:56pm

Hello @xxiang4,

I haven’t understood yet what you need to do. I understand a constraint as a likelihood function that has a parameter, which constrains it to a certain value.
Example:
The parameter alpha is distributed as a Gaussian around zero with sigma of 1. That means that alpha is in the interval [-1, 1] with a probability of 68%, [-2, 2] with 95%, etc.

I don’t see how a histogram can serve as a constraint, so maybe we need to clarify if we are talking about the same thing when we say “constraint”.

xxiang4 · October 7, 2019, 5:11pm

Hi @StephanH

Thank you for the reply. In your example, the constraint function is a Gaussian centered at 0 with sigma of 1 (w.factory("Gaussian::gauss(x[-10, 10],mean[0],width[1])");). Here the nuisance parameter is x. I understand how to implement that.

In principle, a histogram can serve as a constraint function as well. Since the constraint function is essentially just the distribution of a nuisance parameter, I was wondering if I can build the function directly from a histogram (TH1). I got the histogram from an independent toy simulation. Of course, one way is to fit the histogram with a function and use the best-fit parameters to define the constraint function. Instead, I would like to use the histogram directly in RooFit as a constraint function. I don’t know if this is possible in the RooFit framework.

Let me know if it makes sense.

StephanH · October 8, 2019, 9:37am

Ok, one thing that’s confusing is the role of x. Usually, x is an observable, i.e. something you measured in data. Your fit needs an observable, but it also needs parameters. So let’s call the observable x, and the parameter a. What you need for a constrained fit is data in x (say, a data histogram with the measured values of x), a model (PDF) in x, and a parameter a that changes something in the model.

The likelihood function would look something like this:
L(x | a) * L(constraint data | a)
Here, the likelihoods are constructed from histogram PDFs, and the first one is the likelihood to find certain values of x given the model and the parameter a of this model. If you want to constrain a, you need a second likelihood in other data, let’s call it y, that also reacts to the parameter a. In this way, a is constrained to a certain range. That is, you need a data histogram with x data, another with y data, and two model PDFs that define one model for x, and the other model for y.
Note that it doesn’t make sense to constrain the observable itself.

xxiang4 · October 8, 2019, 8:34pm

Hi, @StephanH

Sorry for the confusion. I agree that it does not make sense to constrain the observable. I did not specify the full likelihood function. The above Gaussian is a constraint function for a model parameter, which I happened to casually call x.

The full likelihood function looks more like this:
L(x | a) * L(a)
where x is the observable, a is the nuisance parameter, and L(a) is the constraint PDF. So L(a) specifies the range of a.

My original question is how to implement L(a) using a histogram. Are there any examples that implement the constraint function without assigning an analytic form to it (e.g. without setting L(a) = Gaussian(a; μ_a, σ_a))? In principle, the range of a can be specified by the histogram.

StephanH · October 9, 2019, 12:43pm

I guess

RooRealVar a(...);
RooDataHist dataForA(..., ..., TH1 for a);
RooHistPdf constraintForA(..., ..., a, dataForA);
RooProdPdf constrainedModel(..., ..., LH, constraintForA);

should do the job. You literally wrote down this formula above, I only translated that into a “HistPdf”. LH here is the unconstrained one, which also depends on a.
