Randomisation of Nuisance Parameters in ToyMCSampler.cxx

Hey! I’m new here, so pardon me if this is a relatively stupid question.

Basically I have been trying to use ToyMCSampler in conjunction with ProfileLikelihoodTestStat to generate sampling distributions for my test statistic (the ultimate aim being to compare the results with Wilks' theorem and the asymptotic approximation).

I was a little unclear about how and where exactly in the ToyMCSampler.cxx code (link here: https://root.cern/doc/master/ToyMCSampler_8cxx_source.html) the nuisance parameters from my workspace are being randomised (while the toy data is being generated, of course).

In line 631 you finally see the pdf.generate() command; does that somehow automatically randomise the nuisance parameters (other than the parameter of interest, of course) while generating the data? As far as I know, for each generated dataset the nominal values (or global observables) are set to values taken from the workspace, while the nuisance parameters are randomised. So where exactly does this happen in the code? More importantly, how does it happen? That is, does it take into account the constraint terms for these nuisance parameters that I've defined in my workspace?

This could be a stupid question, but I would appreciate any pointers/help.

Thanks in advance!

Hi @draggeddown,
and welcome to the ROOT forum! I think we need the help of @moneta or @StephanH for this one.

Cheers,
Enrico

Hi,
The randomisation happens only if you are using the HybridCalculator, via the function ToyMCSampler::SetPriorNuisance.
In the frequentist case (FrequentistCalculator) there is no randomisation of the parameters
(parameters are given in frequentist statistics; if they were to be randomised, they would need a prior pdf).
In the frequentist case there is, however, a profile fit to set the nuisance parameters before sampling.
This happens in FrequentistCalculator::PreNullHook and PreAltHook.
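
For concreteness, here is a minimal sketch of the hybrid setup. All file, workspace and object names below ("ws.root", "w", "ModelConfig", "obsData", "nuisancePrior") are hypothetical placeholders, not part of any standard workspace:

```cpp
#include "TFile.h"
#include "RooWorkspace.h"
#include "RooRealVar.h"
#include "RooStats/ModelConfig.h"
#include "RooStats/HybridCalculator.h"
#include "RooStats/ToyMCSampler.h"
#include "RooStats/ProfileLikelihoodTestStat.h"
#include "RooStats/HypoTestResult.h"

void hybridToys() {
   using namespace RooStats;
   TFile *f = TFile::Open("ws.root");                       // hypothetical file name
   RooWorkspace *w = (RooWorkspace*) f->Get("w");
   ModelConfig *sbModel = (ModelConfig*) w->obj("ModelConfig");

   // clone the model and fix the POI at 0 for the background-only (null) hypothesis
   ModelConfig *bModel = (ModelConfig*) sbModel->Clone("bModel");
   RooRealVar *poi = (RooRealVar*) bModel->GetParametersOfInterest()->first();
   poi->setVal(0);
   bModel->SetSnapshot(*bModel->GetParametersOfInterest());

   // constructor order is (data, alt model, null model)
   HybridCalculator hc(*w->data("obsData"), *sbModel, *bModel);

   // this is what switches the nuisance randomisation on: internally these
   // calls end up in ToyMCSampler::SetPriorNuisance
   hc.ForcePriorNuisanceNull(*w->pdf("nuisancePrior"));
   hc.ForcePriorNuisanceAlt(*w->pdf("nuisancePrior"));
   hc.SetToys(500, 500);

   ToyMCSampler *sampler = (ToyMCSampler*) hc.GetTestStatSampler();
   ProfileLikelihoodTestStat ts(*sbModel->GetPdf());
   sampler->SetTestStatistic(&ts);

   HypoTestResult *res = hc.GetHypoTest();
   res->Print();
}
```

With the FrequentistCalculator the setup would be the same except that no prior is forced; the nuisance parameters are instead profiled before sampling, as described above.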

Lorenzo

Aah thanks! Also, when using the ToyMCSampler in conjunction with the ProfileLikelihoodTestStat, how do I access the fitted parameter-of-interest values for each evaluation of the test statistic? I know it probably has something to do with the detailed output option in PLTS, but I’m not entirely sure.

Hi,

Yes, you need to enable the detailed output option, and you then get a TTree with the fitted parameter values. Otherwise, with the verbose option you can get these fitted POI values printed on the screen.
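
A minimal sketch of what this might look like, continuing from the hypothetical setup above (if I read the code correctly, the detailed output is collected into a RooDataSet that can be retrieved from the HypoTestResult):

```cpp
// enable the detailed output on the test statistic BEFORE generating toys
ProfileLikelihoodTestStat ts(*sbModel->GetPdf());
ts.EnableDetailedOutput(true);   // store the per-toy fit information
sampler->SetTestStatistic(&ts);

RooStats::HypoTestResult *res = hc.GetHypoTest();
// fitted parameter values for each toy of the null model, as a RooDataSet
RooDataSet *detNull = res->GetNullDetailedOutput();
if (detNull) detNull->Print("v");
```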

Lorenzo

Hey!

Thanks again for your timely reply. Sorry for the follow-up questions, but I’m immensely confused at the moment about the randomisation of parameters in ToyMCSampler.

Here is the link to the source code again: https://root.cern/doc/master/ToyMCSampler_8cxx_source.html
(The function I am talking about starts at line 546)

So in my understanding of the code, if one looks at ToyMCSampler::GenerateToyData(paramPoint), then one sees that
a) A variable ‘allVars’ is created that contains all the variables in the PDF, and this includes the nuisance parameters, the parameters of interest, the global observables AND the observables.
b) The global observables are randomised explicitly using GenerateGlobalObservables().
c) The NuisanceParameterSampler is created once at the beginning, and at that point the Refresh() function randomises the nuisance parameters once.
d) Then the argument 'paramPoint' is basically held constant, while all other parameters are randomised by calling the NextPoint() function.

Now my problem is:

  1. What is the RooStats recommendation on how one should randomise the parameters when the end goal is to generate toy data? That is, what should be randomised and what shouldn't? (I referred to a tutorial, https://root.cern/doc/master/StandardTestStatDistributionDemo_8C.html , but there is no fPrior there, so there is no randomisation.)

  2. Why are the global observables explicitly randomised by using the GenerateGlobalObservables() function, when this is already done by using the NextPoint() function a few steps later?

  3. Why does one use all the variables of the PDF for randomisation? I know the 'paramPoint' parameters will be held constant, but the full list of variables of the PDF will also include the 'observable' itself, right? By observable I mean the one that is actually generated later, which is the end goal here. Why include the observables in the step where parameters like the NPs are being randomised?

Hi,

  1. The recommendation is to randomise the nuisance parameters when using the HybridCalculator (i.e. a Bayesian treatment of the nuisance parameters), and in that case a prior pdf is provided. If no prior pdf is provided, ToyMCSampler::fPriorNuisance is a null pointer, fNuisanceParametersSampler is also a null pointer, and no sampling of the nuisance parameters is performed.
     When using the FrequentistCalculator, the global observables are randomised instead.
  2. Global observables are randomised using the constraint term, which is extracted directly from the model. There is no need to provide a prior.
  3. I think one asks for all variables because one also needs to extract the global observables; in some cases those might not be defined in the parameter lists. The set will contain the observables, but nothing happens to them: they keep their values until the generate function is called (line 596).
     So in conclusion, when calling NextPoint with allVarsMinusParamPoint, even if this set also contains observables, nothing will happen to them, because NextPoint updates only the nuisance parameters, which need to be the variables of the prior pdf provided to the ToyMCSampler (see the sketch after this list).
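
To illustrate points 1 and 3 with a hypothetical sketch: only the variables of the prior pdf (here a single nuisance parameter b) are touched by the nuisance sampling; any observables or global observables in the set keep their values. The test statistic ts is assumed to have been configured beforehand:

```cpp
#include "RooRealVar.h"
#include "RooGaussian.h"
#include "RooArgSet.h"
#include "RooStats/ToyMCSampler.h"

// hypothetical nuisance parameter with a Gaussian prior
RooRealVar b("b", "background yield", 50., 0., 200.);
RooRealVar b0("b0", "nominal value", 50.);
RooRealVar sigma0("sigma0", "uncertainty", 10.);
RooGaussian prior("prior", "prior on b", b, b0, sigma0);

// ts is some previously configured test statistic (assumed to exist)
RooStats::ToyMCSampler sampler(ts, 1000);
sampler.SetPriorNuisance(&prior);    // drives the NuisanceParameterSampler
RooArgSet nuis(b);
sampler.SetNuisanceParameters(nuis); // only b will be randomised by NextPoint
```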

I hope I have answered your questions, if not please let me know

Best regards

Lorenzo

Hey,
Thanks for your excellent response. I just have a couple of quick follow-up questions:

  1. So when the NextPoint() function is called and the allVarsMinusParamPoint is passed to it, it only works on the nuisance parameters which are the variables in the prior PDF? So it doesn’t matter even if the global observables, and the observables are present in allVarsMinusParamPoint?

  2. Even if I do provide a prior PDF, won't the GenerateGlobalObservables() function still explicitly randomise the global observables anyway? So working in the frequentist scenario is fine: I just have to NOT provide the prior PDF, and then no randomisation of the nuisance parameters happens. But if I do explicitly provide a prior PDF, how do I make sure my global observables are NOT randomised by the GenerateGlobalObservables() function?

  3. It will be problematic if the prior PDF and the main data-generation PDF are the same, right? The prior PDF should have only the nuisance parameters as its variables?

  1. Yes, it does not matter.
  2. Yes, if you provide both a prior pdf and global observables, this can be problematic; there is perhaps no protection against it. It will cause both the global observables and the nuisance parameters to be randomised.
  3. Yes, a prior should be a function of parameters only (i.e. the nuisance parameters).
     Here is an example of a constraint term for the nuisance:
  • Bayesian case, prior pdf: Gaussian( b | b0, sigma0 )
  • Frequentist case, constraint term: Gaussian( b0 | b, sigma0 )

In the first case the prior should not be part of the model, while in the second case the constraint should be part of your model pdf.
Note that in the second case b0 is a global observable and sigma0 just a constant parameter, while in the first case b0 and sigma0 are both just constant parameters.
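
In RooFit terms, a hypothetical sketch of the two conventions (the same Gaussian, with b and b0 swapping roles):

```cpp
#include "RooRealVar.h"
#include "RooGaussian.h"

RooRealVar b("b", "nuisance parameter", 50., 0., 200.);
RooRealVar b0("b0", "nominal value", 50.);
RooRealVar sigma0("sigma0", "uncertainty", 10.);

// Bayesian case: prior pdf Gaussian( b | b0, sigma0 ); b is the pdf variable,
// b0 and sigma0 are constants, and this pdf is NOT part of the model
RooGaussian prior("prior", "Gaussian(b|b0,sigma0)", b, b0, sigma0);

// Frequentist case: constraint term Gaussian( b0 | b, sigma0 ); b0 is the
// global observable, b the nuisance parameter, and this pdf IS multiplied
// into the model pdf
RooGaussian constraint("constraint", "Gaussian(b0|b,sigma0)", b0, b, sigma0);
```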

Lorenzo

Hey,
Thank you so much again for the reply. So to clarify: the only way of correctly using the ToyMCSampler in the Bayesian case is to provide a prior PDF which is NOT part of the model. And to avoid the randomisation of the global observables by the explicit GenerateGlobalObservables() call, one should simply not use the SetGlobalObservables() function to explicitly hand the ToyMCSampler the global observables? In my understanding of the code, the global observables are randomised conditional on the existence of the variable fGlobalObservables, which is set explicitly by the user via the SetGlobalObservables() function. So if one does not do this, and provides a proper prior PDF, one can implement the Bayesian scenario with the ToyMCSampler? Again in my understanding, everything else in the code should work smoothly, since the code calls allVars, which will contain the global observables from the workspace anyway (so we would have fixed values of the global observables when generating toy data or randomising the nuisance parameters).

Hi,
I can confirm that what you say is correct. If you have further questions, please let me know.
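
For concreteness, a sketch of the two configurations (reusing the hypothetical sampler, prior, b and b0 objects from the earlier sketches):

```cpp
// Frequentist configuration: randomise only the global observables
RooArgSet globalObs(b0);
sampler.SetGlobalObservables(globalObs); // enables GenerateGlobalObservables()
// ...and do NOT call SetPriorNuisance()

// Bayesian configuration: randomise only the nuisance parameters
RooArgSet nuis(b);
sampler.SetPriorNuisance(&prior);
sampler.SetNuisanceParameters(nuis);
// ...and do NOT call SetGlobalObservables(), so fGlobalObservables stays null
```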
Cheers
Lorenzo

Hey,

Thank you for all your helpful responses; this has been extremely beneficial. I just have a couple of final questions, more or less:

  1. So in the frequentist case the constraint term looks like Gauss( b0 | b, sigma ), where b0 is the global observable and b, sigma are constant parameters. Extending this, does the Poissonian constraint have to look like Pois( global observable corresponding to the Poissonian NP | product of constant terms )?

  2. How are normalisation uncertainties (which do not have any constraint terms) handled by ROOT?

  1. It is correct that the constraint term is Gauss( b0 | b, sigma ), but b is not a constant parameter: it is the nuisance parameter of your model. If it were constant, it would be as if you did not have a nuisance parameter at all.
     The Poisson term will be something like Pois( nb0 | b ).

  2. Normalization parameters normally enter in the Poisson extended term, which describes the overall fluctuation of the observed number of events. So it is like having a Poisson constraint term,
     Pois( ntotal_observed | ntotal_expected ), where ntotal_expected is a function of some normalization parameters (e.g. the number of signal events and the number of background events). A sketch of both points follows below.
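
A hypothetical sketch of both points: a Poisson constraint term for a counting nuisance parameter, and an extended pdf whose yields are covered by the Poisson extended term without any extra constraint (all names are placeholders):

```cpp
#include "RooRealVar.h"
#include "RooPoisson.h"
#include "RooGaussian.h"
#include "RooExponential.h"
#include "RooAddPdf.h"
#include "RooArgList.h"

// point 1: Poisson constraint term Pois( nb0 | b ), nb0 being a global observable
RooRealVar nb0("nb0", "nominal count (global observable)", 100.);
RooRealVar bPar("b", "nuisance parameter", 100., 0., 500.);
RooPoisson poisConstraint("poisConstraint", "Pois(nb0|b)", nb0, bPar);

// point 2: extended pdf; nsig and nbkg enter through the Poisson extended term
RooRealVar x("x", "observable", 0., 10.);
RooRealVar mean("mean", "signal mean", 5.);
RooRealVar width("width", "signal width", 0.5);
RooGaussian sig("sig", "signal", x, mean, width);
RooRealVar slope("slope", "background slope", -0.3, -5., 0.);
RooExponential bkg("bkg", "background", x, slope);
RooRealVar nsig("nsig", "signal yield", 20., 0., 1000.);
RooRealVar nbkg("nbkg", "background yield", 100., 0., 1000.);
// giving the yields as coefficients makes the pdf extended: generation and
// fitting then include Pois( n_obs | nsig + nbkg ) with no extra term needed
RooAddPdf model("model", "extended s+b model", RooArgList(sig, bkg),
                RooArgList(nsig, nbkg));
```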

Lorenzo

Hi,

  1. Yes, b is indeed a nuisance parameter; I meant that it will be held constant, and that it is b0 which will be varied, in the frequentist case (i.e., when the global observables are randomised).

  2. So do you mean that an additional Poissonian constraint term of sorts is created internally to handle normalisation uncertainties? As I said, I am talking about the background normalisation uncertainty, which has no explicit constraint term in the PDF or in my workspace. Does ROOT create one to handle unconstrained nuisance parameters? I also assume this is only relevant for the case where a prior is provided and the nuisance parameters are varied; in the frequentist case only the global observables are randomised, so this Poissonian term need not be created, right?

  3. In the frequentist case it is the global observables which are randomised. Now, for toy generation we need to compute the expectation value of our main PDF, and for this we use the nuisance parameters, not the global observables, right? If the nuisance parameters are never randomised and their values never change, won't the toy generation always be the same? How is the randomisation of the global observables reflected in the toy generation? That is, is there a stage at which the randomised global-observable values are set equal to the nuisance-parameter values?

Thanks in advance!

  1. OK.
  2. Yes, the normalization parameters will be constrained automatically by using an extended pdf in RooFit. You don't need to add any extra term in this case, but the pdf needs to be extended.
  3. When doing toy generation you set some value of your nuisance parameters; typically you use their fitted values on the observed data, but you could also set them to the nominal values (i.e. the initial global-observable values). The toy-generation results (e.g. the test-statistic values) will differ because the global observables get a different value for each toy experiment generated. A conceptual sketch follows below.
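
Conceptually, the per-toy logic in the frequentist case might look like the sketch below. This is a hedged illustration, not the actual ToyMCSampler internals; nToys, modelPdf, observables, ts and poiSnapshot are hypothetical and assumed to be set up beforehand:

```cpp
for (int i = 0; i < nToys; ++i) {
   // 1. draw new values for the global observables from the constraint
   //    terms of the model (this is what GenerateGlobalObservables() does)

   // 2. generate toy data from the model pdf with the FIXED nuisance
   //    parameter values, so the generating pdf is the same for every toy
   RooDataSet *toy = modelPdf->generate(observables, RooFit::Extended());

   // 3. evaluate the test statistic: the likelihood now contains the NEW
   //    global-observable values, so the fit and the test-statistic value
   //    differ from toy to toy
   double tsVal = ts.Evaluate(*toy, poiSnapshot);
   // ...store tsVal in the sampling distribution...
   delete toy;
}
```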
Thanks for your excellent response. To sum up point 3: what you essentially mean is that the ToyMCSampler ultimately uses the same Poissonian to generate toy data every time (since the expectation value, a function of the NPs, will always be the same, because the NPs always take their initial nominal values). But the effect of the randomisation of the global observables is seen when the test statistic is calculated: a different likelihood function is minimised each time, owing to the fact that the global observables in the likelihood take on new values for each toy. Am I correct in this?

Yes, you are correct on this!
Cheers

Lorenzo

Thanks a lot!
