Hey! I’m new here, so pardon me if this is a relatively stupid question.
Basically, I have been trying to use ToyMCSampler in conjunction with ProfileLikelihoodTestStat to generate sampling distributions for my test statistic (the ultimate aim being to compare the results with Wilks' theorem and the asymptotic approximation).
I was a little unclear about how and where exactly in the ToyMCSampler.cxx code (link here : https://root.cern/doc/master/ToyMCSampler_8cxx_source.html ) the nuisance parameters from my workspace are being randomised (while the toy data is being generated of course).
On line 631 you finally see the pdf.generate() call; does that automatically randomise the nuisance parameters (other than the parameter of interest, of course) while generating the data? As far as I know, for each generated dataset the nominal values (i.e. the global observables) are set to the values taken from the workspace, while the nuisance parameters are randomised. So where exactly does this happen in the code? More importantly, how does it happen? That is, does it take into account the constraint terms I have defined for these nuisance parameters in my workspace?
This could be a stupid question, but I would appreciate any pointers/help.
Thanks in advance!
Hi, and welcome to the ROOT forum! I think we need the help of @moneta or @StephanH for this one.
The randomisation happens only if you are using the HybridCalculator, by using the function
For the frequentist case (FrequentistCalculator) there is no randomisation of parameters: parameters are given in frequentist statistics, and if they had to be randomised they would need a prior pdf.
In the frequentist case there is however a profile fit to set the nuisance parameters before sampling.
This happens in
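To make the frequentist recipe above concrete, here is a minimal self-contained sketch in plain C++ (illustrative values, std::random instead of RooFit; this is NOT the ROOT implementation): the nuisance parameter stays at its profiled value, the global observable is drawn from the constraint term, and the observable itself is generated from the model pdf.

```cpp
// Conceptual sketch of frequentist toy generation (NOT the ROOT/RooStats code):
// 1) A profile fit has set the nuisance parameter to its conditional MLE
//    (here we simply take a given fitted value bHat).
// 2) For each toy, the global observable b0 is drawn from the constraint
//    Gauss(b0 | bHat, sigma); the nuisance parameter itself is NOT randomised.
// 3) The observable (an event count n) is generated from Pois(n | s + bHat).
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Toy {
    double b0; // randomised global observable for this toy
    int    n;  // generated observable (event count)
};

std::vector<Toy> generateToys(double s, double bHat, double sigma,
                              int nToys, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> constraint(bHat, sigma); // Gauss(b0 | bHat, sigma)
    std::poisson_distribution<int>   model(s + bHat);         // Pois(n | s + bHat)
    std::vector<Toy> toys;
    toys.reserve(nToys);
    for (int i = 0; i < nToys; ++i)
        toys.push_back({constraint(rng), model(rng)});
    return toys;
}
```

Note that the nuisance value bHat is the same for every toy; only b0 and n fluctuate, which is exactly the frequentist behaviour described above.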
Aah thanks! Also, when using the ToyMCSampler in conjunction with the ProfileLikelihoodTestStat, how do I access the fitted parameter-of-interest values for each evaluation of the test statistic? I know it probably has something to do with the detailed output option in PLTS, but I’m not entirely sure.
Yes, you need to enable the detailed-output option, and you then get a TTree with the fitted parameter values. Otherwise, with the verbose option you can have these fitted POI values printed on the screen.
Thanks again for your timely reply. Sorry for the follow-up questions, but I’m immensely confused at the moment about the randomisation of parameters in ToyMCSampler.
Here is the link again to the source code : https://root.cern/doc/master/ToyMCSampler_8cxx_source.html
(The function I am talking about starts at line 546)
So in my understanding of the code, if one looks at ToyMCSampler::GenerateToyData(paramPoint), one sees that:
a) A variable allVars is created that contains all the variables of the pdf, and this includes the nuisance parameters, the parameters of interest, the global observables AND the observables.
b) The global observables are randomised explicitly by using GenerateGlobalObservables().
c) The NuisanceParameterSampler is created once at the beginning, and at that point the Refresh() function randomises the nuisance parameters once.
d) Then the argument paramPoint is held constant, while all other parameters are randomised by calling the NextPoint() function.
Now my problem is:
What is the RooStats recommendation for randomising parameters when the end goal is to generate toy data? That is, what should be randomised and what shouldn't? (I referred to a tutorial: https://root.cern/doc/master/StandardTestStatDistributionDemo_8C.html , but there is no fPrior there, so no randomisation happens.)
Why are the global observables explicitly randomised by using the GenerateGlobalObservables() function, when this is already done by using the NextPoint() function a few steps later?
Why does one use all the variables of the pdf for randomisation? I know the paramPoint parameters will be held constant, but the full list of the pdf's variables will also include the observable itself, right? By observable I mean the one that is ACTUALLY generated later, which is the end goal here. Why include the observables in the step where parameters like the nuisance parameters are being randomised?
1. The recommendation is to randomise nuisance parameters when using the HybridCalculator (i.e. a Bayesian treatment of the nuisance parameters), and in that case a prior pdf is provided. If no prior pdf is provided, ToyMCSampler::fPriorNuisance is a null pointer, fNuisanceParametersSampler is also a null pointer, and no sampling of the nuisance parameters is performed. When using the FrequentistCalculator, the global observables are randomised instead.
2. Global observables are randomised using the constraint term, which is extracted directly from the model. There is no need to provide a prior.
3. I think one asks for all variables because one needs to extract the global observables as well; in some cases those might not be defined in the parameter lists. The set will contain the observables, but nothing happens to them: they keep their values until the generate function is called (line 596).
So in conclusion, when NextPoint is called with allVarsMinusParamPoint, nothing happens to any observables also contained in that set: the NextPoint function updates only the nuisance parameters, which need to be the variables of the prior pdf provided to the ToyMCSampler.
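A toy illustration of that behaviour (plain C++ with assumed data structures; this is not the actual NuisanceParameterSampler code): only the entries named in the prior are resampled, and everything else in the set is left untouched.

```cpp
// Conceptual sketch (NOT the ROOT implementation) of what NextPoint does
// with allVarsMinusParamPoint: only the variables that appear in the prior
// pdf are updated; any other entries in the set (global observables,
// observables) keep their current values.
#include <cassert>
#include <map>
#include <random>
#include <string>
#include <utility>

using ParamSet = std::map<std::string, double>;

// 'prior' maps each nuisance-parameter name to an assumed (mean, sigma)
// Gaussian prior; a real prior pdf could be anything.
void nextPoint(ParamSet& allVarsMinusParamPoint,
               const std::map<std::string, std::pair<double, double>>& prior,
               std::mt19937& rng) {
    for (const auto& [name, p] : prior) {
        auto it = allVarsMinusParamPoint.find(name);
        if (it == allVarsMinusParamPoint.end()) continue;
        std::normal_distribution<double> g(p.first, p.second);
        it->second = g(rng); // only the nuisance parameters are touched
    }
}
```

Running this on a set containing a nuisance parameter "b", a global observable "b0" and an observable "n" changes only "b"; "b0" and "n" come out exactly as they went in.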
I hope I have answered your questions; if not, please let me know.
Thanks for your excellent response. I just have a couple of quick follow-up questions -
So when the NextPoint() function is called and allVarsMinusParamPoint is passed to it, it only acts on the nuisance parameters, i.e. the variables of the prior pdf? So it doesn't matter even if the global observables and the observables are present in the set?
Even if I do provide a prior PDF, won't the GenerateGlobalObservables() function still explicitly randomise the global observables anyway? So working in the frequentist scenario is fine: I just have to NOT provide the prior PDF, and no randomisation of the nuisance parameters happens. But if I do explicitly provide a prior PDF, how do I make sure my global observables are NOT randomised by the GenerateGlobalObservables() function?
It will be problematic if the prior PDF and the main data-generation PDF are the same, right? The prior PDF should only have the nuisance parameters as its variables?
- Yes, it does not matter.
- Yes, if you provide both a prior pdf and global observables, this can be problematic; there is probably no protection against it. It will randomise both the global observables and the nuisance parameters.
- Yes, a prior should be a function of parameters only (i.e. the nuisance parameters).
Here is an example of a constraint term for a nuisance parameter b:
- Bayesian case, prior pdf: Gaussian( b | b0, sigma0 )
- Frequentist case, constraint term: Gaussian( b0 | b, sigma0 )
In the first case the prior should not be part of the model, while in the second case the constraint should be part of your model pdf.
Note that in the second case b0 is a global observable and sigma0 just a constant parameter, while in the first case b0 and sigma0 are both just constant parameters.
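The two conventions can also be sketched numerically (plain C++, an assumed Gaussian constraint with made-up values; not ROOT code): in the Bayesian case it is the nuisance parameter b that gets drawn, in the frequentist case it is the global observable b0.

```cpp
// Conceptual sketch of the two conventions (NOT ROOT code):
// Bayesian:    the nuisance b is drawn from the prior Gauss(b | b0, sigma0),
//              where b0 and sigma0 are fixed constants.
// Frequentist: the global observable b0 is drawn from the constraint
//              Gauss(b0 | b, sigma0), where b is fixed (e.g. at its
//              profiled value).
#include <cassert>
#include <cmath>
#include <random>

// Returns the sampled nuisance parameter b; the constants b0, sigma0 stay fixed.
double bayesianDraw(double b0, double sigma0, std::mt19937& rng) {
    std::normal_distribution<double> prior(b0, sigma0); // Gauss(b | b0, sigma0)
    return prior(rng);
}

// Returns the sampled global observable b0; the nuisance b stays fixed.
double frequentistDraw(double b, double sigma0, std::mt19937& rng) {
    std::normal_distribution<double> constraint(b, sigma0); // Gauss(b0 | b, sigma0)
    return constraint(rng);
}
```

Numerically the two draws look symmetric; the difference is entirely in which quantity is treated as random (the parameter b, or the global observable b0) and in which pdf the Gaussian lives (a standalone prior, or a constraint term inside the model).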
Thank you so much again for the reply. So to clarify: the only way of correctly using the ToyMCSampler in the Bayesian case is to provide a prior PDF which is NOT part of the model. And to avoid the randomisation of the global observables by the explicit GenerateGlobalObservables() call, one should simply not use the SetGlobalObservables() function to hand the ToyMCSampler the global observables? In my understanding of the code, the global observables are randomised conditionally on the existence of the variable fGlobalObservables, which is set explicitly by the user via SetGlobalObservables(). So if one doesn't do this, and provides a proper prior PDF, one can implement the Bayesian scenario with the ToyMCSampler? Again in my understanding, everything else in the code should work smoothly, since the code uses allVars, which will contain the global observables from the workspace anyway (so the global observables keep some fixed values while the toy data is generated and the nuisance parameters are randomised).
I can confirm that what you say is correct. If you have further questions, please let me know.
Thank you for all your helpful responses; this has been extremely beneficial. I just have a couple of final questions, more or less -
So in the frequentist case, the constraint term looks like
Gauss( b0 | b, sigma ), where b0 is the global observable, b the nuisance parameter, and sigma a constant. Extending this, does the Poissonian constraint have to look like
Pois( global observable corresponding to the Poissonian NP | product of constant terms ) ?
How are normalisation uncertainties (which do not have any constraint terms) handled by ROOT?
Thanks for your excellent response. To sum up point 3, what you essentially mean is that ToyMCSampler ultimately uses the same Poissonian to generate toy data every time (since the expectation value, a function of the NPs, will always be the same, because the NPs always keep their initial nominal values). But the effect of the randomisation of global observables is seen when the test statistic is calculated, in which case a different likelihood function is minimised each time, owing to the fact that the global observables in the likelihood function take on new values for each toy. Am I correct in this?
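This summary can be checked with a small self-contained sketch (one counting bin, crude grid minimisation, invented values; not the RooStats code): every toy is generated from the same Pois(s + b_nom), yet the fitted nuisance parameter differs from toy to toy because each toy carries its own randomised global observable b0 in the constraint term of the likelihood.

```cpp
// Conceptual sketch (NOT ROOT code): identical generation pdf per toy,
// different constrained likelihood per toy due to the randomised b0.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Negative log-likelihood for one bin: Pois(n | s+b) * Gauss(b0 | b, sigma),
// with constant terms dropped.
double nll(double b, int n, double b0, double s, double sigma) {
    double mu = s + b;
    return mu - n * std::log(mu) + 0.5 * (b - b0) * (b - b0) / (sigma * sigma);
}

// Crude grid minimisation over b (good enough for a sketch).
double fitB(int n, double b0, double s, double sigma) {
    double best = 0.0, bestNll = nll(0.0, n, b0, s, sigma);
    for (double b = 0.1; b < 200.0; b += 0.1) {
        double v = nll(b, n, b0, s, sigma);
        if (v < bestNll) { bestNll = v; best = b; }
    }
    return best;
}

std::vector<double> fittedBPerToy(double s, double bNom, double sigma,
                                  int nToys, unsigned seed = 7) {
    std::mt19937 rng(seed);
    std::poisson_distribution<int>   gen(s + bNom);      // same pdf for every toy
    std::normal_distribution<double> glob(bNom, sigma);  // per-toy global observable b0
    std::vector<double> fits;
    for (int i = 0; i < nToys; ++i)
        fits.push_back(fitB(gen(rng), glob(rng), s, sigma));
    return fits;
}
```

The fitted b values spread around b_nom even though the generation pdf never changes, which is exactly the effect of the randomised global observables on the minimised likelihood.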
Yes, you are correct on this!
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.