Asymptotic Calculator on simultaneous Pdf

I’m new to RooFit/RooStats, and would appreciate any help on this! I’m using the AsimovCalculator tool for likelihood testing of a SimultaneousPdf signal+background model, and am getting problems when generating the Asimov data in a particular context.

I have some nuisance parameters which control the signal shape over the various categories, for which I have a product of Gaussians. So the last stages of the code constructing my PDF looks something like this:

RooSimultaneous sim_sb_model("sim_sb_model","sim_sb_model",category_pdf_map,categories);
RooGaussian gauss("gauss","gauss", alpha, mean, sd);
RooProdPdf sb_model("sb_model","sb_model", gauss,Conditional(sim_sb_model,obs))

sb_model is then used for my hypothesis testing. But as soon as I have more than one category, the asimov data generation goes crazy and gives me 0 events. I believe I’ve tracked it down to this line in AsymptoticCalculator.cxx:


 1092    const RooSimultaneous* simPdf = dynamic_cast<const RooSimultaneous*>(&pdf);
 1093    if (!simPdf) { 
 1094       // generate data for non sim pdf
 1095       return GenerateAsimovDataSinglePdf( pdf, observables, *weightVar, 0);
 1096    }

I have checked that the dynamic_cast of my pdf returns a null pointer, I suppose because it is a product pdf and not a simultaneous pdf, and therefore the code attempts to generate asimov data using the single pdf code. But if it tried to unwrap the product it would see there is a RooSimultaneous in there!

Is there a different way that I’m supposed to construct a pdf of this sort?

After a little more googling it seems similar problems were reported here without resolution:

I found my own workaround to the problem, though it is not particularly beautiful.

My initial attempt was to edit the following lines explicitly to allow for the case where a RooSimultaneous is embedded in a product with constraints.

 1092    const RooSimultaneous* simPdf = dynamic_cast<const RooSimultaneous*>(&pdf);
 1093    if (!simPdf) { 
 1094       // generate data for non sim pdf
 1095       return GenerateAsimovDataSinglePdf( pdf, observables, *weightVar, 0);
 1096    }

While my fix allowed for the generation of Asimov data, for some reason after the generation my pdf would be ‘broken’ and unable to fit the data in a sensible way, similar to the description in the posts I linked in my previous post. I have no idea why it was happening, but it was during the call to AsymptoticCalculator::FillBins that the pdf ‘broke’.

So instead I implemented the following ugly workaround. I added functionality in AsymptoticCalculator to allow to import my own pre-generated Asimov dataset. In this case, a new if statement in the Initialization method prevents the generation of a new Asimov data set, and it just uses the imported one.

The rest of the code works just fine and I am now able to perform all of the usual hypothesis testing routines and things are behaving sensibly. Still, I feel that there should have been a better way to do this.

Hi,

I will look into this. As your investigations suggests seems to be a limitation for this type of models.
A possible solution is to move the constraints in the model before to each single pdf component.
Can you please send me your workspace so I can investigate this in further details and fix eventually the bug ?

Thank you

Lorenzo

Hi Lorenzo!

This is a partial recast of the search in http://cms-results.web.cern.ch/cms-results/public-results/preliminary-results/B2G-16-021/index.html with only two of the signal categories, WZ HP (high purity) and WZ LP (low purity) (the data, background fit and signal shapes can be seen in fig 1). The following root file contains a workspace with a reduced version of my pdf and data:
debug_workspace.root (29.0 KB)

The workspace is called “debug_WS”. Inside the workspace is the full model pdf, “sb_model”. This is made of a product of a gaussian constraint term and a RooSimultaneous “sim_sb_model”. There are two categories, and there is a nuisance parameter “alpha_tau21” which mostly controls how much of the signal appears in category 1 and how much in category 2, but it also gives a small amount of signal shape morphing within each category (this is constrained by the gaussian constraint term). There are also five other unconstrained nuisance parameters which control the shape and normalization of the background within each category.

The data is called “CMS_data_VV”.

Here is a short analysis script which illustrates the problems I run in to:
forLorenzo_analysiscode.C (4.8 KB)

If I run the code with:

run_diagnosis_fullmodel(TString("./debug_workspace.root"), TString("sb_model"))

then asimov data is generated for only one category. This is for the reason identified in my first post, the code does not recognize this pdf as a RooSimultaneous because it is a RooProdPdf.

My first attempt at a solution was to replace in AsymptoticCalculator.cxx:

 1274    RooAbsData * asimov = GenerateAsimovData(*model.GetPdf() , *model.GetObservables() );

with:

   RooAbsPdf *modelpdf = (RooAbsPdf*) model.GetPdf()->Clone("tempmodel");
   RooAbsPdf *jacksPdf = MakeUnconstrainedPdf(*modelpdf,*model.GetObservables(),"unconstrained");
   RooAbsData * asimov = GenerateAsimovData(*jacksPdf , *model.GetObservables() );

however this didn’t work (both without and with the cloning step). I don’t remember exactly the problem, it was either seg faults or nonsensical Asimov data. In order to try and understand what is the problem, I perform the following steps.

If I return to the original version of AsymptoticCalculator.cxx and run my macro with

run_diagnosis_fullmodel(TString("./debug_workspace.root"), TString("sim_sb_model"))

i.e. without constraint for the nuisance parameter, then the fit works fine and the asimov data is generated.

however if I now run with

run_diagnosis_fullmodel(TString("./debug_workspace.root"), TString("sim_sb_model"),true)

, this initializes the asymptotic calculator with the unconstrained model, and then afterwards attempts to fit the constrained model to the data on the last line of the macro. This last step causes a seg fault. Therefore, initializing the asymptotic calculator with the unconstrained model somehow breaks the constrained model.

I tried to find exactly the moment in AsymptoticCalculator.cxx where the initialization with the unconstrained pdf breaks the constrained pdf. It is in the call to FillBins on line 1046 (i.e. if I run the initialization code line by line through the root c++ interpreter, I find that the unconstrained pdf is not broken before running 1046, but is broken after). I then tracked it down to the first call of line 847 or 848 (i.e. when i = 0 in the for loop, before calling line 847 or 848, I dont remember exactly which, the pdf is not broken, afterwards it is broken i.e. an attempt at fitTo will cause a seg fault). This behaviour is completely mysterious to me and so I gave up here and just implemented by workaround instead.

In case it is of interest, the workspace was created with the following files.
forLorenzo_createWorkspace.C (12.1 KB)
CMShadronic.root (40.7 KB)

As a footnote, I found a few bugs in the AsymptoticCalculator code:

  1. In AsymptoticCalculator.h, replace:
  122       const RooArgSet & GetBestFitParams() const { return fBestFitPoi; }

with

  122       const RooArgSet & GetBestFitParams() const { return fBestFitParams; }

(the method for returning fBestFitPoi is already implemented on line 118, presumably it was copy-pasted without fully being corrected).

  1. In AsymptoticCalculator.cxx replace:
  624    if (TMath::IsNaN(qmu) ) { 

with

  624    if (TMath::IsNaN(qmu_A) ) { 

(this was presumably copy-pasted from line 553 without being properly corrected).

  1. In AsymptoticCalculator.cxx, lines 185-202, and line 254, is incorrect for simultaneous pdfs. If there are n bins per category, and m categories, then this code will lead to the creation of Asimov data with n*m bins per category, rather than n bins per category. Also, it will destroy any variable binning that has been previously implemented, as is the case in my workspace. As a workaround I just commented out lines 199 and 254, and made sure my observable was implemented with the same binning as the data.

Finally, for my attempts at diagnosis, I found it useful to add the additional method in AsymptoticCalculator.h:

      RooAbsData & GetAsimovData() const { return *fAsimovData; }

there is a commented out block in forLorenzo_analysiscode.C, which just produces some handy plots of fitted pdfs and real and asimov data. It is commented because it will not work without this new method and bugfix (1) and maybe also (3).