Hi Lorenzo!
This is a partial recast of the search in http://cms-results.web.cern.ch/cms-results/public-results/preliminary-results/B2G-16-021/index.html with only two of the signal categories, WZ HP (high purity) and WZ LP (low purity) (the data, background fit and signal shapes can be seen in fig 1). The following root file contains a workspace with a reduced version of my pdf and data:
debug_workspace.root (29.0 KB)
The workspace is called “debug_WS”. Inside the workspace is the full model pdf, “sb_model”. This is made of a product of a gaussian constraint term and a RooSimultaneous “sim_sb_model”. There are two categories, and there is a nuisance parameter “alpha_tau21” which mostly controls how much of the signal appears in category 1 and how much in category 2, but it also gives a small amount of signal shape morphing within each category (this is constrained by the gaussian constraint term). There are also five other unconstrained nuisance parameters which control the shape and normalization of the background within each category.
The data is called “CMS_data_VV”.
Here is a short analysis script which illustrates the problems I run in to:
forLorenzo_analysiscode.C (4.8 KB)
If I run the code with:
run_diagnosis_fullmodel(TString("./debug_workspace.root"), TString("sb_model"))
then asimov data is generated for only one category. This is for the reason identified in my first post, the code does not recognize this pdf as a RooSimultaneous because it is a RooProdPdf.
My first attempt at a solution was to replace in AsymptoticCalculator.cxx:
1274 RooAbsData * asimov = GenerateAsimovData(*model.GetPdf() , *model.GetObservables() );
with:
RooAbsPdf *modelpdf = (RooAbsPdf*) model.GetPdf()->Clone("tempmodel");
RooAbsPdf *jacksPdf = MakeUnconstrainedPdf(*modelpdf,*model.GetObservables(),"unconstrained");
RooAbsData * asimov = GenerateAsimovData(*jacksPdf , *model.GetObservables() );
however this didn’t work (both without and with the cloning step). I don’t remember exactly the problem, it was either seg faults or nonsensical Asimov data. In order to try and understand what is the problem, I perform the following steps.
If I return to the original version of AsymptoticCalculator.cxx and run my macro with
run_diagnosis_fullmodel(TString("./debug_workspace.root"), TString("sim_sb_model"))
i.e. without constraint for the nuisance parameter, then the fit works fine and the asimov data is generated.
however if I now run with
run_diagnosis_fullmodel(TString("./debug_workspace.root"), TString("sim_sb_model"),true)
, this initializes the asymptotic calculator with the unconstrained model, and then afterwards attempts to fit the constrained model to the data on the last line of the macro. This last step causes a seg fault. Therefore, initializing the asymptotic calculator with the unconstrained model somehow breaks the constrained model.
I tried to find exactly the moment in AsymptoticCalculator.cxx where the initialization with the unconstrained pdf breaks the constrained pdf. It is in the call to FillBins on line 1046 (i.e. if I run the initialization code line by line through the root c++ interpreter, I find that the unconstrained pdf is not broken before running 1046, but is broken after). I then tracked it down to the first call of line 847 or 848 (i.e. when i = 0 in the for loop, before calling line 847 or 848, I dont remember exactly which, the pdf is not broken, afterwards it is broken i.e. an attempt at fitTo will cause a seg fault). This behaviour is completely mysterious to me and so I gave up here and just implemented by workaround instead.
In case it is of interest, the workspace was created with the following files.
forLorenzo_createWorkspace.C (12.1 KB)
CMShadronic.root (40.7 KB)
As a footnote, I found a few bugs in the AsymptoticCalculator code:
- In AsymptoticCalculator.h, replace:
122 const RooArgSet & GetBestFitParams() const { return fBestFitPoi; }
with
122 const RooArgSet & GetBestFitParams() const { return fBestFitParams; }
(the method for returning fBestFitPoi is already implemented on line 118, presumably it was copy-pasted without fully being corrected).
- In AsymptoticCalculator.cxx replace:
624 if (TMath::IsNaN(qmu) ) {
with
624 if (TMath::IsNaN(qmu_A) ) {
(this was presumably copy-pasted from line 553 without being properly corrected).
- In AsymptoticCalculator.cxx, lines 185-202, and line 254, is incorrect for simultaneous pdfs. If there are n bins per category, and m categories, then this code will lead to the creation of Asimov data with n*m bins per category, rather than n bins per category. Also, it will destroy any variable binning that has been previously implemented, as is the case in my workspace. As a workaround I just commented out lines 199 and 254, and made sure my observable was implemented with the same binning as the data.
Finally, for my attempts at diagnosis, I found it useful to add the additional method in AsymptoticCalculator.h:
RooAbsData & GetAsimovData() const { return *fAsimovData; }
there is a commented out block in forLorenzo_analysiscode.C, which just produces some handy plots of fitted pdfs and real and asimov data. It is commented because it will not work without this new method and bugfix (1) and maybe also (3).