Including Beeston-Barlow and shape uncertainties in toy data generation with HistFactory

Dear experts,

I am performing toy fit studies in order to validate my fit. For context, I am using HistFactory to perform a two-dimensional template fit, where the uncertainties associated with the limited statistics of the templates used to model signal and backgrounds are taken into account with the Beeston–Barlow (light) method, and the shape uncertainties are taken into account by interpolating between the provided variation histograms corresponding to ±1 sigma. Additionally, some of the templates are constrained to external measurements, where the associated uncertainties are also taken into account.

To perform the toy studies correctly, I need to include the Beeston–Barlow uncertainties, the shape uncertainties, and the uncertainties of the external constraints in the generation of the toy data. However, I have not been able to find a good way to do this. I know that pyhf provides a function that allows one to sample toy data while taking all uncertainties into account. My question is therefore whether something similar exists for HistFactory in RooFit or RooStats.

Below I provide some details on how I am currently generating the toy data, which does not seem to include the Beeston–Barlow and shape uncertainties. The reason I believe these uncertainties are not correctly accounted for in the toy data generation is that, when fitting the toy data, the widths of the pull distributions are below one (approximately 0.85). The pulls are defined as:

pulls = (fitted toy yield - nominal yield)/(uncertainty of fitted toy yield)

The nominal yield is based on the fit performed to the real data (and is therefore fixed), while the fitted toy yield and its uncertainty are determined on a per-toy basis.

Firstly, the Beeston–Barlow method is activated for all templates (that is, for all samples) using:

sample.ActivateStatError();

Some templates are constrained based on external measurements. For example, in this case mode1 is constrained to modeRef. Here, frac_mode1_vs_modeRef is the external constraint, and lim_mode1_vs_modeRef_min / lim_mode1_vs_modeRef_max represent the ±1 sigma relative uncertainty of the constraint:

sample.AddNormFactor("yield_modeRef”, N_modeRef, 0.0, nData);

sample.AddNormFactor(“f_mode1_vs_modeRef", frac_mode1_vs_modeRef, frac_mode1_vs_modeRef, frac_mode1_vs_modeRef);

sample.AddOverallSys(“sigma_mode1_vs_modeRef”, lim_mode1_vs_modeRef_min, lim_mode1_vs_modeRef_max);

For some samples, the shape uncertainty is included. For example, for the signal, where h1_max_signal and h1_min_signal correspond to the +1 sigma and –1 sigma shape variations:

sample.AddHistoSys(“shape_signal” , “h1_min_signal”, histoFileName, “”, “h1_max_signal”, histoFileName, “”);

To produce the toy data sample, I use the total fit model pdf, defined as follows:

RooStats::ModelConfig* modelConf = (RooStats::ModelConfig*)workSpace->obj(“ModelConfig”);

RooSimultaneous* model = (RooSimultaneous*)modelConf->GetPdf();

RooArgSet* obs = (RooArgSet*)modelConf->GetObservables();

RooCategory *idx = (RooCategory*)obs->find(“channelCat”);

RooAbsData* data = (RooAbsData*)workSpace->data(“obsData”);

RooRealVar* BMCorrSig = (RooRealVar*)obs->find(“obs_x_BMCorr”);

RooRealVar* PIPIMSig = (RooRealVar*)obs->find(“obs_y_BMCorr”);

RooSimultaneous* totalPdf = new RooSimultaneous( *model );

RooAbsReal* nll =  totalPdf->createNLL( *data , RooFit::Offset(kTRUE) );

RooStats::HistFactory::RooBarlowBeestonLL* bbnll = new RooStats::HistFactory::RooBarlowBeestonLL(“bbnll”, “bbnll”, *nll);

bbnll->setPdf( totalPdf );

bbnll->setDataset( data );

bbnll->initializeBarlowCache();

After performing the fit, I save the model in a work space like this:

RooWorkspace ws(“ws”);

ws.import(*totalPdf);

ws.import(*BMCorrSig);

ws.import(*PIPIMSig);

ws.import(*idx);

ws.import(*data);

ws.writeToFile(ws_file);

Then to generate the toy data, I import the model from the workspace:

RooSimultaneous* totalPdf = (RooSimultaneous*)ws->pdf(fit_model_name);

RooAbsData* data = (RooAbsData*)ws->data(data_name);

RooRealVar* BMCorrSig = (RooRealVar*)ws->var(obs_x_name);

RooRealVar* PIPIMSig = (RooRealVar*)ws->var(obs_y_name);

RooCategory* idx = (RooCategory*)ws->cat(category_name);

And finally, I generate the toy data like this:

RooDataSet* data_new = totalPdf->generate(RooArgSet(*BMCorrSig, *PIPIMSig, *idx), RooFit::Extended());

However, this procedure does not seem to include the Beeston–Barlow and shape uncertainties.

Any guidance or examples on how to include these effects in the toy data generation would be greatly appreciated. Please let me know if more information about my setup or code would be helpful.

Best regards,

Veronica

Maybe @jonas knows

Maybe relevant: Roofit uses Conway’s approximation, see [RF] Missing and misleading documentation in RooBarlowBeestonLL.cxx · Issue #15252 · root-project/root · GitHub