Fit slower on toy data?

nseggert · March 9, 2013, 8:49pm

Hi all,

I’ve run into a strange problem. I recently added a constraint to the PDF generated by HistFactory, and for some reason fitting the PDF is now very slow on toy data, while fitting the observed data is fine. I’ve traced this to the NLL variable being much slower to evaluate when it’s created from the toy data (about 60 ms per evaluation, versus about 300 µs for the observed data). I’m completely baffled by this behavior. Does anyone have any ideas? Code excerpts below.

Adding the constraints to the pdf:

    ws = R.RooStats.HistFactory.MakeModelAndMeasurementFast(meas)

    top_ratio_val = temp_file.Get("top_ratio")[0]
    ws.factory('expr::top_ratio("n_of_top/n_sf_top", n_of_top, n_sf_top)')
    ws.factory('RooGaussian::top_ratio_constraint(top_ratio, nom_top_ratio[{0}], {1})'.format(top_ratio_val, top_ratio_val*0.1))
    vv_ratio_val = temp_file.Get("vv_ratio")[0]
    ws.factory('expr::vv_ratio("n_of_vv/n_sf_vv", n_of_vv, n_sf_vv)')
    ws.factory('RooGaussian::vv_ratio_constraint(vv_ratio, nom_vv_ratio[{0}], {1})'.format(vv_ratio_val, vv_ratio_val*0.1))
    ws.factory('PROD::constrPdf(simPdf, top_ratio_constraint, vv_ratio_constraint)')

    model = ws.obj("ModelConfig")
    model.SetPdf(ws.obj("constrPdf"))

Generating the toy data:

        sampler = R.RooStats.ToyMCSampler(AD, 1)
        sampler.SetPdf(model.GetPdf())
        sampler.SetObservables(model.GetObservables())
        # sampler.SetGlobalObservables(model.GetGlobalObservables())
        sampler.SetParametersForTestStat(model.GetParametersOfInterest())
        sampler.SetGenerateBinned(False)

        params = model.GetSnapshot() # the snapshot contains the best-fit parameters to data

        toy_data = sampler.GenerateToyData(params)

And creating the NLL variables:

    nll_data = model.GetPdf().createNLL(data)
    nll_toy = model.GetPdf().createNLL(toy_data)

To check the speed of evaluating the NLL, I wrote a short function that slightly changes one of the model parameters, then runs nll.getVal(). The slight change forces the value to be recalculated; otherwise getVal() just returns a cached value.
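
The timing function itself is not shown in the excerpts. A minimal sketch of what such a helper could look like is given here, assuming only that the NLL object has a getVal() method and the parameter has getVal()/setVal(), as RooFit variables do. The name `time_nll` and its arguments `n_calls` and `rel_step` are hypothetical, chosen for illustration.

```python
import time


def time_nll(nll, param, n_calls=100, rel_step=1e-6):
    """Return the average wall-clock time (seconds) of one nll.getVal() call.

    `nll` and `param` are assumed to behave like RooFit objects:
    nll.getVal(), param.getVal(), param.setVal(x).
    """
    start_val = param.getVal()
    # Fall back to an absolute step if the parameter currently sits at zero.
    step = abs(start_val) * rel_step or rel_step

    t0 = time.perf_counter()
    for i in range(n_calls):
        # Alternate between two nearby values so every call sees a changed
        # parameter and cannot simply return a cached result.
        param.setVal(start_val + step * (1 + i % 2))
        nll.getVal()
    elapsed = time.perf_counter() - t0

    param.setVal(start_val)  # restore the original parameter value
    return elapsed / n_calls
```

It would be called once per NLL with the same parameter, for example `time_nll(nll_data, par)` and `time_nll(nll_toy, par)`, where `par` is any floating parameter of the model retrieved from the workspace.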

moneta · March 11, 2013, 6:57pm

Hi,

I see that you generate the toys unbinned. Is your observed data binned or unbinned? This could explain the difference.
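
One quick way to check is to compare the number of entries with the sum of weights in each dataset, since the NLL has to loop over every entry. The sketch below assumes only the numEntries() and sumEntries() methods that RooFit datasets provide; the function name `summarize_dataset` and the `looks_binned` flag are hypothetical, and the flag is just a heuristic (a binned dataset with exactly one event in every bin would not be caught).

```python
def summarize_dataset(data, tol=1e-9):
    """Summarize a RooFit-like dataset exposing numEntries() and sumEntries().

    A binned (weighted) dataset typically has far fewer entries than events,
    while an unbinned one has one entry of weight 1 per event.
    """
    n_entries = data.numEntries()
    sum_weights = data.sumEntries()
    return {
        "entries": n_entries,
        "sum_weights": sum_weights,
        # Heuristic only: entries and summed weights differ for weighted data.
        "looks_binned": abs(n_entries - sum_weights) > tol,
    }
```

Calling `summarize_dataset(data)` and `summarize_dataset(toy_data)` and comparing the `entries` values shows how many terms each NLL sums over. If the observed data turns out to be binned, generating the toys with `sampler.SetGenerateBinned(True)` would be the natural thing to try.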

Lorenzo