RooStats hypothesis testing into

GiacomoBoldrini · June 25, 2019, 11:06am

Dear experts.
I was asked to do a test hypothesis but I don’t really know what I’m doing, I’ll explain myself.
I have a distribution “genpdf” from which I sample a dataset. I’m then asked to do a statistical test on that data with respect to the genpdf itself and to a uniform distribution. Here is the code I use for creating the uniform pdf and the genpdf from which I sample data:

RooRealVar  t("t","t",0,2*M_PI) ;
    RooRealVar  p("p","p", 0, M_PI) ;
    
    //Defining Parameters according to RooFit.
    //Initial value is the 3rd argument, 4th and 5th arg
    //are the range of existance of the parameter.
    RooRealVar alpha("alpha", "alpha", 0.65, 0.6, 0.7);

    RooRealVar beta("beta", "beta", 0.06, 0, 0.1);
    RooRealVar gamma("gamma", "gamma", -0.18, -0.2, -0.16);
    RooRealVar scale("scale", "scale", 1., 0, 10);
    RooRealVar scale_uni("scale_uni", "scale_uni", 1.);
    RooRealVar for_comp("for_comp", "for_comp", 0.1, 0, 1);

    RooUniform uni_t("uniformt", "uniformt", t);
    RooUniform uni_p("uniformp", "uniformp", p);
    
    //UNIFORM PDF
    RooAddPdf unif("unif", "uni_t+uni_p", RooArgList(uni_t, uni_p), RooArgList(scale_uni));
    
    //ClassFactory generates a pdf instance and normalize it.
    RooAbsPdf* genpdf = RooClassFactory::makePdfInstance("GenPdf","scale*(3./(4.*M_PI))*(0.5*(1.-alpha) + (0.5)*(3.*alpha-1)*cos(t)*cos(t) - beta*sin(t)*sin(t)*cos(2.*p)- sqrt(2.)*gamma*sin(2.*t)*cos(p))",RooArgSet(t,p,alpha, beta, gamma, scale)) ;

    RooDataSet* data = genpdf->generate(RooArgSet(t,p), 50000);

The genpdf reduces to a uniform distribution when alpha=1/3, beta=0, gamma=0.
For the statistical test I built the following pdf as summing the genpdf and the uniform one with a proportionality parameter “for_comp”:

RooAddPdf zero_meno("zero_meno", "gen_pdf+uniform", RooArgList(*genpdf,unif), RooArgList(for_comp) );

After checking that everything works ok (I fitted zero_mean to the data using unbind ML fit, as expected to returned only the gen_pdf setting the uniform at 0) I create the workspace needed for the test:

RooWorkspace*w = new RooWorkspace("w");
  w->import(t);
  w->import(p);
  w->import(zero_meno);
  w->import(*data);

  w->defineSet("obs",RooArgSet(t,p));
  w->defineSet("poi",RooArgSet(for_comp));
  w->defineSet("nuisParams","alpha, beta, gamma, scale");

  ModelConfig zero_model("zero_model", w);
  zero_model.SetPdf(*w->pdf("zero_meno"));
  zero_model.SetObservables(*w->set("obs"));
  zero_model.SetParametersOfInterest(*w->set("poi"));
  w->var("for_comp")->setVal(0.0);  // important only uniform
  zero_model.SetSnapshot(*w->set("poi"));

  // create the alternate (signal+background) ModelConfig with s=50
  ModelConfig uno_model("uno_model", w);
  uno_model.SetPdf(*w->pdf("zero_meno"));
  uno_model.SetObservables(*w->set("obs"));
  uno_model.SetParametersOfInterest(*w->set("poi"));
  w->var("for_comp")->setVal(0.5); // Setting non zero value to genpdf
  uno_model.SetSnapshot(*w->set("poi"));

Then I create a frequentist calculator and define the ProfileLikelihood as test statistics and run the HypoTest:

FrequentistCalculator *  fc  = new FrequentistCalculator(*data, zero_model, uno_model);
  fc->SetToys(1000,1000);    // 1000 for null (B) and 1000 for alt (S+B) 

  // create the test statistics
  ProfileLikelihoodTestStat profll(*uno_model.GetPdf());
  // use one-sided profile likelihood
  profll.SetOneSidedDiscovery(true);

  // configure  ToyMCSampler and set the test statistics
  ToyMCSampler *toymcs = (ToyMCSampler*)fc->GetTestStatSampler();
  toymcs->SetTestStatistic(&profll);

  //Not sure about this
  if (!uno_model.GetPdf()->canBeExtended())
    toymcs->SetNEventsPerToy(1);

  // run the test
  HypoTestResult * res = fc->GetHypoTest();
  res->Print();

This action has no end and it keeps iterating (waited 20m and no results). Furthermore this message keeps appearing:

[#0] ERROR:Eval -- RooAbsReal::logEvalError(GenPdf) evaluation error, 
 origin       : RooGenPdfPdf::GenPdf[ t=t p=p alpha=alpha beta=beta gamma=gamma scale=scale ]
 message      : p.d.f value is less than zero (-0.000048), forcing value to zero

The fact is, most of this code is taken from various examples of S+B over B models. I don’t know if this class also applies to my problem, if so am I doing it right? Is this the right way to use RooStats for the problem?
Thank you, have a nice day.

oshadura · June 25, 2019, 1:37pm

@StephanH can you please help here?

Thank you in advance,
Oksana.

StephanH · June 26, 2019, 5:19pm

Hi Giacomo,

I didn’t have time to fully go through all code, but there’s a very simple problem that I could spot immediately:
A PDF cannot be negative.

It seems that the allowed parameter ranges for the GenPdf allow it to go negative, and in that moment the fitter cannot recover. No matter what it does, all answers it gets will be zero. I see the following options:

Restrict the ranges of the parameters of GenPdf such that it stays positive.
Constrain the parameters to a reasonable region without directly restricting the ranges.
Re-parametrise the function in GenPdf.
Use a RooRealSumPdf instead of a RooAddPdf. The difference between the two is that for the former, single terms in the sum are allowed to be negative, but the sum of all is not, whereas for the latter, none of the terms can be negative.

This needs to be solved first. After that, we can check if there are other problems.