Yield error in extended fit

fbetti · June 10, 2015, 3:35pm

Hello everyone,

I have a little problem.
I am doing an extended likelihood fits on an unbinned sets of weighted events. I am using a RooAddPdf in order to sum four components (one for signal and the other for various kinds of background). During the fit, the parameters of the background PDFs are fixed, while the parameters of the signal are floating. Moreover I let floating the signal yields, while I keep constant the ratio between each background yield and the signal yield (basically, for each background sample I fix the value of a RooRealVar “Ratio” and I use a RooFormulaVar = SignalYield*Ratio as background yield).
The point is that at the end of the fit I get a signal yield equal to 13451 +/- 48, which has a very small error. But I expected that the error of a yield would be at least its square root, about 116 in this case.
The problem may be not related to the weights, since I am using SumW2Error(kTRUE).
Is it my expectation wrong or am I doing something wrong in my fit? For what I remember, I have always seen yields with an error greater than the square root.

Thanks a lot,

Federico

moneta · June 10, 2015, 4:30pm

Hi,

Yes the error seems to be too small. IN general is larger than the square(root) depending on the signal/background ratio.
The problem might be caused by the weights. Could you post a running example showing the problem, or at least the RooFIt workspace with the model and the dataset so I can investigate this problem.

Best Regards

Lorenzo

fbetti · June 11, 2015, 9:15am

Hi Lorenzo,

thank you for the quick reply!
I attach here the file .root containg the workspace. The events in the dataset have three different weights (because I used three different MonteCarlo to build this “pseudo-data” sample), always less than 1.
fbetti_workspace.root (957 KB)In the workspace I saved the 2D PDF and the dataset.

I attach also the code necessary to fit the PDF on the dataset.
myfit.C (2.67 KB)

using namespace RooFit ;

void myfit()
{
  TFile *f = new TFile("fbetti_workspace.root") ;

  RooWorkspace* w = (RooWorkspace*) f->Get("w") ;

  RooRealVar* B0mES = w->var("B0mES");
  RooRealVar* B0DeltaE = w->var("B0DeltaE");
  RooRealVar* NSIG = w->var("NSIG");
  RooRealVar* mSIG_mES = w->var("m_{SIG}(m_{ES})");
  RooRealVar* sigmaSIG_mES = w->var("#sigma_{SIG}(m_{ES})");
  RooRealVar* aSIG_mES = w->var("a_{SIG}(m_{ES})");
  RooRealVar* nSIG_mES = w->var("n_{SIG}(m_{ES})");
  RooRealVar* m1SIG_DeltaE = w->var("m1_{SIG}(#DeltaE)");
  RooRealVar* m2SIG_DeltaE = w->var("m2_{SIG}(#DeltaE)");
  RooRealVar* sigma1SIG_DeltaE = w->var("#sigma1_{SIG}(#DeltaE)");
  RooRealVar* sigma2SIG_DeltaE = w->var("#sigma2_{SIG}(#DeltaE)");
  RooRealVar* fGau1SIG_DeltaE = w->var("f(Gau1)_{SIG}(#DeltaE)");
  RooAbsPdf* PsDataPDF = w->pdf("PsDataPDF") ;
  RooAbsData* dsPsData = w->data("dsPsData") ;

  // Now I fix all the parameters
  RooArgSet* param = PsDataPDF->getParameters(RooArgSet(*B0mES,*B0DeltaE));
  //param->Print();
  TIterator * iter = param->createIterator();
  RooRealVar * var;
  while(0 != (var= (RooRealVar*)iter->Next())) {
    // Fix this parameter:
    var->setConstant(kTRUE);
  }

  // I want the parameters of the signal to be floating in the ranges obtained by a previous fit on MonteCarlo
  NSIG->setConstant(kFALSE);
  mSIG_mES->setConstant(kFALSE);
  sigmaSIG_mES->setConstant(kFALSE);
  aSIG_mES->setRange(aSIG_mES->getVal()-aSIG_mES->getError(),aSIG_mES->getVal()+aSIG_mES->getError());
  aSIG_mES->setConstant(kFALSE);
  nSIG_mES->setRange(nSIG_mES->getVal()-nSIG_mES->getError(),nSIG_mES->getVal()+nSIG_mES->getError());
  nSIG_mES->setConstant(kFALSE);
  m1SIG_DeltaE->setRange(m1SIG_DeltaE->getVal()-m1SIG_DeltaE->getError(),m1SIG_DeltaE->getVal()+m1SIG_DeltaE->getError());
  m1SIG_DeltaE->setConstant(kFALSE);
  m2SIG_DeltaE->setRange(m2SIG_DeltaE->getVal()-m2SIG_DeltaE->getError(),m2SIG_DeltaE->getVal()+m2SIG_DeltaE->getError());
  m2SIG_DeltaE->setConstant(kFALSE);
  sigma1SIG_DeltaE->setRange(sigma1SIG_DeltaE->getVal()-sigma1SIG_DeltaE->getError(),sigma1SIG_DeltaE->getVal()+sigma1SIG_DeltaE->getError());
  sigma1SIG_DeltaE->setConstant(kFALSE);
  sigma2SIG_DeltaE->setRange(sigma2SIG_DeltaE->getVal()-sigma2SIG_DeltaE->getError(),sigma2SIG_DeltaE->getVal()+sigma2SIG_DeltaE->getError());
  sigma2SIG_DeltaE->setConstant(kFALSE);
  fGau1SIG_DeltaE->setRange(fGau1SIG_DeltaE->getVal()-fGau1SIG_DeltaE->getError(),fGau1SIG_DeltaE->getVal()+fGau1SIG_DeltaE->getError());
  fGau1SIG_DeltaE->setConstant(kFALSE);
  
  // ---------------
  // Fit pseudodata
  // ---------------
  RooFitResult* fitPsData = PsDataPDF->fitTo(*dsPsData,Save(),Extended(1),SumW2Error(kTRUE));
  fitPsData->Print();
  
}

As you see, I constraint all the parameters except the ones concerning the signal.
I have to do a correction to what I said before: if I use SumW2Error(kTRUE) I get 41 as error on NSIG, while I get 48 if I use SumW2Error(kFALSE). This is ok, because I have the weights always less than 1.

Please let me know if you need more details.
I hope to have attached the things in the correct way, it is the first time I post on this forum.

Thank you,
Federico

moneta · June 22, 2015, 10:44am

Hi Federico

I have tried your fit and seems to be fine. The small error is probably caused by the weights. It would be interesting to know if by scaling for example the weights by the same factor if the error stays the same.
Also what it actually counts is the number of effective entries.
Do you know approximately how many effective entries do you have in the signal region ?

Cheers

Lorenzo

fbetti · June 22, 2015, 1:40pm

Hi,

Good idea, I could try to scale the weight. Do you know whether there is a way to create a new dataset from the one I have now, scaling the weight variable? Or should I redo my ntuples?
I don’t understand why the number of entries counts. Should the option SumW2Error(kTRUE) adjust the error according to the weights?
Couldn’t the “small error” problem due to the fact that I constraint 3 yields on the total of 4? I mean, with this constraint (and the fact that the sum of yields is equal to the total number of events) the signal yields should have no freedom… But it is quite probable that I am doing a little confusion.

Do you mean the number of entries in the signal region in pseudodata sample, or the expected number of signal entries?

Thanks a lot

cheers,
Federico

moneta · June 22, 2015, 3:44pm

Hi,

I think for scaling the weight is easier to re-make a new data sets and import all values but with a scaled weight.

What I meant was the number of effective entries (sum of weight)^2 / ( sum of square of the weight).
This is what drives the statistical error. It could be nice to see what is this value in the signal region (S + B) and compared what the fit gives you.

But I see now in your code that the PDF is made of several components and you are setting some of the fraction of these components to constant. This clearly puts additional information in the fit and can reduce significantly the statistical error of the fit, because effectively you use now all events (including the background ones).

Lorenzo