Histogram pdf and unbinned fits

mgv4ce · January 26, 2017, 5:20pm

Hi experts,
While calling Minuit, I would like to parametrize my data by an array of histograms stored in a file. This means the TF1 is not needed, and I want to make sure that it is not necessary for the minuit calls under the hood. The examples suggest that minuit does not need the TF1 to work, but I see no examples without a TF1…

I also want to try an unbinned parameter extraction. Can minuit handle this? I would simply define a global data TTree instead of TH2, and pass it to the fcn.

Specifically I have three questions:

Should the code below work, without defining a TF1 and telling minuit about it?
Should the code below work, defining a TTree as the global data set, and passing it around?
What are the numbers I should multiply by to get the errors right? In the code above I multiply nll by -2. But I see some conflicting accounts, especially for the unbinned case.

Many thanks for your help!

TFile *expFile;
TFile *simFile;
TH2D *exp_hist;
TTree *exp_tree;

double binnedNLL(TH2F* h_meas, TH2F *h_sim){
  double nll=0;
  TAxis *xaxis = h_meas->GetXaxis();
  TAxis *yaxis = h_meas->GetYaxis();
  Double_t X,Y;
  for (int i=1; i<=h_meas->GetNbinsX(); i++)for (int j=1; j<=h_meas->GetNbinsY(); j++){
      X=xaxis->GetBinCenter(i);
      Y=yaxis->GetBinCenter(j);
      int n=(int)(h_meas->GetBinContent(i,j));
      double mu;
      mu=(double)h_sim->Interpolate(X,Y);
      if (mu<1e-10) mu=1e-10;    // avoid log(0) problems!
      nll += n*TMath::Log(mu) + mu + TMath::LnGamma(n+1);
    }
  return -2*nll;   // factor of -2 so minuit gets the errors right
}

double unbinnedNLL(TTree *t_meas, TH2D *h_sim){
  double nll=0;
  Double_t X,Y;
  t_meas->SetBranchAddress("X",&X);
  t_meas->SetBranchAddress("Y",&Y);

  Long64_t nentries = t_meas->GetEntries();
  for (Long64_t jentry=0; jentry<nentries; jentry++){
      double mu;
      mu=(double)h_sim->Interpolate(X,Y);
      if (mu<1e-10) mu=1e-10;    // avoid log(0) problems!
      nll += TMath::Log(mu);
    }
  //nll+=h_sim->Integral(); //the extended likelihood needs the yield
  return -2*nll;   // factor of -2 so minuit gets the errors right
}

void fcn(int& npar, double* deriv, double& f, double par[], int flag){
  TString hsim_name=Form(h2_%d_%d,par[0],par[1]);
  TH2D *hsim=(TH2F*)simFile->Get(hsim_name.Data());
  f = binnedNLL(exp_hist,hsim);
  //f = unbinnedNLL(exp_tree,hsim);
} // end of fcn

int main(int argc, char **argv) {
  TApplication theApp("App", &argc, argv);
  TString expFilename="expfilename.root";
  TString simFilename="simfilename.root";
  expFile = new TFile(expFilename);
  simFile = new TFile(simFilename);
  cout << "Got files\n";

  TFolder *histos = (TFolder*)expFile->FindObjectAny("histos2D");
  TH2D *h_exp=(TH2D*)histos->FindObjectAny("observables_hist");
  TTree *t_exp=(TTree*)expFile->Get("observables_tree");

  TMinuit minuit(npar);
  minuit.SetFCN(fcn);        
  exp_hist=h_exp;
  exp_tree=t_exp;                       
etc.....

moneta · January 27, 2017, 2:08am

HI,

I am not sure I have fully understood what you mean with unbinned parameter extraction. However , I’ll try to answer your questions:

You don;t need at all to use a TF1. To use Minuit you can use the old interface (TMinuit) or the new one based on the ROOT::Math::Minimizer, where the function to be minimised needs to be passed as a particular function object (functor), but does not need to be a TF1
See as example root.cern.ch/numerical-minimiza … idim_minim

Concerning your code, I don’t see really a numerical dependency or your function to be minimised (FCN) on the parameter. Given a parameter value you are getting an histogram from the file.
To use Minuit you need to have a continuous function on the parameter. This will be a problem.

If you define the TTree as a global variable you don;t need actually to pass it around.
The correct error is obtained multiplying by 2 the log-likelihood function, and you don;t need to to that when minimising a least square function (chi2) because it can be seen as 2 * nll.
However if you are using another function, which statistically is not a all, there is no real meaning of statistical parameter uncertainties.

Best Regards

Lorenzo

mgv4ce · January 30, 2017, 4:17pm

Hi,
By unbinned parameter extraction, I just mean an unbinned fit, getting the parameters. The “function” I am fitting to the data is just an array of histograms, one for each allowed value of parameter.

Here is the problem: my PDF is formed by filling a histogram with weights on phase space. So, for a given parameter, I would do a loop over all events as in the emphasized lines below.

There will eventually be either 4 or 5 parameters, which means looping over 10^8 events many thousands of times becomes infeasible. I want to try to fill my histograms before going into minuit, and just load them from file. Their names will be indexed by the parameter values they were filled using.

Here is the (not optimal) version:

  TTree *sim_tree;
  TH2D *exp_hist;
  TF2 *fparam; 
 
double weight(double* xPtr, double par[]){
  Double_t X    = xPtr[0];
  Double_t Y    = xPtr[1];
  double parm0 = par[0];
  double parm1 =  par[1];
  double f = some function of X,Y,parms;
  return f;
}

double binnedNLL(TH2D* h_meas, TF2 *f){  
  
   TH2D *h_sim=new TH2D("h_sim","h_sim",100,0,1,100,0,1);
   h_sim->Sumw2();
   
  //-----------LOOPING OVER ALL ENTRIES IN CHAIN---------//
  //-----I WANT TO AVOID THIS BECAUSE IT IS VERY LONG--//
  double x,y,var1,var2;
  Long64_t nentries = sim_tree->GetEntries();
  Long64_t nbytes = 0, nb = 0;
  for (Long64_t jentry=0; jentry<nentries; jentry++) {
      nb = sim_tree->GetEntry(jentry);   nbytes += nb;
      
     //values thrown in sim phase space
    x=(double)sim_tree->GetLeaf("X")->GetValue();
    y=(double)sim_tree->GetLeaf("Y")->GetValue();
    //values reconstructed after response
    var1=(double)sim_tree->GetLeaf("var1")->GetValue();
    var2=(double)sim_tree->GetLeaf("var2")->GetValue();
    eventWeight=f->Eval(x,y);
    h_sim->Fill(var1,var2,eventWeight);
    }
    
  double nll=0;
  TAxis *xaxis = h_meas->GetXaxis();
  TAxis *yaxis = h_meas->GetYaxis();
  Double_t X,Y;
  for (int i=1; i<=h_meas->GetNbinsX(); i++)for (int j=1; j<=h_meas->GetNbinsY(); j++){
      X=xaxis->GetBinCenter(i);
      Y=yaxis->GetBinCenter(j);
      int n=(int)(h_meas->GetBinContent(i,j));
      double mu;
      mu=(double)h_sim->Interpolate(X,Y);
      if (mu<1e-10) mu=1e-10;    // avoid log(0) problems!
      nll += n*TMath::Log(mu) + mu + TMath::LnGamma(n+1);
    }
  return -2*nll;   // factor of -2 so minuit gets the errors right
}

void fcn(int& npar, double* deriv, double& f, double par[], int flag){
  for (int i=0; i<npar; i++){
    fparam->SetParameter(i,par[i]);
  }
  f = calcNLL(h_exp,fparam);
} // end of fcn

int main(int argc, char **argv) {
  TApplication theApp("App", &argc, argv);
  TString expFilename="expfilename.root";
  TString simFilename="simfilename.root";
  expFile = new TFile(expFilename);
  simFile = new TFile(simFilename);
  cout << "Got files\n";

  TFolder *histos = (TFolder*)expFile->FindObjectAny("histos2D");
  TH2D *h_exp=(TH2D*)histos->FindObjectAny("observables_hist");
  TTree *t_sim=(TTree*)simFile->Get("observables_tree");

  simChain->SetBranchAddress("x",&x);
  simChain->SetBranchAddress("y",&y);
  simChain->SetBranchAddress("var1",&var1);
  simChain->SetBranchAddress("var2",&var2);

  TF2* myfunc = new TF2("myfunc", weight, 0, 1, 0, 1, npar);
  fparam=myfunc;
  
  TMinuit minuit(npar);
  minuit.SetFCN(fcn);        
  exp_hist=h_exp;
  sim_tree=t_sim
  
  etc.

Thank you for your other answers as well!

mgv4ce · February 1, 2017, 7:19pm

I see now that I need a continuous function, not an array. I found this in a forum
"Minuit will automatically find the optimal value for the parameter step according to the calculated gradient and Hessian"

I had thought that I could dictate the step size to Minuit. Obviously if the step size is varied, then my method will not work, because the algorithm will try to access an histogram that I have not made.

Not seeing a solution to my problem immediately. Perhaps I will just do a simple scan…

Many Thanks!