RooDataHist and variable bins

Hello experts,

I’m having trouble with importing TH1D histograms with variable binning into a RooDataHist and then performing a fit and plotting the result. This issue has been brought up in the past (here, last comment in 2019) and the result seemed inconclusive. There were some suggestions of making the RooDataHist with the Import(TH1 *, kFALSE) option and the plotOn() using DataError(RooAbsData::SumW2) option. Here is my root code demonstrating the setup.

    //Prepare file to fit to
    TFile * f1 = new TFile("SigRho_10-90FullExtractedYieldMap.root");

    TH2F * sigrhoYieldHist = (TH2F * ) f1 -> Get("yieldHist");
    TH1D * sigrho1DYieldHist = sigrhoYieldHist -> ProjectionY("sigrho1DYieldHist", 1, 10);
    double TotalSize = sigrho1DYieldHist -> GetEntries();

    RooRealVar mpipi0("InvM_pipi0", "M_{#pi#pi^{0}}", 0.25, 3.5);

    RooDataHist * dataHist = new RooDataHist("dataHist", "dataHist", mpipi0, Import(* sigrho1DYieldHist, kFALSE));

    //Prepare PDF to fit
    TFile * f2 = new TFile("Sig_Mpipi0WeightedPDF.root");
    TFile * f3 = new TFile("Rho_Mpipi0WeightedPDF.root");

    RooWorkspace * sigWorkspace = (RooWorkspace *) f2 -> Get("myWorkspace");
    RooWorkspace * rhoWorkspace = (RooWorkspace *) f3 -> Get("myWorkspace");
    RooAbsPdf * sigMpipi0Pdf = sigWorkspace -> pdf("sigPDF");
    RooAbsPdf * rhoMpipi0Pdf = rhoWorkspace -> pdf("rhoPDF");

    double sigMean = sigFrac*TotalSize*0.01;
    double rhoMean = rhoFrac*TotalSize*0.01;

    RooRealVar nSig("nSig", "Number of signal events", sigMean, 0.9*sigMean, 1.1*sigMean);
    RooRealVar nRho("nRho", "Number of D rho events", rhoMean, 0.9*rhoMean, 1.1*rhoMean);

    RooExtendPdf * sigExtended = new RooExtendPdf("sigExtended", "sigExtended", * sigMpipi0Pdf, nSig);
    RooExtendPdf * rhoExtended = new RooExtendPdf("rhoExtended", "rhoExtended", * rhoMpipi0Pdf, nRho);
    RooAddPdf * totalPdf = new RooAddPdf("totalPdf", "totalPdf", RooArgSet(* sigExtended, * rhoExtended), RooArgSet(nSig, nRho));

    //Do the fit
    RooFitResult * fitRes = totalPdf -> fitTo( * dataHist, Extended(kTRUE), SumW2Error(kTRUE), NumCPU(16), PrintLevel(1), Save());

    //Plot the full result
    RooPlot * range = mpipi0.frame(Name("range"), Title("Mixed Sample"));
    dataHist -> plotOn(range, MarkerColor(kBlack), Name("data"), DataError(RooAbsData::SumW2));
    totalPdf -> plotOn(range, Components("sigPDF"), LineColor(kRed), Name("D #pi #pi^{0}"));
    totalPdf -> plotOn(range, Components("rhoPDF"), LineColor(kGreen), Name("D #rho"));    
    totalPdf -> plotOn(range, LineColor(kBlue), Name("model"));
    range -> Draw();

The fitter doesn’t complain and does converge on a value for nSig and nRho, but the resulting plot is clearly not correct and exhibits some weird plotting behaviour around 0.75:


Have I done something wrong or is there some known workaround for this?

Thanks,
Kim

ROOT Version: 6.24
Platform: Red Hat Enterprise Linux 7.9
Compiler: GCC 10.2.0

Hi @khsmith ,
sorry for the high latency!
We need @moneta 's or @jonas 's help here, let’s ping them.

Cheers,
Enrico

Hi @khsmith, and thanks for bringing this up!

I know the documentation is confusing, but fortunately things are not so complicated in the end. You just have to keep in mind three things:

  1. When plotting a TH1, the bin content is not divided by bin volume
  2. When plotting a RooDataHist, the bin content is divided by bin volume to plot the corresponding probability density
  3. When importing a TH1 to RooDataHist with RooFit::Import, you can specify that the TH1 content should already be interpreted as a probability density.

So you do everything right :smiley: Your histogram represents yields, not probability densities, so indeed using Import(...,false) is correct :+1:

There problem is that there is a bug in RooFit that the bin width correction is not done correctly when plotting with SumW2 errors! I have opened an issue to fix this, I hope to get this into the next release and maybe also a reduced version of this fix as a backport to existing ROOT releases:

Can you live with Poisson errors for now, or do we need to find a workaround until the fix is merged?

Cheers,
Jonas

PS: here is example where I do it correctly, with a little cross check: after fitting, I generate a toy dataset from the fit model and plot it together with the original data (yellow vs. black dots). If things are correct, the datasets will be statistically compatible. If you use DataError(RooAbsData::SumW2), this will break until the fix is merged.

std::vector<double> makeBinEdges() {
  std::vector<double> binEdges;
  for (std::size_t i = 0; i < 10; ++i) {
    binEdges.push_back(1 - ((1 - 0.25) / 10) * (10 - i));
  }
  for (std::size_t i = 0; i < 11; ++i) {
    binEdges.push_back(3.5 - ((3.5 - 1.0) / 10) * (10 - i));
  }
  return binEdges;
}

TH1D* createHisto(std::vector<double> const& binEdges) {
  auto histo = new TH1D{"histo",
                        "histo",
                        static_cast<int>(binEdges.size() - 1),
                        binEdges.data()};
  for (std::size_t i = 0; i < 1000; ++i) {
    histo->Fill(gRandom->Gaus(2.0, 1.0));
  }
  for (std::size_t i = 0; i < 1000; ++i) {
    histo->Fill(gRandom->Gaus(0.75, 0.15));
  }
  return histo;
}

using namespace RooFit;

// Create the histogram with non-uniform bins
std::vector<double> binEdges = makeBinEdges();
TH1D* histo = createHisto(binEdges);

// Create the model
RooRealVar x{"x", "x", 1.0, binEdges[0], binEdges.back()};
RooRealVar sigMean{
    "sigMean", "sigMean", 0.75, binEdges[0], binEdges.back()};
RooRealVar sigSigma{"sigSigma", "sigSigma", 0.15, 0.01, 10.0};
RooRealVar bkgMean{
    "bkgMean", "bkgMean", 0.75, binEdges[0], binEdges.back()};
RooRealVar bkgSigma{"bkgSigma", "bkgSigma", 0.15, 0.01, 10.0};

RooRealVar frac{"frac", "frac", 0.5, 0.0, 1.0};

RooGaussian sigGaus{"sigGaus", "sigGaus", x, sigMean, sigSigma};
RooGaussian bkgGaus{"bkgGaus", "bkgGaus", x, bkgMean, bkgSigma};

RooAddPdf model{"model", "model", {sigGaus, bkgGaus}, {frac}};

// Import the histogram to a RooDataHist, where we don't interpred the
// histogram content as densities.
auto* dataHist =
    new RooDataHist("dataHist", "dataHist", x, Import(*histo, false));

//Do the fit
RooFitResult* fitRes = model.fitTo(*dataHist, Save(), PrintLevel(-1));

// Generate a toy dataset from the fitted model to compare, veifying that
// the fit to the non-uniform binned dataset was corret.
auto* dataHistToy = model.generate(x, dataHist->sumEntries());

//Plot the full result
TCanvas c{"c", "c"};
RooPlot* frame = x.frame();
dataHist->plotOn(frame, MarkerColor(kBlack));
dataHistToy->plotOn(frame, MarkerColor(kYellow));
model.plotOn(frame, Components("bkgGaus"), LineColor(kRed));
model.plotOn(frame, Components("sigGaus"), LineColor(kGreen));
model.plotOn(frame, LineColor(kBlue));
frame->Draw();
c.SaveAs("plot.png")

Thanks for explaining this Jonas. Is there a way of specifying my own custom errors?

Cheers,
Kim

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.