RooDataHist and variable bins

khsmith · February 23, 2022, 5:21am

Hello experts,

I’m having trouble with importing TH1D histograms with variable binning into a RooDataHist and then performing a fit and plotting the result. This issue has been brought up in the past (here, last comment in 2019) and the result seemed inconclusive. There were some suggestions of making the RooDataHist with the Import(TH1 *, kFALSE) option and the plotOn() using DataError(RooAbsData::SumW2) option. Here is my root code demonstrating the setup.

    //Prepare file to fit to
    TFile * f1 = new TFile("SigRho_10-90FullExtractedYieldMap.root");

    TH2F * sigrhoYieldHist = (TH2F * ) f1 -> Get("yieldHist");
    TH1D * sigrho1DYieldHist = sigrhoYieldHist -> ProjectionY("sigrho1DYieldHist", 1, 10);
    double TotalSize = sigrho1DYieldHist -> GetEntries();

    RooRealVar mpipi0("InvM_pipi0", "M_{#pi#pi^{0}}", 0.25, 3.5);

    RooDataHist * dataHist = new RooDataHist("dataHist", "dataHist", mpipi0, Import(* sigrho1DYieldHist, kFALSE));

    //Prepare PDF to fit
    TFile * f2 = new TFile("Sig_Mpipi0WeightedPDF.root");
    TFile * f3 = new TFile("Rho_Mpipi0WeightedPDF.root");

    RooWorkspace * sigWorkspace = (RooWorkspace *) f2 -> Get("myWorkspace");
    RooWorkspace * rhoWorkspace = (RooWorkspace *) f3 -> Get("myWorkspace");
    RooAbsPdf * sigMpipi0Pdf = sigWorkspace -> pdf("sigPDF");
    RooAbsPdf * rhoMpipi0Pdf = rhoWorkspace -> pdf("rhoPDF");

    double sigMean = sigFrac*TotalSize*0.01;
    double rhoMean = rhoFrac*TotalSize*0.01;

    RooRealVar nSig("nSig", "Number of signal events", sigMean, 0.9*sigMean, 1.1*sigMean);
    RooRealVar nRho("nRho", "Number of D rho events", rhoMean, 0.9*rhoMean, 1.1*rhoMean);

    RooExtendPdf * sigExtended = new RooExtendPdf("sigExtended", "sigExtended", * sigMpipi0Pdf, nSig);
    RooExtendPdf * rhoExtended = new RooExtendPdf("rhoExtended", "rhoExtended", * rhoMpipi0Pdf, nRho);
    RooAddPdf * totalPdf = new RooAddPdf("totalPdf", "totalPdf", RooArgSet(* sigExtended, * rhoExtended), RooArgSet(nSig, nRho));

    //Do the fit
    RooFitResult * fitRes = totalPdf -> fitTo( * dataHist, Extended(kTRUE), SumW2Error(kTRUE), NumCPU(16), PrintLevel(1), Save());

    //Plot the full result
    RooPlot * range = mpipi0.frame(Name("range"), Title("Mixed Sample"));
    dataHist -> plotOn(range, MarkerColor(kBlack), Name("data"), DataError(RooAbsData::SumW2));
    totalPdf -> plotOn(range, Components("sigPDF"), LineColor(kRed), Name("D #pi #pi^{0}"));
    totalPdf -> plotOn(range, Components("rhoPDF"), LineColor(kGreen), Name("D #rho"));    
    totalPdf -> plotOn(range, LineColor(kBlue), Name("model"));
    range -> Draw();

The fitter doesn’t complain and does converge on a value for nSig and nRho, but the resulting plot is clearly not correct and exhibits some weird plotting behaviour around 0.75:

Have I done something wrong or is there some known workaround for this?

Thanks,
Kim

ROOT Version: 6.24
Platform: Red Hat Enterprise Linux 7.9
Compiler: GCC 10.2.0

eguiraud · March 3, 2022, 10:49am

Hi @khsmith ,
sorry for the high latency!
We need @moneta 's or @jonas 's help here, let’s ping them.

Cheers,
Enrico

jonas · March 8, 2022, 1:32am

Hi @khsmith, and thanks for bringing this up!

I know the documentation is confusing, but fortunately things are not so complicated in the end. You just have to keep in mind three things:

When plotting a TH1, the bin content is not divided by bin volume
When plotting a RooDataHist, the bin content is divided by bin volume to plot the corresponding probability density
When importing a TH1 to RooDataHist with RooFit::Import, you can specify that the TH1 content should already be interpreted as a probability density.

So you do everything right Your histogram represents yields, not probability densities, so indeed using Import(...,false) is correct

There problem is that there is a bug in RooFit that the bin width correction is not done correctly when plotting with SumW2 errors! I have opened an issue to fix this, I hope to get this into the next release and maybe also a reduced version of this fix as a backport to existing ROOT releases:

github.com/root-project/root

[RF] Consistently do bin width correction when creating a RooHist from a TH1

root-project:master ← guitargeek:roodatahist_issue_1

opened 01:21AM - 08 Mar 22 UTC

guitargeek

+88 -101

The RooPlot constructor from TH1 had an option to disable the bin width correct…ion, but this argument was not even used when the RooHist was created with Poisson errors or asymmetrical errors, (Poisson errors are the default case). The Bin width correction was always done in this case, no matter what the `correctForBinWith` argument was. However, when SumW2 errors are used, the `correctForBinWith` parameter was actually used! This caused inconsitent behavior, because when plotting a `RooAbsData` the `correctForBinWith` parameter is `false` by default, meaning the bin width correction is done for Poisson errors but not for SumW2 errors! This commit fixes this behavior by following the precedent set by the more common Poisson error: the `correctForBinWith` parameter is consistently ignored, and an exception is thrown when it's false. The correct way to disable the bin width correction is to enable the interpretation as a density when importing a TH1 to a RooDataHist. This commit fixes a bug reported on the forum.

Can you live with Poisson errors for now, or do we need to find a workaround until the fix is merged?

Cheers,
Jonas

PS: here is example where I do it correctly, with a little cross check: after fitting, I generate a toy dataset from the fit model and plot it together with the original data (yellow vs. black dots). If things are correct, the datasets will be statistically compatible. If you use DataError(RooAbsData::SumW2), this will break until the fix is merged.

std::vector<double> makeBinEdges() {
  std::vector<double> binEdges;
  for (std::size_t i = 0; i < 10; ++i) {
    binEdges.push_back(1 - ((1 - 0.25) / 10) * (10 - i));
  }
  for (std::size_t i = 0; i < 11; ++i) {
    binEdges.push_back(3.5 - ((3.5 - 1.0) / 10) * (10 - i));
  }
  return binEdges;
}

TH1D* createHisto(std::vector<double> const& binEdges) {
  auto histo = new TH1D{"histo",
                        "histo",
                        static_cast<int>(binEdges.size() - 1),
                        binEdges.data()};
  for (std::size_t i = 0; i < 1000; ++i) {
    histo->Fill(gRandom->Gaus(2.0, 1.0));
  }
  for (std::size_t i = 0; i < 1000; ++i) {
    histo->Fill(gRandom->Gaus(0.75, 0.15));
  }
  return histo;
}

using namespace RooFit;

// Create the histogram with non-uniform bins
std::vector<double> binEdges = makeBinEdges();
TH1D* histo = createHisto(binEdges);

// Create the model
RooRealVar x{"x", "x", 1.0, binEdges[0], binEdges.back()};
RooRealVar sigMean{
    "sigMean", "sigMean", 0.75, binEdges[0], binEdges.back()};
RooRealVar sigSigma{"sigSigma", "sigSigma", 0.15, 0.01, 10.0};
RooRealVar bkgMean{
    "bkgMean", "bkgMean", 0.75, binEdges[0], binEdges.back()};
RooRealVar bkgSigma{"bkgSigma", "bkgSigma", 0.15, 0.01, 10.0};

RooRealVar frac{"frac", "frac", 0.5, 0.0, 1.0};

RooGaussian sigGaus{"sigGaus", "sigGaus", x, sigMean, sigSigma};
RooGaussian bkgGaus{"bkgGaus", "bkgGaus", x, bkgMean, bkgSigma};

RooAddPdf model{"model", "model", {sigGaus, bkgGaus}, {frac}};

// Import the histogram to a RooDataHist, where we don't interpred the
// histogram content as densities.
auto* dataHist =
    new RooDataHist("dataHist", "dataHist", x, Import(*histo, false));

//Do the fit
RooFitResult* fitRes = model.fitTo(*dataHist, Save(), PrintLevel(-1));

// Generate a toy dataset from the fitted model to compare, veifying that
// the fit to the non-uniform binned dataset was corret.
auto* dataHistToy = model.generate(x, dataHist->sumEntries());

//Plot the full result
TCanvas c{"c", "c"};
RooPlot* frame = x.frame();
dataHist->plotOn(frame, MarkerColor(kBlack));
dataHistToy->plotOn(frame, MarkerColor(kYellow));
model.plotOn(frame, Components("bkgGaus"), LineColor(kRed));
model.plotOn(frame, Components("sigGaus"), LineColor(kGreen));
model.plotOn(frame, LineColor(kBlue));
frame->Draw();
c.SaveAs("plot.png")

khsmith · March 10, 2022, 2:51am

Thanks for explaining this Jonas. Is there a way of specifying my own custom errors?

Cheers,
Kim

system · March 24, 2022, 2:51am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.