Plot normalization

johnpaul · October 5, 2024, 5:51pm

Dear RooFit experts,

I have a variable bin histogram that I have fit to a PDF. When I plot it, the bin contents are divided by the bin volume, and then scaled to the average bin size in the range. For instance, if I had the bin boundaries [0,1,3,7,11], the y-axis would read something like “events / 2.75”.

This is all expected, of course. My question is whether there is a simple way to change how the bin contents are normalized (to something other than the average bin width) without changing the bin boundaries or the contents of the RooDataHist? I know how to scale PDFs, but I’m wary of manipulating the RooDataHist, and I feel like there should be something simpler than editing the contents by hand.

Thanks,

John Paul

Danilo · October 6, 2024, 7:59am

Hi John Paul,

Thanks for this question.
I am adding @jonas to the loop, who may be able to help. Could you share why you want to change the way normalisation happens in RooFit? What are you trying to achieve?
The fast and reliable normalisations are actually one of the pillars of RooFit.

Cheers,
D

johnpaul · October 6, 2024, 11:00am

Hi Danilo,

It’s mostly aesthetic. The choice of the bin size is dictated by the physics (in my case, the resolution of the resonance). But this means that, unless I happen to get lucky, the default division is some awkward number. I would like to make it so that the normalization is, say, events / 1 or events / 10 or something where I can look at the plot and do the math in my head a bit faster. Also, it’s prettier to see “/ 1” instead of “/ 0.163636”.

Best,

John Paul

jonas · October 6, 2024, 11:54am

Hi @johnpaul,

If you plot a RooDataHist, there is no normalization going on. It’s just data, and each bin shows the number of events.

I would argue that the average bin width in the y-axis title is quite useless information. You can read the number of events on the y axis, and the bin width on the x axis. What do you need it for?

Cheers,
Jonas

johnpaul · October 6, 2024, 12:42pm

Hi Jonas,

When I plot a RooDataHist with variable binning, there is definitely a normalization. It divides the bin content by the bin width and then multiplies by the average bin width of the histogram. I know this because I can reproduce the bin contents exactly from the original TH1, which has just the number of entries per bin.

Just to illustrate this concretely, here’s a simple script that reproduces what I mean.

import ROOT
from array import array

# make variable bin TH1
edges = [0, 1, 3, 6, 10]
hist = ROOT.TH1D("hist","hist",len(edges)-1, array('d', edges))
hist.Fill(0.5)
hist.Fill(2.5)
hist.Fill(5.0)
hist.Fill(8.0)

# make RooDataHist out of TH1
x = ROOT.RooRealVar("x","x",0,10)
dh = ROOT.RooDataHist("dh","dh",ROOT.RooArgList(x),hist)

# plot it
can = ROOT.TCanvas("c","c", 500,500)
can.cd()
plot = x.frame()
dh.plotOn(plot)
plot.Draw()
can.Update()
can.Draw()
can.SaveAs("dummy.pdf")

In the plot, the histogram bin values are normalized so that the y-axis reads “events / 2.5”. All of this is fine; I just want to be able to change 2.5 to something different without recalculating the histogram contents and error bars by hand.
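
The arithmetic behind that normalization can be sketched in plain Python, without ROOT. This is only an illustration of the rule described above (bin content divided by bin width, multiplied by the average bin width); the function name `density_scaled` is hypothetical and is not part of RooFit.

```python
def density_scaled(edges, counts):
    """Return (average bin width, plotted values) for a variable-bin histogram.

    Hypothetical helper: mimics the rule described in this thread,
    plotted = count / bin_width * average_bin_width.
    """
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    avg_width = (edges[-1] - edges[0]) / len(widths)
    plotted = [n / w * avg_width for n, w in zip(counts, widths)]
    return avg_width, plotted


# Same binning as the script above, with one entry per bin.
edges = [0, 1, 3, 6, 10]
counts = [1, 1, 1, 1]
avg_width, plotted = density_scaled(edges, counts)
print("average bin width:", avg_width)
print("plotted values:", plotted)
```

With these edges the average bin width is 10 / 4 = 2.5, so the narrowest bin is scaled up and the widest bin is scaled down relative to the raw count of 1.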

Best,

John Paul

StephanH · October 7, 2024, 7:45am

Hello @johnpaul,

I looked a bit in RooFit’s code, and I believe there are only two options: scale to density or don’t. Given that RooFit is made to plot PDFs, that is, densities, the data must be plotted as a density as well. So RooFit must do this kind of scaling; otherwise, the data would never match the PDFs (or the PDFs would jump up and down at the bin edges).

There might be ways to switch off the scaling if we investigate RooFit documentation, but I don’t think there’s a way to obtain other scale factors, because none of these would work when you plot PDFs on the same plot as the data.

johnpaul · October 7, 2024, 7:36pm

Hi Stephan,

I understand the need to scale to densities. That’s fine. I just want to change the overall scale of the density. This value is arbitrary (is there any principled reason why it must be the average bin size?) and doesn’t affect the relationship between PDFs and data.

In fact, neither the PDF nor the data need to change. The only thing that needs to change is the scale of the y-axis (and the associated label); everything else is constant. So, in principle, it’s really a plotting option more than anything else. You’re just relabeling the y-axis and adjusting the location of the tick marks.
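
As a rough sketch of why the overall scale is arbitrary: moving from “events / 2.5” to “events / 1” multiplies every plotted value, every error bar, and every point of the PDF curve by the same constant, so their ratios are untouched. The helper below is plain Python with a hypothetical name (`rescale_density`); it is not a RooFit function, and the numbers are illustrative only.

```python
def rescale_density(values, current_width, reference_width):
    """Convert values quoted per `current_width` to values per `reference_width`.

    Hypothetical helper: the same constant applies to data points,
    error bars, and the PDF curve alike.
    """
    return [v * reference_width / current_width for v in values]


# Illustrative numbers: data and a made-up curve, both in "events / 2.5".
data = [2.5, 1.25, 0.625]
curve = [2.0, 1.5, 0.5]

data_per_unit = rescale_density(data, current_width=2.5, reference_width=1.0)
curve_per_unit = rescale_density(curve, current_width=2.5, reference_width=1.0)

# The data-to-curve ratio in each bin is unchanged by the rescaling.
for d, c, d1, c1 in zip(data, curve, data_per_unit, curve_per_unit):
    print(d / c, d1 / c1)
```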

Best,

John Paul

StephanH · October 8, 2024, 3:36pm

Hi John Paul,

I understand that you want to make the plot look nicer; it’s just that RooFit’s authors didn’t foresee this case. I guess they chose what makes the most sense for having PDFs (densities) and data (counts) line up on the same plot.

Maybe you could “fix” the plot by tinkering with the axis directly. You can definitely get your hands on the axis here.
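
If one goes the axis-tinkering route, the bookkeeping is just a mapping between the units the frame was drawn in and the units one wants to show. A minimal sketch in plain Python follows; `relabelled_ticks` is a hypothetical helper, and how the resulting positions and labels are applied to the ROOT axis is deliberately left out.

```python
import math


def relabelled_ticks(y_max, current_width, reference_width, step):
    """Return (position, label) pairs for a relabelled y-axis.

    `position` is in the units the plot was drawn in (events / current_width);
    `label` is the value to display in the new units (events / reference_width).
    Hypothetical helper, not part of ROOT or RooFit.
    """
    # Top of the axis expressed in the new units.
    new_max = y_max * reference_width / current_width
    n_ticks = math.floor(new_max / step + 1e-9)
    ticks = []
    for i in range(n_ticks + 1):
        label = i * step
        position = label * current_width / reference_width
        ticks.append((position, label))
    return ticks


# Axis drawn in "events / 2.5" up to 3.0, relabelled as "events / 1"
# with a tick every 0.5 in the new units.
for position, label in relabelled_ticks(3.0, 2.5, 1.0, 0.5):
    print(position, label)
```

The data points and the PDF curve stay where they are; only the tick positions and the axis title change.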
