Bin content of pdf and generated data

Hello everyone,
I’m trying to understand how the pull in my plot is calculated, and to do so I wanted to double-check the content of each bin in my pseudo-data and pdf. However, when I try to do that as shown below, I don’t get the correct numbers for the pdf bin contents. I believe this has to do with the normalization of the pdf, but how exactly? Does that also affect how the pull is calculated?

pdf_model = R.RooAddPdf('pdf_model','pdf_model', R.RooArgList(*pdf_list), R.RooArgList(*N_list))
pseudo_data = pdf_model.generate(vars, RF.Extended())
histo = pseudo_data.createHistogram(var.GetName())  # Create histogram from pseudo_data
histo2 = pdf_model.createHistogram(var.GetName())  # Create histogram from the pdf
for bin_idx in range(1, histo.GetNbinsX() + 1):  # Loop over bins
    bin_center = histo.GetXaxis().GetBinCenter(bin_idx)
    bin_content = histo.GetBinContent(bin_idx)  # pseudo-data entries
    bin_content2 = histo2.GetBinContent(bin_idx)  # pdf value in the same bin
    print(f"Bin {bin_idx}: Center = {bin_center}, Entries = {bin_content}, {bin_content2}")

Result:

Bin 1: Center = 0.125, Entries = 0.0, 0.0
Bin 2: Center = 0.375, Entries = 0.0, 0.0
Bin 3: Center = 0.625, Entries = 323.0, 2914.97314453125
Bin 4: Center = 0.875, Entries = 7354.0, 76889.78125
Bin 5: Center = 1.125, Entries = 60390.0, 534624.625
Bin 6: Center = 1.375, Entries = 311340.0, 2018931.625
Bin 7: Center = 1.625, Entries = 1451967.0, 5426551.0
Bin 8: Center = 1.875, Entries = 1289999.0, 4718508.0
Bin 9: Center = 2.125, Entries = 201447.0, 713583.4375
Bin 10: Center = 2.375, Entries = 135487.0, 479749.15625
Bin 11: Center = 2.625, Entries = 118369.0, 416598.34375
Bin 12: Center = 2.875, Entries = 105108.0, 374604.78125
Bin 13: Center = 3.125, Entries = 86652.0, 335661.75
Bin 14: Center = 3.375, Entries = 71263.0, 269663.125
Bin 15: Center = 3.625, Entries = 58995.0, 227417.1875
Bin 16: Center = 3.875, Entries = 54277.0, 205207.03125
Bin 17: Center = 4.125, Entries = 65116.0, 244745.5
Bin 18: Center = 4.375, Entries = 98994.0, 348828.34375
Bin 19: Center = 4.625, Entries = 26639.0, 163213.703125
Bin 20: Center = 4.875, Entries = 3149.0, 11285.59375
Bin 21: Center = 5.125, Entries = 1340.0, 3843.0673828125
Bin 22: Center = 5.375, Entries = 318.0, 1708.4215087890625
Bin 23: Center = 5.625, Entries = 120.0, 1400.99560546875
Bin 24: Center = 5.875, Entries = 58.0, 751.53515625
Bin 25: Center = 6.125, Entries = 19.0, 501.0234375
Bin 26: Center = 6.375, Entries = 21.0, 375.767578125
Bin 27: Center = 6.625, Entries = 14.0, 125.255859375
Bin 28: Center = 6.875, Entries = 17.0, 125.255859375

Hi @mdgalati,

I guess the pulls look correct, but the yields in the histogram from the pdf are missing a multiplication by the bin widths. According to the documentation of createHistogram(), it should be possible to apply this correction with the Scale(true) keyword argument, but for whatever reason this scaling is hardcoded to false for extended pdfs inside RooFit.

I suggest fixing this for the next patch release, but I’m afraid that with the current RooFit it’s not possible to get the yields for such an extended pdf into a TH1 with just a call to createHistogram(). You’ll have to multiply by the bin volumes manually.
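
As a rough illustration of that manual step, here is a minimal sketch in plain Python, without ROOT. The helper name densities_to_yields and the numbers are made up for the example and are not part of RooFit. It assumes a one-dimensional histogram, where the bin volume is just the bin width.

```python
def densities_to_yields(densities, bin_edges):
    """Turn per-unit-x densities into expected yields per bin.

    Hypothetical helper for illustration only. Assumes a 1D histogram,
    so the bin volume is simply the bin width.
    """
    if len(bin_edges) != len(densities) + 1:
        raise ValueError("need exactly one more bin edge than densities")
    widths = [hi - lo for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return [d * w for d, w in zip(densities, widths)]


# Made-up inputs: four bins of width 0.25 and arbitrary density values.
# With a TH1 `h`, the densities would come from h.GetBinContent(i) and
# the widths from h.GetXaxis().GetBinWidth(i).
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
densities = [0.0, 400.0, 1200.0, 800.0]

yields = densities_to_yields(densities, edges)
print(yields)       # [0.0, 100.0, 300.0, 200.0]
print(sum(yields))  # 600.0
```

For an extended pdf, the sum of the yields obtained this way should come out close to the total expected number of events, which makes for a quick sanity check.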

I hope this helps!
Jonas

Thanks for your answer!
I’ve multiplied by the bin width (0.25), but I still get numbers that don’t match the ones in the plot. What am I doing wrong?

 print(f"Bin {bin_idx}: Center = {bin_center}, Entries = {bin_content}, {round(bin_content2*bin_width_var)}")

I get the same result if I scale histo2 instead:

Bin 1: Center = 0.125, Entries = 0.0, 0
Bin 2: Center = 0.375, Entries = 0.0, 0
Bin 3: Center = 0.625, Entries = 323.0, 727
Bin 4: Center = 0.875, Entries = 7354.0, 19189
Bin 5: Center = 1.125, Entries = 60390.0, 133738
Bin 6: Center = 1.375, Entries = 311340.0, 504458
Bin 7: Center = 1.625, Entries = 1451967.0, 1356506
Bin 8: Center = 1.875, Entries = 1289999.0, 1179177
Bin 9: Center = 2.125, Entries = 201447.0, 178208
Bin 10: Center = 2.375, Entries = 135487.0, 119806
Bin 11: Center = 2.625, Entries = 118369.0, 104470
Bin 12: Center = 2.875, Entries = 105108.0, 94334
Bin 13: Center = 3.125, Entries = 86652.0, 84229
Bin 14: Center = 3.375, Entries = 71263.0, 68559
Bin 15: Center = 3.625, Entries = 58995.0, 56633
Bin 16: Center = 3.875, Entries = 54277.0, 51257
Bin 17: Center = 4.125, Entries = 65116.0, 60453
Bin 18: Center = 4.375, Entries = 98994.0, 87320
Bin 19: Center = 4.625, Entries = 26639.0, 40173
Bin 20: Center = 4.875, Entries = 3149.0, 2932
Bin 21: Center = 5.125, Entries = 1340.0, 1039
Bin 22: Center = 5.375, Entries = 318.0, 424
Bin 23: Center = 5.625, Entries = 120.0, 350
Bin 24: Center = 5.875, Entries = 58.0, 188
Bin 25: Center = 6.125, Entries = 19.0, 125
Bin 26: Center = 6.375, Entries = 21.0, 94
Bin 27: Center = 6.625, Entries = 14.0, 31
Bin 28: Center = 6.875, Entries = 17.0, 31
