Hello,
I am working on a ROOT project and am porting the old code to RDataFrame. I have encountered a discrepancy between the old ROOT method and the RDataFrame method when filling a histogram with weights. Here is the minimal code to reproduce the issue (without data):
import ROOT
ChannelFlag = '(ejets_2018_70 || mujets_2018_70)'
NormFactor = c.norm_factor
FullWeight = "weight_mc*weight_pileup*weight_leptonSF*weight_jvt*"+str(NormFactor)
# Old ROOT method
fT = ROOT.TChain(Tree)
for entry in InputFiles:
    fT.Add(entry)
canvas = ROOT.TCanvas()
h_tmp = ROOT.TH1D("test", "test", 25, 81000, 101000)
h_tmp.Sumw2()
fT.Project(h_tmp.GetName(), "m_ll", ChannelFlag+"*"+str(FullWeight))
h_tmp.Draw()
canvas.Draw()
# RDataFrame method
rdf = ROOT.RDataFrame(Tree, InputFiles)
rdf = rdf.Filter(ChannelFlag)
rdf = rdf.Define("weight", FullWeight)
histo = rdf.Histo1D(ROOT.RDF.TH1DModel("test", "test", 25, 81000, 101000), "m_ll", "weight")
canvas = ROOT.TCanvas()
histo.Draw()
canvas.Draw()
Both methods get the same input files and use the same filter (ChannelFlag). They differ only in the functions used. However, with the old method the histogram statistics box shows 79759 entries, while with the RDataFrame method it shows 80299. I also checked the ROOT files, and they contain 82282 events in total.
In the old method, I found that if I remove weight_pileup from FullWeight, the histogram shows 80299 entries, matching the RDataFrame method. I then checked the values of weight_pileup for NaNs or very small values with:
# Filtering events with extremely small weights
elist_small_weights = ROOT.TEventList("elist_small_weights")
fT.Draw(">>elist_small_weights", "weight_pileup < 1E-6")
small_weights = elist_small_weights.GetN()
print("Number of events with small weight_pileup: ", small_weights)
The output shows 511 events. My guess is that these small (possibly zero) values of weight_pileup cause the discrepancy between the old ROOT method and the RDataFrame method.
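To make this guess concrete, here is a minimal toy sketch in plain Python (no ROOT needed). It assumes that Project does not fill events for which the combined selection-and-weight expression evaluates to exactly zero, while Histo1D counts every event that passes the Filter, whatever its weight. I have not confirmed this for my setup. The helper functions and the weights in toy_weights are made up for illustration only.

```python
# Toy model in plain Python: compare the number of entries and the sum of
# weights when zero-weight events are skipped versus when they are filled.

def fill_skipping_zero(weights):
    """Assumed Project-like behaviour: events with weight == 0 are not filled."""
    entries = 0
    sum_of_weights = 0.0
    for w in weights:
        if w == 0:
            continue  # a zero selection value means the event is not filled
        entries += 1
        sum_of_weights += w
    return entries, sum_of_weights


def fill_all(weights):
    """Assumed Histo1D-like behaviour: every selected event counts as an entry."""
    entries = 0
    sum_of_weights = 0.0
    for w in weights:
        entries += 1
        sum_of_weights += w
    return entries, sum_of_weights


# Hypothetical per-event weights; two events have a weight of exactly zero.
toy_weights = [1.5, 0.0, 0.5, 2.0, 0.0, 1.0]

entries_project, sumw_project = fill_skipping_zero(toy_weights)
entries_rdf, sumw_rdf = fill_all(toy_weights)

print("entries (skip zero):", entries_project, " sum of weights:", sumw_project)
print("entries (fill all): ", entries_rdf, " sum of weights:", sumw_rdf)
```

In this toy case the entry counts differ while the sums of weights agree. If the same holds for the real histograms, comparing their sums of weights or integrals rather than the entry counts should show whether only zero-weight events are involved.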
I would appreciate any insights or suggestions regarding why this discrepancy arises and how to handle such small-weight events correctly.
Thanks!