Discrepancy between the TTree.Project method and RDataFrame when filling a histogram with weights

El_Hanzo · July 4, 2023, 6:12pm

Hello,

I am working on a ROOT project and tried to port the old code to RDataFrame. I have now encountered a discrepancy between the old ROOT method and the RDataFrame method when filling a histogram with weights. Here’s the minimal code to reproduce the issue (without data):

import ROOT

ChannelFlag = '(ejets_2018_70 || mujets_2018_70)'
NormFactor = c.norm_factor
FullWeight = "weight_mc*weight_pileup*weight_leptonSF*weight_jvt*"+str(NormFactor)

# Old ROOT method
fT = ROOT.TChain(Tree)
# Chain all input files together
for entry in InputFiles:
    fT.Add(entry)
canvas = ROOT.TCanvas()
h_tmp = ROOT.TH1D("test", "test", 25, 81000, 101000)
h_tmp.Sumw2()
fT.Project(h_tmp.GetName(), "m_ll", ChannelFlag+"*"+str(FullWeight))
h_tmp.Draw()
canvas.Draw()

# RDataFrame method
rdf = ROOT.RDataFrame(Tree, InputFiles)
rdf = rdf.Filter(ChannelFlag)
rdf = rdf.Define("weight", FullWeight)
histo = rdf.Histo1D(ROOT.RDF.TH1DModel("test", "test", 25, 81000, 101000), "m_ll", "weight")
canvas = ROOT.TCanvas()
histo.Draw()
canvas.Draw()

Both methods read the same input files and apply the same selection (ChannelFlag); they differ only in the functions used. However, the histogram legend shows 79759 entries for the old method and 80299 entries for the RDataFrame method. I also checked the ROOT files, and they contain 82282 events in total.

In the old method, I found that if I remove weight_pileup from FullWeight, the histogram shows 80299 entries, matching the RDataFrame method. I then checked weight_pileup for NaNs or very small values with:

# Filtering events with extremely small weights
elist_small_weights = ROOT.TEventList("elist_small_weights")
fT.Draw(">>elist_small_weights", "weight_pileup < 1E-6")
small_weights = elist_small_weights.GetN()
print("Number of events with small weight_pileup: ", small_weights)

The output shows 511 events, so my guess is that these small values cause the discrepancy between the old ROOT method and the RDataFrame method.

I would appreciate any insights or suggestions regarding why this discrepancy arises and how to handle such small-weight events correctly.

Thanks!

jalopezg · July 4, 2023, 8:17pm

Hi @El_Hanzo,

First of all, welcome to the ROOT forum! In order to reproduce (and investigate) the issue, is there any chance that we can get a copy of the input files?

Other than that, I’m inviting our RDataFrame experts, @vpadulan and @eguiraud. Perhaps they have some ideas.

Cheers,
J.

eguiraud · July 4, 2023, 8:22pm

Random guess that would have to be verified: TTree::Project might skip the histogram Fill call entirely if the weight “is 0”, while for RDF if the Filter passes there is a Fill call (sometimes with a weight that is practically 0).

@El_Hanzo you can make the weights artificially larger and see if in that case RDF and TTree::Project agree.
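
The guess above can be illustrated without ROOT. What follows is a toy model in plain Python, not ROOT code: the helper names `project_like` and `rdf_like` are hypothetical, and the event values are made up. It assumes that a selection-as-weight scheme skips the fill when the selection expression evaluates to 0, while a weighted fill after a boolean filter happens for every passing event, whatever its weight.

```python
# Toy model (assumption: not ROOT code, just the two counting rules).
# Each event is (passes_channel_flag, weight); the values are invented.
events = [(True, 1.5), (True, 0.0), (False, 0.75), (True, 0.5), (True, 0.0)]


def project_like(events):
    """Selection and weight are one expression; a value of 0 means no fill."""
    entries, sum_w = 0, 0.0
    for passes, weight in events:
        selection = float(passes) * weight
        if selection != 0:
            entries += 1
            sum_w += selection
    return entries, sum_w


def rdf_like(events):
    """Filter on the flag only, then fill with the weight (even if it is 0)."""
    entries, sum_w = 0, 0.0
    for passes, weight in events:
        if passes:
            entries += 1
            sum_w += weight
    return entries, sum_w


print(project_like(events))  # (2, 2.0)
print(rdf_like(events))      # (4, 2.0)
```

In this sketch the two approaches disagree on the number of entries but agree on the sum of weights, because a fill with weight 0 adds nothing to the bin content. If the same holds for the real data, only the entry counter would differ between the two histograms.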

Cheers,
Enrico

El_Hanzo · July 7, 2023, 6:47am

Thanks for your replies. The missing entries are indeed events with a weight of zero, and excluding them with rdf.Filter now gives the same result as the non-RDataFrame approach. Thanks!
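
The exact Filter that was added is not shown in the thread. A minimal sketch of one way to do it is given below, assuming the weight column is defined first and then required to be non-zero. The helper name `book_weighted_histo` and the cut string `weight != 0` are assumptions made for illustration; `Filter`, `Define` and `Histo1D` are the RDataFrame methods already used in the original post.

```python
def book_weighted_histo(rdf, channel_flag, full_weight, model, column):
    """Book a weighted histogram that skips events whose weight is exactly 0.

    rdf is expected to be a ROOT.RDataFrame or a node derived from one.
    The 'weight != 0' cut is an assumption about what was done in the thread.
    """
    return (
        rdf.Filter(channel_flag, "channel selection")
        .Define("weight", full_weight)
        .Filter("weight != 0", "non-zero weight")
        .Histo1D(model, column, "weight")
    )


# Possible usage with the variables from the original post (needs ROOT and data):
#   import ROOT
#   rdf = ROOT.RDataFrame(Tree, InputFiles)
#   model = ("test_rdf", "test", 25, 81000, 101000)
#   histo = book_weighted_histo(rdf, ChannelFlag, FullWeight, model, "m_ll")
#   histo.Draw()
```

Giving each Filter a name is optional; it is done here so that a cut-flow report, if requested, would list the two selections separately.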

system · July 21, 2023, 6:48am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.