Memory leak with RDataFrame in Python

rooter_03 · January 12, 2021, 10:11am

Hi,

Ok so I can either use multiproc or something else related with “jitting” that I do not really understand or want to spare the time to understand, given that I am already pretty busy with my job. Regarding the reproducer, I could provide it to you, but It would take me too much time to extract the exact lines needed. However the relevant lines look pretty close to what you can find in:

i.e.:

import ROOT
import os
import time

@profile
def loopDataFrame(treeName, file, cuts):
	print "Processing %s"%file
	print "Processing %s"%cuts
	df = ROOT.ROOT.RDataFrame(treeName, file)
	for cutName, cutDef in cuts.iteritems():
		df = df.Filter(cutDef)
	model = ROOT.RDF.TH1DModel("lep_0_p4.Pt()", ";p_{T} (lep_{0}) GeV;", 100, 0., 100.)
	myHisto = df.Define("myP4", "lep_0_p4.Pt()").Histo1D(model, "myP4")
	return myHisto

@profile
def main():
	file = 'merged.root'
	treeName = 'NOMINAL'

	ROOT.ROOT.EnableImplicitMT()

	cuts = {}
	for i in range(0,100):
		cuts['lePt%s'%i] = 'lep_0_p4.Pt()>%s'%i
		hist = loopDataFrame( treeName, file, cuts )
		hist.Draw()
		time.sleep(1)

	raw_input("Press Enter to continue...")

if __name__ == '__main__': main()
#EOF

Could you please add your second solution, and ideally also the first one to this script? I think that would be pretty useful to anyone seeing this in the future.

Cheers.