Memory leak with RDataFrame in Python

Hi,
that script is a bit weird: as it loops it keeps adding more cuts, whereas I think the original intention was to use a different cut per iteration…?

Anyway with ROOT v6.22 this should help: instead of

	cuts = {}
	for i in range(0,100):
		cuts['lePt%s'%i] = 'lep_0_p4.Pt()>%s'%i
		hist = loopDataFrame( treeName, file, cuts )
		hist.Draw()

book all computations first, use the results second:

	cuts = {}
	histos = []
	for i in range(0,100):
		cuts['lePt%s'%i] = 'lep_0_p4.Pt()>%s'%i
		hist = loopDataFrame( treeName, file, cuts )
		histos.append(hist)
	for h in histos:
		h.GetValue() # or Draw, or whatever

For the multiprocessing solution, the idea is to run each loopDataFrame invocation in a different subprocess, e.g. using a process pool, so that when the processing of a dataframe is done the related worker process is killed and the memory allocated by the interpreter is freed with it.
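
As a rough sketch of that idea (not tested against your script): Python's multiprocessing.Pool accepts maxtasksperchild=1, which makes the pool replace each worker after a single task. The names process_one_cut and run_each_in_fresh_process are made up for illustration, the worker body is a placeholder for where loopDataFrame would actually be called, and I am assuming a Linux/macOS system where the "fork" start method is available.

```python
import multiprocessing as mp
import os


def process_one_cut(threshold):
    """Hypothetical worker: runs entirely inside a short-lived child process.

    In the real script this is where ROOT would be imported and
    loopDataFrame(treeName, file, cuts) would be called. The return value
    should be something small and picklable (a number, a list, a string),
    because it is sent back to the parent process.
    """
    cut = 'lep_0_p4.Pt()>%s' % threshold
    # Placeholder result: the threshold, the cut string, and the worker's PID.
    return threshold, cut, os.getpid()


def run_each_in_fresh_process(thresholds):
    # Assumption: the "fork" start method exists on this platform.
    ctx = mp.get_context("fork")
    # maxtasksperchild=1: a worker exits after one task and is replaced,
    # so whatever memory it allocated goes away with it.
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        # chunksize=1 so that each threshold counts as one task.
        return pool.map(process_one_cut, thresholds, chunksize=1)
```

Calling run_each_in_fresh_process(range(0, 100)) would then process the 100 cuts one after the other, each in its own worker process. Raising processes above 1 would run several of them concurrently, at the cost of more memory in use at the same time.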

These may or may not be good suggestions for your use case; I’d need to see a reproducer to check what exactly is hogging memory.

Hope this helps!
Enrico