Possible memory leakage

Athul_Dev · April 17, 2023, 8:29am

ROOT Version: 6.26/04

Dear root experts,
I'm trying to check whether the events in my ROOT file pass my emulated triggers. There are about a million events and the file size is ~80 MB. The script terminates automatically after 400,000 events. On further analysis of the problem, I found that the memory used by the script increases drastically with each loop iteration.
Please find attached the memory profile of my script: memory_profile.txt (31.5 KB). (The memory profile is for a very similar script, but it shows the point.)
The ROOT files I used are available here: ROOT files

Is there a problem with my ROOT file, or is it a problem with the script itself?

Thanks for the help,
Regards,
Athul

#!/usr/bin/env python3

import ROOT
import csv
import numpy as np

def pt_cond1(tree,branch,index,i,pt1,pt2):
    ''' This returns True or False for a particular hltpt tau if it can be paired up 
    with the second tau and pass the pt cut'''
    # Check that both taus are above pt2 and that at least one of them is above pt1
    condition1 = getattr(tree,branch)[index].Pt()>pt2
    condition1 &= getattr(tree,branch)[i].Pt()>pt2
    condition1 &= (getattr(tree,branch)[index].Pt()>pt1 or getattr(tree,branch)[i].Pt()>pt1)
    if i!=index and condition1:
        return True
    return False

def Online_mRNN_cond(tree,index,i, no_RNN = 440, m_RNN=280 ):
    '''This returns True or False for a particular hltpt tau if it can be paired up 
    with the second tau and pass the medium RNN cut'''
    # For tau[index] RNN Medium(Loose) if pt < m_RNN(no_RNN) | no RNN ID if pt > no_RNN GeV
    RNN1_cond = (tree.TrigTRM_TauIDm[index]) and (tree.TrigTRM_Taus[index].Pt() < m_RNN)
    RNN1_cond |= (tree.TrigTRM_TauIDl[index]) and (tree.TrigTRM_Taus[index].Pt() > m_RNN) and (tree.TrigTRM_Taus[index].Pt() < no_RNN) 
    RNN1_cond |= tree.TrigTRM_Taus[index].Pt() > no_RNN
    # For tau[i] RNN Medium(Loose) if pt < m_RNN(no_RNN) | no RNN ID if pt > no_RNN GeV
    RNN2_cond = (tree.TrigTRM_TauIDm[i]) and (tree.TrigTRM_Taus[i].Pt() < m_RNN) 
    RNN2_cond |= (tree.TrigTRM_TauIDl[i]) and (tree.TrigTRM_Taus[i].Pt() > m_RNN) and (tree.TrigTRM_Taus[i].Pt() < no_RNN)
    RNN2_cond |= tree.TrigTRM_Taus[i].Pt() > no_RNN
    if RNN1_cond and RNN2_cond:
        return  True
    else:
        return False
    
def Online_DR_cond(tree,index,i,min_DR = 0.3, max_DR = 3):
    '''This returns True or False for a particular tau if it can be paired up 
    with the second tau and pass the hltpt Delta R cut'''
    #Delta R should be greater than 0.3 and less than 3
    hltptDR_cond = tree.TrigTRM_Taus[index].DeltaR(tree.TrigTRM_Taus[i]) > min_DR 
    hltptDR_cond &= tree.TrigTRM_Taus[index].DeltaR(tree.TrigTRM_Taus[i]) < max_DR
    if hltptDR_cond:                                                            
        return True
    else:
        return False


def Online_hltpt_cond(tree,tau_i,pt1=35,pt2=25, no_RNN = 440, m_RNN =280, min_DR =0.3,max_DR =3):
    '''This returns True if the hltpt trigger condition is satisfied for tau_i'''
    if len(tree.TrigTRM_Taus)>=2:
        #Checking if tau_i has a pair that satisfies the hltpt condition by looping over all taus 
        for i in range(len(tree.TrigTRM_Taus)):
            #Checking the pt condition for tau_i and tau[i]  
            if pt_cond1(tree,"TrigTRM_Taus",tau_i,i,pt1,pt2):
                #Checking the medium RNN condition for tau_i and tau[i]
                if Online_mRNN_cond(tree,tau_i,i,no_RNN,m_RNN):
                    #Checking the DeltaR condition for tau_i and tau[i]
                    if Online_DR_cond(tree,tau_i,i,min_DR,max_DR):
                        return True
    # No partner tau satisfies all three conditions (or there are fewer than two taus)
    return False


# Input files
histFileRoot = "user.32997101.ANALYSIS._000001.refined.root"
File = ROOT.TFile.Open(histFileRoot,"READ")
weightFileRoot = "user.32997101.ANALYSIS._000001.refined.EBweights.root"
WeightFile = ROOT.TFile.Open(weightFileRoot,"READ")

pt1 = np.linspace(20,45,6)     
pt2 = np.linspace(15,40,6)     
RNN = 440
m_RNN =280
DR = 0.3
max_DR = 3
t = 2860

with open('HLTpt_rates.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['pt1','pt2','rate_HLTpt'])
    for pti in pt1:
        for ptj in pt2:
            hltpt_events = 0      
            if pti >= ptj:
                weight_tree = WeightFile.Get('trig')
                tree = File.Get("analysis")
                for event in range(tree.GetEntries()):
                    tree.GetEntry(event)
                    weight_tree.GetEntry(event)
                    if len(tree.TrigTRM_Taus)>=2:
                        hltptevent_flag = 0
                        for i in range(len(tree.TrigTRM_Taus)):
                            if hltptevent_flag == 0 and Online_hltpt_cond(tree,i,pti,ptj,RNN,m_RNN, DR,max_DR): 
                                hltpt_events += weight_tree.EBweight
                                hltptevent_flag = 1
                rate_hltpt = hltpt_events/t
                csvwriter.writerow([pti,ptj,rate_hltpt])

dastudillo · April 17, 2023, 9:00am

Some things to check:

These should probably be outside the nested for loops, and even outside the “with open…” section altogether:

                weight_tree = WeightFile.Get('trig')
                tree = File.Get("analysis")

Do you really need to loop over the whole tree every time (for event in range(tree.GetEntries())) inside those loops? (I did not check your code in detail, but there may be a better way than that).
Maybe you don’t need to pass the whole tree to the functions (pt…, online…), just the values that you actually use in those functions.
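
The last suggestion could look roughly like the sketch below: copy the few numbers each tau needs into plain Python tuples once per event, then run the cuts on those tuples so the helper functions never touch the tree. This is only an illustration, not code from the thread. The names `snapshot_taus`, `passes_id`, `delta_r` and `event_passes` are hypothetical, and the sketch assumes that the pair condition is symmetric in the two taus, that the tau objects expose `Pt()`, `Eta()` and `Phi()`, and that Delta R is built from pseudorapidity and phi.

```python
import math
from itertools import combinations

def snapshot_taus(taus, id_medium, id_loose):
    """Copy (pt, eta, phi, medium, loose) out of the branch objects once per event."""
    return [(t.Pt(), t.Eta(), t.Phi(), bool(m), bool(l))
            for t, m, l in zip(taus, id_medium, id_loose)]

def passes_id(pt, medium, loose, no_RNN=440, m_RNN=280):
    """Medium ID below m_RNN, loose ID between m_RNN and no_RNN, no ID above no_RNN."""
    return (medium and pt < m_RNN) or (loose and m_RNN < pt < no_RNN) or pt > no_RNN

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def event_passes(taus, pt1=35, pt2=25, no_RNN=440, m_RNN=280, min_DR=0.3, max_DR=3):
    """True if any pair of taus satisfies the pt, ID and Delta R cuts."""
    for (pta, etaa, phia, ma, la), (ptb, etab, phib, mb, lb) in combinations(taus, 2):
        # Both taus above pt2 and at least one of them above pt1
        if min(pta, ptb) <= pt2 or max(pta, ptb) <= pt1:
            continue
        if not (passes_id(pta, ma, la, no_RNN, m_RNN)
                and passes_id(ptb, mb, lb, no_RNN, m_RNN)):
            continue
        if min_DR < delta_r(etaa, phia, etab, phib) < max_DR:
            return True
    return False

# Toy event with two taus given directly as (pt, eta, phi, medium, loose) tuples
toy_taus = [(40.0, 0.0, 0.0, True, False), (30.0, 1.0, 1.0, True, False)]
print(event_passes(toy_taus))
```

In the event loop, the snapshot would be taken once per event, for example `taus = snapshot_taus(tree.TrigTRM_Taus, tree.TrigTRM_TauIDm, tree.TrigTRM_TauIDl)`, and then reused for every (pt1, pt2) combination. Whether this reduces the memory growth seen here is not established in the thread.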

Athul_Dev · April 17, 2023, 9:15am

Hi dastudillo,

Thanks for the quick response!

I didn't initialize the trees inside the loop in my first attempt. But after I found the memory leak, I tried initializing them inside the loop and deleting the tree after every iteration. It hardly made any difference.
I also tried making the for event in range(tree.GetEntries()) loop my outermost loop and processing all the events inside it:

    with open('HLTpt_rates.csv', 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(['pt1','pt2','eff_HLTpt'])
        for event in range(tree.GetEntries()):
            tree.GetEntry(event)
            weight_tree.GetEntry(event)
            pass_events = np.array([])    # No of events that passed each trigger
            for pti in pt1:
                for ptj in pt2:
                    if pti >= ptj:
                        hltptevent_flag = 0
                        if len(tree.TrigTRM_Taus)>=2:
                            for i in range(len(tree.TrigTRM_Taus)):
                                if hltptevent_flag == 0 and Online_hltpt_cond(tree,i,pti,ptj,no_RNN = 440, m_RNN =280, min_DR =0.3,max_DR =3): 
                                    pass_events = np.append(pass_events,weight_tree.EBweight)
                                    hltptevent_flag = 1
                        if hltptevent_flag == 0:
                            pass_events = np.append(pass_events,0)    
            # print(pass_events.shape)        
            rate_column +=pass_events

        index = 0
        for pti in pt1:
            for ptj in pt2:
                if pti >= ptj:
                    csvwriter.writerow([pti,ptj,rate_column[index]/t])
                    index+=1

But this made only a very small difference.

I have not tried passing only the values to the functions yet. But should that make a difference? I'm just using the tree that is already defined at the beginning, right? Also, if you check the memory profile, the memory increases with the number of events I try to process. I don't understand why it grows so much. The only thing I need to keep in memory in every loop is the events that pass my trigger. Everything else need not stay in memory from one loop to the next.
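
One way to keep only the pass totals in memory, as described above, is to enumerate the valid (pt1, pt2) pairs once and add each event's weight into a fixed-size list instead of growing an array with `np.append`. The sketch below is an illustration under assumptions: `threshold_grid` and `accumulate_rates` are hypothetical names, `passes` stands in for the trigger check, and plain lists replace NumPy arrays to keep it dependency-free. It restructures the bookkeeping only and is not claimed to fix the memory growth.

```python
def threshold_grid(pt1_values, pt2_values):
    """All (pt1, pt2) pairs with pt1 >= pt2, in the order they are written out."""
    return [(a, b) for a in pt1_values for b in pt2_values if a >= b]

def accumulate_rates(events, grid, passes, run_time):
    """Sum the weights of passing events per grid point and divide by run_time.

    events: iterable of (event_data, weight)
    passes: callable (event_data, pt1, pt2) -> bool, a stand-in for the trigger check
    """
    totals = [0.0] * len(grid)           # fixed size, one slot per threshold pair
    for event_data, weight in events:
        for k, (a, b) in enumerate(grid):
            if passes(event_data, a, b):
                totals[k] += weight
    return [total / run_time for total in totals]

# Toy usage: an "event" is just its two leading tau pts (hypothetical data).
pt1_values = [20 + 5 * k for k in range(6)]   # same values as np.linspace(20, 45, 6)
pt2_values = [15 + 5 * k for k in range(6)]   # same values as np.linspace(15, 40, 6)
grid = threshold_grid(pt1_values, pt2_values)
toy_events = [((50.0, 30.0), 2.0), ((22.0, 18.0), 1.0)]
rates = accumulate_rates(toy_events, grid,
                         lambda ev, a, b: ev[0] > a and ev[1] > b, run_time=2860)
```

Because `grid` fixes the order of the threshold pairs, the CSV rows can be written afterwards with `for (a, b), rate in zip(grid, rates)`, without a separate running index.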

bellenot · April 17, 2023, 9:57am

Welcome to the ROOT Forum!
Try with:

def Online_mRNN_cond(tree,index,i, no_RNN = 440, m_RNN=280 ):
    '''This returns True or False for a particular hltpt tau if it can be paired up
    with the second tau and pass the medium RNN cut'''
    # Early return: skips the checks below, so this function is effectively disabled for the test
    return True
    # For tau[index] RNN Medium(Loose) if pt < m_RNN(no_RNN) | no RNN ID if pt > no_RNN GeV
    taus_index_pt = tree.TrigTRM_Taus[index].Pt()
    taus_i_pt = tree.TrigTRM_Taus[i].Pt()
    RNN1_cond = (tree.TrigTRM_TauIDm[index]) and (taus_index_pt < m_RNN)
    RNN1_cond |= (tree.TrigTRM_TauIDl[index]) and (taus_index_pt > m_RNN) and (taus_index_pt < no_RNN)
    RNN1_cond |= taus_index_pt > no_RNN
    # For tau[i] RNN Medium(Loose) if pt < m_RNN(no_RNN) | no RNN ID if pt > no_RNN GeV
    RNN2_cond = (tree.TrigTRM_TauIDm[i]) and (taus_i_pt < m_RNN)
    RNN2_cond |= (tree.TrigTRM_TauIDl[i]) and (taus_i_pt > m_RNN) and (taus_i_pt < no_RNN)
    RNN2_cond |= taus_i_pt > no_RNN
    if RNN1_cond and RNN2_cond:
        return  True
    else:
        return False

Athul_Dev · April 17, 2023, 12:27pm

Hi bellenot,

Thanks for the response. I tried your suggestion, but it didn't make much difference to the memory usage.
Please find attached the memory profile logs, memory_usage.txt (22.5 KB), from before and after disabling the Online_mRNN_cond() function.
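
A memory log of this kind can also be produced from inside the event loop itself. The following is a minimal sketch, assuming a Unix-like system where the standard `resource` module is available; `peak_rss_mb` and `log_every` are hypothetical names, and this is not necessarily how the attached logs were made. It reports the peak resident set size of the whole process, so memory allocated on the C++ side is included.

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor

def log_every(n_events, step=100000):
    """Yield event indices and print the peak memory every `step` events."""
    for event in range(n_events):
        if event % step == 0:
            print(f"event {event}: peak RSS {peak_rss_mb():.1f} MB")
        yield event
```

It would be used as `for event in log_every(tree.GetEntries()): tree.GetEntry(event)`. Since the value is a peak, it never decreases, which is sufficient for spotting steady growth but not for seeing memory being released.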

bellenot · April 17, 2023, 12:35pm

Right, sorry, my bad…

Athul_Dev · April 18, 2023, 7:37am

Is it a problem with my root file itself?

bellenot · April 18, 2023, 9:52am

I don’t think so. Maybe @pcanal can take a look

pcanal · April 19, 2023, 7:11pm

The problem is more in the Python code. The issue is linked to the way PyROOT checks for enums and is likely the same as [cling] Memory hogging when checking if type is an enum · Issue #10454 · root-project/root · GitHub. You may be able to work around the issue by applying the (not yet completed) PR: [PyROOT][10454] Prevent memory hogging issue when checking for enums by etejedor · Pull Request #11412 · root-project/root · GitHub.
