Dear ROOT experts,
I am trying to sort the events in a TTree containing hundreds of branches and about a million events using RDataFrame.
The TTree has branches named GenModel_TChiWH_950_400, GenModel_TChiWH_900_400, etc. For any given event, only one of these is true and all the other similar branches are false. So I want to create multiple ROOT files by sorting the events according to GenModel_TChiWH_*. For example, I need to create SortedFile_GenModel_TChiWH_950_50.root, which contains all events with GenModel_TChiWH_950_50 == true.
Here is the script [1] I have. The issue I am facing is that the memory usage keeps growing as the files are produced. At the end, I think it consumes about 2.5 GB for an input file of 221 MB and ~130k events.
Is there a way to “clear some memory” inside the for loop? I can accept a somewhat longer run time. I have turned off multi-threading, since running single-threaded reduces the memory usage to some extent.
Thanks,
Vinay
ROOT Version: 6.26/07
Platform: AlmaLinux release 9.4 (Seafoam Ocelot)
Compiler: Not Provided
[1]
import ROOT as rt
import sys

# rt.EnableImplicitMT() # Enable multi-threading. Not that helpful for this code. Can save some memory with single thread.

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python3 makeTTreeForEachMass_v2.py <In_root_file> <OutFile string>")
        sys.exit(1)
    file_path = sys.argv[1]
    outFnameSt = sys.argv[2]
    print("Starting analysis")
    tree_name = "Events"
    df = rt.RDataFrame(tree_name, file_path)
    Nentries = df.Count().GetValue()
    print("Number of events:", Nentries)
    # Get all branch names
    branch_names = df.GetColumnNames()
    # print(branch_names)
    # Extract mass pairs from branch names
    branchNamePatr = "GenModel_TChiWH_" # There are branches named GenModel_TChiWH_950_50, GenModel_TChiWH_950_400, etc. For a given event only one of these GenModel_TChiWH_* are 1 (true). Other GenModel_TChiWH_* are set to 0 (false).
    mass_pairs = []
    outFileNames = []
    totalEvents = 0
    for branch in branch_names:
        branch_str = str(branch)
        if str(branchNamePatr) in branch_str:
            mXY = branch_str.split('_')[-2::]
            mX = float(mXY[0])
            mY = float(mXY[1])
            mass_pairs.append((mX, mY, branch_str))
            ###########
            df_temp = df.Filter(f"({branch} == 1)")
            outName = outFnameSt+"_"+str(int(mX))+"_"+str(int(mY))+".root"
            print("Creating file",outName)
            df_temp.Snapshot("Events",outName) # write a TTree that contains events in which only GenModel_TChiWH_950_400 is true, for example.
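
To make the naming convention explicit, here is a minimal sketch of how a branch name maps to an output file name, written in plain Python so it can be checked without ROOT. It is not part of the script above: the helper name `output_name_for_branch` and its `out_prefix` argument are hypothetical, and it assumes every matching branch name ends in two numeric mass fields separated by underscores.

```python
def output_name_for_branch(branch, out_prefix, pattern="GenModel_TChiWH_"):
    """Return the output file name for a GenModel branch, or None if it does not match.

    Hypothetical helper for illustration only; it mirrors the string handling
    in the loop of the script and does not touch ROOT.
    """
    # Same substring test as in the script.
    if pattern not in branch:
        return None
    # Assumption: the last two underscore-separated fields are the two masses.
    mx_str, my_str = branch.split("_")[-2:]
    mx = int(float(mx_str))
    my = int(float(my_str))
    return f"{out_prefix}_{mx}_{my}.root"


if __name__ == "__main__":
    # Branch names of the kind described in the post, plus one unrelated name.
    for name in ["GenModel_TChiWH_950_50", "GenModel_TChiWH_950_400", "nJet"]:
        print(name, "->", output_name_for_branch(name, "SortedFile_GenModel_TChiWH"))
```

With `out_prefix` set to `SortedFile_GenModel_TChiWH`, the branch GenModel_TChiWH_950_50 maps to SortedFile_GenModel_TChiWH_950_50.root, which is the file name used as the example earlier in this post.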