Hello,
Happy New Year!
I am implementing group-by-like counting for my analyses using ROOT’s RDataFrame, but my current approach performs poorly. Is the method below the canonical way to do this, or am I missing optimizations that would make it more efficient?
I have included a minimal reproducible example of my current implementation:
# %%
import ROOT
import numpy as np
# %%
df = ROOT.RDataFrame(100)
df = df.Define("angle", "gRandom->Uniform(0, 3.14)")
df = df.Define("nTracks", "gRandom->Integer(3)")
# %%
# The idea is to bin the angles and find the number of events with 0, 1, or 2 tracks in each bin
# (gRandom->Integer(3) only returns 0, 1, or 2)
bins = np.linspace(0, 3.14, 5, dtype=np.double)
model = ROOT.RDF.TH1DModel("angle", "angle", bins.size - 1, bins)
angle_hist = df.Histo1D(model, "angle")
# %%
count_two_tracks = np.zeros(bins.size - 1, dtype=np.double)
count_one_track = np.zeros(bins.size - 1, dtype=np.double)
count_zero_track = np.zeros(bins.size - 1, dtype=np.double)
for bin_idx in range(bins.size - 1):
    bin_low = bins[bin_idx]
    bin_high = bins[bin_idx+1]
    print(f"Processing bin ({bin_idx}):\t{bin_low:.3f} - {bin_high:.3f}")
    rdf_bin = df.Filter(f"angle > {bin_low} && angle < {bin_high}", f"{bin_low} < angle < {bin_high}")
    count_two_tracks[bin_idx] = rdf_bin.Filter("nTracks == 2").Count().GetValue()
    count_one_track[bin_idx] = rdf_bin.Filter("nTracks == 1").Count().GetValue()
    count_zero_track[bin_idx] = rdf_bin.Filter("nTracks == 0").Count().GetValue()
# %%
# Make histograms of counts
two_track_hist = ROOT.TH1D("two_track_hist", "two_track_hist", bins.size - 1, bins)
for idx, bin_count in enumerate(count_two_tracks):
    two_track_hist.SetBinContent(idx+1, bin_count)
The goal is to bin the angles and count the events with 0, 1, or 2 tracks in each bin. The logic works as expected, but it is slow, particularly when scaling to larger datasets.
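To make the target output concrete, the same group-by count can be sketched in plain NumPy as a single 2D histogram over (angle, nTracks). This is only an illustration under assumptions that are not part of the code above: both columns are already in memory as arrays, the sample is drawn with NumPy's generator rather than gRandom, and the names `track_edges` and `counts` are made up for the sketch. RDataFrame also offers `Histo2D`, which might fill an equivalent table in one event loop, though that has not been benchmarked here.

```python
import numpy as np

# Illustrative sample only (assumption): NumPy stands in for gRandom.
rng = np.random.default_rng(seed=1)
n_events = 100
angle = rng.uniform(0.0, 3.14, n_events)
n_tracks = rng.integers(0, 3, n_events)  # values 0, 1 or 2

# Same angle binning as in the post: 4 bins between 0 and 3.14.
angle_edges = np.linspace(0.0, 3.14, 5)
# Hypothetical helper edges: unit-wide bins centred on 0, 1 and 2.
track_edges = np.arange(-0.5, 3.5, 1.0)

# One pass over the data fills every (angle bin, nTracks) cell at once.
counts, _, _ = np.histogram2d(angle, n_tracks, bins=[angle_edges, track_edges])

# Each column is the per-angle-bin count for one track multiplicity.
count_zero_track = counts[:, 0]
count_one_track = counts[:, 1]
count_two_tracks = counts[:, 2]
```

Each column of `counts` then plays the role of one of the `count_*` arrays from the loop, without a separate pass per bin and per multiplicity.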
Are there any suggestions for improving the implementation?
Thanks.
ROOT Version: 6.35.01
Platform: linuxx8664gcc
Compiler: Not Provided