Writing TTrees with PyROOT efficiently

Tim_Buktu · April 11, 2024, 5:01pm

I am looking for a more efficient way to fill and write my tree than the code snippet below.
Basically I have around 30 channels, and in each event every channel holds a list of 1000 samples. For a large number of events this gets really slow. The bottleneck is clearly the loop I use to fill the branches, and I wonder if there is a better way to do this.

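import array

import numpy as np
import ROOT

# Example input (hypothetical, for illustration): a dictionary with the channel
# names as keys (30 channels); each value is a 2-D numpy array of shape
# (number_of_events, 1000), i.e. 1000 samples per event
num_events = 1000
data_dictionary = {
    f"chan_{i}": np.random.randint(0, 4096, size=(num_events, 1000))
    for i in range(30)
}

out_file = ROOT.TFile("myfile.root", "RECREATE")
tree = ROOT.TTree("mytree", "mytree")

# Create one fixed-size int buffer per channel and attach it to its branch.
# The loop variable is not called "array", so the array module is not shadowed.
branches = {}
for channel in data_dictionary:
    branches[channel] = array.array('i', [0] * 1000)
    tree.Branch(channel, branches[channel], f'{channel}[1000]/I')

# Loop through the events and fill the branch buffers. Obviously the bottleneck:
# the innermost loop runs once per sample in Python.
for event_index in range(num_events):
    for channel, buf in branches.items():
        for j, sample in enumerate(data_dictionary[channel][event_index]):
            buf[j] = int(sample)
    tree.Fill()

tree.Write()
out_file.Close()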
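A possible pure-PyROOT speed-up, sketched under a few assumptions: the data keeps the same layout (channel name mapped to an (n_events, n_samples) numpy array), and PyROOT accepts a numpy int32 array as a branch buffer the same way it accepts array.array. The idea is to do the dtype conversion once for the whole array and then copy each event row with numpy slice assignment, so the Python-level work per event drops from 30 × 1000 element assignments to 30 row copies. The helper names prepare_channel_data and write_tree, and the file and tree names, are hypothetical.

```python
import numpy as np


def prepare_channel_data(data_dictionary):
    """Convert every channel's (n_events, n_samples) array to contiguous int32 once.

    Like int(sample) in a per-sample loop, the float-to-int cast truncates
    toward zero; values are assumed to fit in 32 bits (true for ADC counts).
    """
    return {
        channel: np.ascontiguousarray(samples, dtype=np.int32)
        for channel, samples in data_dictionary.items()
    }


def write_tree(data_dictionary, file_name="myfile.root", tree_name="mytree"):
    """Write one fixed-size int array branch per channel (sketch, needs PyROOT)."""
    import ROOT  # imported here so prepare_channel_data works without ROOT

    channel_data = prepare_channel_data(data_dictionary)
    # Assumes at least one channel and that all channels share the same shape
    n_events, n_samples = next(iter(channel_data.values())).shape

    out_file = ROOT.TFile(file_name, "RECREATE")
    tree = ROOT.TTree(tree_name, tree_name)

    # One numpy buffer per channel (assumption: PyROOT takes it as the branch
    # address). TTree reads this memory on every Fill(), so the buffer must stay
    # alive and must only be updated in place, never rebound to a new array.
    buffers = {}
    for channel in channel_data:
        buffers[channel] = np.zeros(n_samples, dtype=np.int32)
        tree.Branch(channel, buffers[channel], f"{channel}[{n_samples}]/I")

    for event_index in range(n_events):
        for channel, buf in buffers.items():
            # Slice assignment copies the whole row in C instead of
            # looping over the samples in Python
            buf[:] = channel_data[channel][event_index]
        tree.Fill()

    tree.Write()
    out_file.Close()
```

Usage would be write_tree(data_dictionary) with the dictionary from the question. The in-place buf[:] = ... is the important design choice: writing buf = channel_data[...] instead would rebind the Python name and leave the branch pointing at the old, unchanged buffer.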

couet · April 12, 2024, 7:31am

I guess @pcanal can help.

mczurylo · April 12, 2024, 7:55am

Hi @Tim_Buktu,

have you considered using RDataFrame (see the ROOT::RDataFrame class reference) for your analysis? It could simplify your code and improve performance considerably, and multithreading is easy to add. Since you have multiple channels/samples, you could also consider using DefinePerSample via the specification-file functionality.

Cheers,
Marta

Tim_Buktu · April 12, 2024, 8:09am

Hi @mczurylo, thanks for the reply. I have looked into RDataFrame but came to the conclusion that I cannot use it, since it does not support multidimensional arrays, which is my case: each event is an array of samples (i.e. the ADC samples of a waveform).

See also the comment here:

vpadulan · April 12, 2024, 12:28pm

Dear @Tim_Buktu ,

There is a very neat way to do this nowadays, thanks to Awkward Array's RDataFrame conversion (see "How to convert to/from ROOT RDataFrame" in the Awkward Array 2.6.3 documentation).

Here is a simplified example that should represent your situation:

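import awkward
import numpy

# Toy data: 30 channels, each a 2-D array of shape (n_events, values_per_event)
n_events = 10
values_per_event = 1000
data = {
    key: numpy.random.rand(n_events, values_per_event)
    # f-string needed: without it all 30 keys would be the literal "chan_{i}"
    for key in [f"chan_{i}" for i in range(30)]
}
# Convert each numpy array to an awkward array (regular inner dimension)
ak_arrays = {
    key: awkward.from_numpy(arr)
    for key, arr in data.items()
}
# Expose the awkward arrays as RDataFrame columns and write them to a TTree
df = awkward.to_rdataframe(ak_arrays)
df.Snapshot("mytree", "myfile.root")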
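A variant of the same numpy → awkward → RDataFrame → Snapshot route, sketched under the assumption that the real samples are integer ADC counts (as the /I branches in the question suggest). Keeping the input as int32 should make the snapshot store integer columns rather than doubles. make_adc_data and snapshot_with_awkward are hypothetical helpers, the random data only stands in for real waveforms, and the read-back count is just a sanity check. This assumes Awkward Array 2.x (which provides awkward.to_rdataframe) and PyROOT are installed.

```python
import numpy as np


def make_adc_data(n_events, n_samples=1000, n_channels=30, seed=0):
    """Hypothetical stand-in for real ADC waveforms: int32 counts in [0, 4096)."""
    rng = np.random.default_rng(seed)
    return {
        f"chan_{i}": rng.integers(0, 4096, size=(n_events, n_samples), dtype=np.int32)
        for i in range(n_channels)
    }


def snapshot_with_awkward(data, tree_name="mytree", file_name="myfile.root"):
    """numpy -> awkward -> RDataFrame -> TTree; returns the entries read back.

    Requires the awkward and ROOT packages; they are imported here so that
    make_adc_data can be used without them.
    """
    import awkward
    import ROOT

    # awkward.from_numpy keeps the int32 dtype of the input arrays
    ak_arrays = {key: awkward.from_numpy(arr) for key, arr in data.items()}
    df = awkward.to_rdataframe(ak_arrays)
    df.Snapshot(tree_name, file_name)

    # Sanity check: reopen the file and count the entries that were written
    return ROOT.RDataFrame(tree_name, file_name).Count().GetValue()
```

For example, snapshot_with_awkward(make_adc_data(10)) should report 10 entries, one per event.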

Cheers,
Vincenzo

Tim_Buktu · April 12, 2024, 12:56pm

Great answer, thank you very much!
