Writing TTrees with PyROOT efficiently

I am looking for a more efficient way to fill and write my tree than the code snippet below shows.
Basically I have around 30 channels and each event for each channel is a list of 1000 samples. For a large number of events this gets really slow. The bottleneck is clearly the loop I use to fill the branches and I wonder if there is a better way to do this.

import array

# `tree` (a ROOT.TTree) and `data_dictionary` are created elsewhere.
# Create a branch buffer for each channel (number of channels: 30).
# Each event is a list of 1000 samples for each channel.
# The data dictionary maps each channel name to a
# 2-D numpy array of shape (number_of_events, 1000).
branches = {}
for channel in data_dictionary.keys():
    branches[channel] = array.array('i', [0] * 1000)

# The loop variable is named `buffer` so that it does not shadow the `array` module.
for channel, buffer in branches.items():
    tree.Branch(channel, buffer, f'{channel}[1000]/I')

# Loop through the events and fill the branch buffers. Obviously the bottleneck.
num_events = 1000
for event_index in range(num_events):
    for channel in data_dictionary.keys():
        for j, sample in enumerate(data_dictionary[channel][event_index]):
            branches[channel][j] = int(sample)
    tree.Fill()

I guess @pcanal can help.

Hi @Tim_Buktu,

have you considered using RDataFrame for your analysis (see the ROOT::RDataFrame class reference)? This could simplify your code and improve performance considerably, and multithreading is easy to add as well. Since you have multiple channels/samples, you could also consider using DefinePerSample via the specification file functionality.

Cheers,
Marta

Hi @mczurylo, thanks for the reply. I have looked into RDataFrame but came to the conclusion that I cannot use it because it does not support multidimensional arrays. That is my case, since each event is an array of samples (i.e. ADC samples of a waveform).

See also the comment here:

Dear @Tim_Buktu ,

There is a very neat way to do this nowadays, thanks to Awkward Array: see "How to convert to/from ROOT RDataFrame" in the Awkward Array 2.6.3 documentation.

Here is a simplified example that should represent your situation:

import awkward
import numpy

n_events = 10
values_per_event = 1000
data = {
    key: numpy.random.rand(n_events, values_per_event)
    # f-string so that the keys are chan_0 ... chan_29
    for key in [f"chan_{i}" for i in range(30)]
}
ak_arrays = {
    key: awkward.from_numpy(arr)
    for key, arr in data.items()
}
df = awkward.to_rdataframe(ak_arrays)
df.Snapshot("mytree", "myfile.root")
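
If you would rather keep the explicit TTree.Fill loop from the question, the per-sample inner loop can probably be replaced by one slice assignment per channel. The following is only a minimal sketch under stated assumptions: `make_buffers` and `fill_events` are hypothetical helper names, and it assumes that your PyROOT version accepts a numpy int32 array as a branch buffer in the same way it accepts `array.array`. The `fill` argument stands in for `tree.Fill`, so the copying logic can be tried without ROOT.

```python
import numpy as np

N_SAMPLES = 1000  # samples per event and channel, as in the question


def make_buffers(channels, n_samples=N_SAMPLES):
    """Create one reusable int32 buffer per channel (hypothetical helper)."""
    return {ch: np.zeros(n_samples, dtype=np.int32) for ch in channels}


def fill_events(data, buffers, fill):
    """Copy every event row into its channel buffer, then call fill().

    data    : dict mapping channel name to a 2-D array (n_events, n_samples)
    buffers : dict returned by make_buffers
    fill    : zero-argument callable; tree.Fill in the ROOT case (assumption)
    """
    n_events = len(next(iter(data.values())))
    for event_index in range(n_events):
        for ch, buf in buffers.items():
            # In-place slice assignment: a single copy done inside numpy
            # instead of one Python-level assignment per sample. The buffer
            # object itself stays the same, only its contents change.
            buf[:] = data[ch][event_index]
        fill()
    return n_events


# Assumed wiring with ROOT (not executed here):
#   buffers = make_buffers(data_dictionary.keys())
#   for ch, buf in buffers.items():
#       tree.Branch(ch, buf, f"{ch}[{N_SAMPLES}]/I")
#   fill_events(data_dictionary, buffers, tree.Fill)
```

The important detail is `buf[:] = ...` rather than `buf = ...`: the branch keeps pointing at the original buffer, so the buffer must be overwritten in place and never rebound to a new array.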

Cheers,
Vincenzo

Great answer, thank you very much!
