I am looking for a more efficient way to fill and write my tree than the one shown in the code snippet below.
Basically I have around 30 channels and each event for each channel is a list of 1000 samples. For a large number of events this gets really slow. The bottleneck is clearly the loop I use to fill the branches and I wonder if there is a better way to do this.
import array

# Create one branch buffer per channel (30 channels).
# Each event holds 1000 samples per channel.
# data_dictionary maps each channel name to a 2-D numpy array of shape
# (number_of_events, 1000).
branches = {}
for channel in data_dictionary.keys():
    branches[channel] = array.array('i', [0] * 1000)
# The loop variable is called buffer so it does not shadow the array module.
for channel, buffer in branches.items():
    tree.Branch(channel, buffer, f'{channel}[1000]/I')

# Loop through the events and fill the branch arrays. Obviously the bottleneck.
num_events = 1000
for event_index in range(num_events):
    for channel in data_dictionary.keys():
        for j, sample in enumerate(data_dictionary[channel][event_index]):
            branches[channel][j] = int(sample)
    tree.Fill()  # store the current buffer contents as one entry
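One possible way to cut the per-sample overhead is to copy each event into its branch buffer with a single slice assignment instead of a Python loop over 1000 samples. The following is only a minimal sketch under stated assumptions: `make_buffers` and `fill_events` are hypothetical helper names, the samples are assumed to already be integers, and `fill` is a stand-in callable for `tree.Fill` so that the sketch runs without ROOT.

```python
import array

N_SAMPLES = 1000  # samples per event and channel, as in the post


def make_buffers(channels, n_samples=N_SAMPLES):
    """Create one reusable int buffer per channel (hypothetical helper)."""
    return {name: array.array('i', [0] * n_samples) for name in channels}


def fill_events(data, buffers, fill):
    """Copy events into the buffers one at a time, calling fill() per event.

    data:    dict mapping channel name -> sequence of events,
             each event being a sequence of integer samples
    buffers: dict mapping channel name -> array.array('i') of fixed length
    fill:    zero-argument callable; assumed stand-in for tree.Fill
    """
    channels = list(buffers)
    n_events = len(data[channels[0]])
    for event_index in range(n_events):
        for name in channels:
            row = array.array('i', data[name][event_index])
            buf = buffers[name]
            # A length mismatch would resize the buffer, so reject it early.
            if len(row) != len(buf):
                raise ValueError(f'{name}: expected {len(buf)} samples, got {len(row)}')
            # Equal-length slice assignment overwrites the contents in place,
            # so the object registered with the branch stays the same.
            buf[:] = row
        fill()
```

In the setting of the post, the buffers would be registered with `tree.Branch` exactly as before and `fill_events(data_dictionary, branches, tree.Fill)` would replace the triple loop. Since the input is already NumPy, it may be simpler still to use NumPy `int32` arrays as the branch buffers and assign `buf[:] = row` directly; that variant is not shown here and would need to be checked against your ROOT version.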
Have you considered using RDataFrame for your analysis (see the ROOT::RDataFrame class reference)? It could simplify your code and greatly improve performance, and multithreading is easy to add. Since you have multiple channels/samples, you could also consider using DefinePerSample via the specification file functionality.
Hi @mczurylo, thanks for the reply. I have looked into RDataFrame but came to the conclusion that I cannot use it, since it does not support multidimensional arrays. That is my case: each event is an array of samples (i.e. the ADC samples of a waveform).