Thanks for the responses!
Regarding the structure of the data: for a previous analysis we read .root files into a TTree and then converted it to an RDataFrame. However, I assumed that sorting would be more efficient with the TTree class directly, so I wanted to stay with it. To give more insight into the data structure, here is a code snippet and its output.
tree = create_Tree()  # Self-defined function that returns a tree created from the data
print(f'type of tree: {type(tree)}')
print(f'tree: {tree}')
df = ROOT.RDataFrame(tree)
print(f'type of dataframe: {type(df)}')
print(f' df: {df}')
print(df.GetColumnNames())
print(df.GetColumnTypeNamesList())
This generates the following output:
tree: Name: Data_R Title:
type of dataframe: <class cppyy.gbl.ROOT.RDataFrame at 0x82ed998>
df: A data frame built on top of the Data_R dataset.
{ "Channel", "Timestamp", "Board", "Energy", "Flags" }
Traceback (most recent call last):
  File "3läsadata.py", line 67, in <module>
    print(df.GetColumnTypeNamesList())
AttributeError: 'RDataFrame' object has no attribute 'GetColumnTypeNamesList'
The “Board” and “Flags” data are not of any importance at this stage.
To clarify what each column contains, here is how the data would look in a spreadsheet, without loss of information:
event  timestamp  energy  channel
1      23453      150     3
2      32423      3400    4
...
Data can be obtained using the following snippet of code:
for event in tree:
    ch = event.Channel
    time = event.Timestamp
    energy = event.Energy
Here ch, time and energy are all plain numbers (floats or ints), not lists.
If I had access to unlimited computing power, I would write the following code:
list_of_coincident_events = []
for event1 in tree:
    ch1 = event1.Channel
    time1 = event1.Timestamp
    energy1 = event1.Energy
    for event2 in tree:
        time2 = event2.Timestamp
        if abs(time1 - time2) <= 400:
            ch2 = event2.Channel
            energy2 = event2.Energy
            list_of_coincident_events.append([ch1, ch2, time1, time2, energy1, energy2])
# Something that saves list_of_coincident_events in a reasonable format
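For the saving step, one option could be a plain CSV file with one row per coincident pair. The following is only a sketch using Python's standard csv module. The function name `write_coincidences` and the header names are my own choices for illustration, and the sample row is made up.

```python
import csv
import io


def write_coincidences(rows, handle):
    """Write coincidence rows as CSV to an open text handle (hypothetical helper)."""
    writer = csv.writer(handle)
    # Header order matches the lists appended to list_of_coincident_events.
    writer.writerow(["ch1", "ch2", "time1", "time2", "energy1", "energy2"])
    writer.writerows(rows)


# Illustration with an in-memory buffer and one made-up row.
# For a real file, open it with open(path, "w", newline="") instead.
buffer = io.StringIO()
write_coincidences([[3, 5, 23453, 23800, 150, 90]], buffer)
csv_text = buffer.getvalue()
```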
I currently think a reasonable way to perform the task is to sort the tree by timestamp and then loop through the data as above. The second for loop would then no longer be necessary (since the data is sorted, only the next event would have to be considered), which would reduce the computation time drastically.
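A minimal sketch of that idea in plain Python, outside of ROOT, assuming the events have already been read into a list of (channel, timestamp, energy) tuples. The tuple layout, the function name `find_coincidences` and the last two sample events are assumptions for illustration only. The sketch also assumes that more than one later event may fall inside the 400-unit window, so it keeps scanning forward until the window is exceeded instead of checking only the immediate next event.

```python
from typing import List, Tuple

# Hypothetical event layout for illustration: (channel, timestamp, energy).
Event = Tuple[int, int, int]


def find_coincidences(events: List[Event], window: int = 400) -> List[list]:
    """Return [ch1, ch2, time1, time2, energy1, energy2] for every pair of
    distinct events whose timestamps differ by at most `window`."""
    # Sort once by timestamp; this costs O(n log n).
    ordered = sorted(events, key=lambda ev: ev[1])
    pairs = []
    for i, (ch1, time1, energy1) in enumerate(ordered):
        # Only look forward, and stop as soon as the window is exceeded.
        j = i + 1
        while j < len(ordered) and ordered[j][1] - time1 <= window:
            ch2, time2, energy2 = ordered[j]
            pairs.append([ch1, ch2, time1, time2, energy1, energy2])
            j += 1
    return pairs


# The first two events are the spreadsheet rows above; the last two are made up.
events = [(3, 23453, 150), (4, 32423, 3400), (5, 23800, 90), (3, 32500, 410)]
coincidences = find_coincidences(events)
```

Unlike the double loop over the full tree, this reports each pair once and never pairs an event with itself. The inner scan only touches events inside the window, so the cost after sorting grows with the number of coincidences rather than with the square of the number of events.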
edit: added code tags