Accessing data across events/rows in RDataFrame


I am trying to use RDataFrame to calculate the distance between every pair of events in a tree. For example, the distance is defined as the absolute value of the difference between two momenta, p1 and p2, from two different events.

Assume the tree is read into an RDataFrame. I want to define a new column, distance, which for each row holds an array of the distances between this row and all other rows.

However, as far as I know, Define can only access the columns within a single event/row, and an action runs only one event loop. Calculating the distances would require two nested event loops, accessing data across events.

I searched the RDataFrame documentation page for possible strategies but have found nothing useful so far. Is this possible in RDataFrame, or am I missing a method that already exists?

Thank you for your help,

Dear @zhangdanyi ,

Thank you for asking your question on the forum. You are right that RDataFrame natively executes per event (soon it will be per group of events), and in general there is no direct way in the API to establish relations between the current event and others.

There have been similar discussions on the forum. One way to achieve what you want is to implement a helper that acts as a “sliding window” on the values of your dataset. See a concrete example in this forum post. One potential issue I see concerns this specific statement:

for each row it will be an array of the distances between this row and other rows.

This means that you need to store all the values of the column in memory and, for each event, compare the current value against all the others. It can be done, but it has a cost: memory grows with the number of events, and the number of comparisons grows quadratically with it.
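To make that concrete, here is a minimal sketch in plain Python (deliberately without ROOT, so it runs anywhere) of the "keep everything in memory" approach. It assumes the momentum column has already been collected into a list, for instance with something like `AsNumpy` in PyROOT. The function name `pairwise_distances` and the sample values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def pairwise_distances(values: Sequence[float]) -> List[List[float]]:
    """For each row i, return |values[i] - values[j]| for every other row j."""
    distances = []
    for i, p_i in enumerate(values):
        # Skip j == i so each row's array only covers the *other* rows.
        row = [abs(p_i - p_j) for j, p_j in enumerate(values) if j != i]
        distances.append(row)
    return distances


# Hypothetical momentum values, one per event, already read into memory.
momenta = [1.0, 4.0, 6.5]
distance_column = pairwise_distances(momenta)
print(distance_column)
```

For N events this produces N arrays of N - 1 entries each, which is why the approach becomes expensive for large trees.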


Thank you! It might be tricky/unsafe to do that with multi-threading. We’ll go back to the normal event loop with ROOT.
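For reference, the multi-threading concern can be illustrated with a rough sketch of a stateful "sliding window" helper, again in plain Python rather than the actual helper from the linked post. The class name `SlidingWindowDistance` and the window size are assumptions made for this example. The helper remembers previous values, so its output depends on the order in which events arrive, and that order is not guaranteed when events are processed by several threads.

```python
from collections import deque
from typing import Deque, List


class SlidingWindowDistance:
    """Keeps the last `size` values seen and compares each new value to them."""

    def __init__(self, size: int) -> None:
        self._window: Deque[float] = deque(maxlen=size)

    def __call__(self, p: float) -> List[float]:
        # Distances from the current value to the values still in the window.
        result = [abs(p - previous) for previous in self._window]
        # Mutating shared state here is what makes the helper order-dependent.
        self._window.append(p)
        return result


# Hypothetical momenta processed strictly in sequence.
helper = SlidingWindowDistance(size=2)
outputs = [helper(p) for p in [1.0, 4.0, 6.5, 7.0]]
print(outputs)
```

Feeding the same values in a different order gives different arrays per event, so a helper like this is only meaningful in a sequential event loop.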
