Event mixing with RDF

Dear all,

I would like to perform event mixing using RDataFrame.

What I have in mind is to take some quantities from the i-th row and some from the (i+1)-th row to create a fictitious “mixed” event.

In the example below, I want x from the i-th row and y from the (i+1)-th row.

# example from 
# https://root-forum.cern.ch/t/saving-pandas-dataframe-as-ttree-with-rdataframe/42720/2

# Create a pandas dataframe
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['x'] = np.array([1, 2, 3])
df['y'] = np.array([4, 5, 6])

# Convert data to a dictionary with numpy arrays
data = {key: df[key].values for key in df.columns}

# Create an RDataFrame from the dictionary of numpy arrays
import ROOT
rdf = ROOT.RDF.MakeNumpyDataFrame(data)

# Print the contents of the dataframe
rdf.Display().Print()

This returns:

+-----+---+---+
| Row | x | y |
+-----+---+---+
| 0   | 1 | 4 |
+-----+---+---+
| 1   | 2 | 5 |
+-----+---+---+
| 2   | 3 | 6 |
+-----+---+---+

My goal is to obtain:

+-----+---+---+
| Row | x | y |
+-----+---+---+
| 0   | 1 | 5 |
+-----+---+---+
| 1   | 2 | 6 |
+-----+---+---+
| 2   | 3 | 4 |
+-----+---+---+

(If need be, I wouldn’t mind if the first or last row were dropped, since they sit at the boundaries of the row range.)

Do you have any suggestions on how to achieve this in a smart way?

Thanks!
Riccardo

Dear @riccardomanzoni ,

Let me make an extreme simplification here. The execution of computations in RDataFrame can be boiled down to:

for (Long64_t i = 0; i < tree.GetEntries(); ++i) {
    tree.GetEntry(i);        // load entry i for all the columns in use
    run_computations(tree);  // placeholder for the booked computations
}

Of course there are many clever things around it, but the point is that RDataFrame traverses the input dataset one entry at a time, reading only the columns that your application needs. All the columns are queried with the same entry number, so I don’t see a clear way to implement your use case directly within the existing API and machinery. In principle we could add an action that lags the values of a certain column, so that subsequent calls to the API in the same branch of the computation graph would see the lagged values. I don’t think that will be high priority for the time being.

As a not-so-efficient workaround, you could prepare your input dataset before creating the RDataFrame, i.e. pass the already shifted arrays as the input dataset to RDF.
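
A minimal sketch of this workaround, assuming the columns fit in memory as numpy arrays as in the question above. The helper `shift_column` and its `lag` and `clip` parameters are hypothetical names made up for illustration; they are not part of ROOT or numpy. The only library call it relies on is `np.roll`.

```python
import numpy as np

def shift_column(data, column, lag=1, clip=False):
    """Return a copy of `data` where row i of `column` holds the value from row i + lag.

    Hypothetical helper for illustration. `data` maps column names to
    equal-length 1D numpy arrays; `lag` is assumed to be a positive integer.
    """
    mixed = {name: values.copy() for name, values in data.items()}
    # np.roll with a negative shift moves element i + lag to position i;
    # the first `lag` elements wrap around to the end of the array.
    mixed[column] = np.roll(data[column], -lag)
    if clip and lag > 0:
        # Drop the trailing rows whose partner wrapped around from the start.
        mixed = {name: values[:-lag] for name, values in mixed.items()}
    return mixed

data = {"x": np.array([1, 2, 3]), "y": np.array([4, 5, 6])}
mixed = shift_column(data, "y")                # y becomes [5, 6, 4]
clipped = shift_column(data, "y", clip=True)   # x is [1, 2], y is [5, 6]
print(mixed)
print(clipped)
```

Either dictionary can then be passed to ROOT.RDF.MakeNumpyDataFrame in the same way as `data` in the original example. The shift costs an extra copy of the column, which is why this is only practical for datasets that fit in memory.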

Cheers,
Vincenzo

Hi,

See Event mixing with RDataFrame for an example of a simple sliding-window implementation.
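
The linked example is not reproduced here. As a rough illustration of the sliding-window idea only, written in plain Python rather than with the RDataFrame API, the sketch below keeps the last few rows in a small buffer and pairs each row with the one `lag` rows later. The function name `sliding_pairs` and the row layout (one dict per row) are assumptions made for this illustration.

```python
from collections import deque

def sliding_pairs(rows, lag=1):
    """Yield (earlier, later) pairs, where `later` comes `lag` rows after `earlier`."""
    window = deque(maxlen=lag + 1)  # holds the last lag + 1 rows seen so far
    for row in rows:
        window.append(row)
        if len(window) == lag + 1:
            # window[0] is row i, window[-1] is row i + lag
            yield window[0], window[-1]

rows = [{"x": 1, "y": 4}, {"x": 2, "y": 5}, {"x": 3, "y": 6}]
# Take x from row i and y from row i + 1
mixed = [{"x": earlier["x"], "y": later["y"]} for earlier, later in sliding_pairs(rows)]
print(mixed)  # [{'x': 1, 'y': 5}, {'x': 2, 'y': 6}]
```

Unlike the wrapped table in the question, the window never wraps around, so the last row has no partner and is dropped. A buffer of this kind depends on the rows arriving in order, so it presumably only makes sense in a sequential, single-threaded traversal.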

Cheers,
Enrico
