Define column for filtered data frame

Suppose I have a data frame that I want to refer to with and without a filter. Further suppose that, after defining the filter, I want to define a new column. How should I allow both the filtered and unfiltered data frames to refer to the new column? The following works, but prints an error:

In [1]: import ROOT

In [2]: f = ROOT.TFile.Open('root://path/to/file.root')

In [3]: t = f.Get('some tree')

In [4]: df = ROOT.RDataFrame(t)

In [5]: df_filt = df.Filter('some selection')

In [6]: df_defi = df.Define('pt_test', 'sqrt(X_PX*X_PX + X_PY*X_PY)')

In [7]: df_filt_defi = df_filt.Define('pt_test', 'sqrt(X_PX*X_PX + X_PY*X_PY)')
input_line_84:1:25: error: redefinition of 'eval_pt_test'
namespace __tdf1 { auto eval_pt_test = [](Double_t& X_PX, Double_t& X_PY){return sqrt(X_PX*X_PX + X_PY*X_PY)
                        ^
input_line_81:1:25: note: previous definition is here
namespace __tdf1 { auto eval_pt_test = [](Double_t& X_PX, Double_t& X_PY){return sqrt(X_PX*X_PX + X_PY*X_PY)
                        ^

In [8]: 'pt_test' in df_filt.GetColumnNames()
Out[8]: False

In [9]: 'pt_test' in df_defi.GetColumnNames()
Out[9]: True

In [10]: 'pt_test' in df_filt_defi.GetColumnNames()
Out[10]: True

ROOT Version: 6.15/01
Platform: macOS
Compiler: Not Provided


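Hi,
It used to be the case that you could not Define two columns with the same name in the same RDF computation graph, even on different branches. That is what the `redefinition of 'eval_pt_test'` message is complaining about.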

The feature you need was added recently to the master branch and will be available in the next release, v6.16 (coming soon).

Your example code should work just fine with the latest master branch.

Cheers,
Enrico

You can also consider using:

In [4]: df = ROOT.RDataFrame(t)

In [5]: df_defi = df.Define('pt_test', 'sqrt(X_PX*X_PX + X_PY*X_PY)')

In [6]: df_filt = df_defi.Filter('some selection')
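
The same ordering can be wrapped in a small helper, sketched below. The function name `define_then_filter` is hypothetical (it is not part of ROOT); it only assumes that the object passed in offers the `Define` and `Filter` methods used in the session above, which is the case for an `RDataFrame`.

```python
def define_then_filter(df, selection):
    """Define pt_test once, upstream of the filter.

    `df` is assumed to be an RDataFrame-like node with Define and Filter
    methods; `selection` is the filter expression as a string.
    Returns (unfiltered_node, filtered_node); both can see 'pt_test'.
    """
    # One Define on the common parent, so the column name is unique in the graph.
    df_defi = df.Define('pt_test', 'sqrt(X_PX*X_PX + X_PY*X_PY)')
    # The filtered branch hangs off the node that already carries pt_test.
    df_filt = df_defi.Filter(selection)
    return df_defi, df_filt
```

With the data frame from the question this would be called as `df_defi, df_filt = define_then_filter(ROOT.RDataFrame(t), 'some selection')`, after which `'pt_test' in df_filt.GetColumnNames()` should hold for both returned nodes.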

Thank you. I updated to the latest version and it works just fine.

Good!
Note that, as @pcanal suggests, if you are defining the same column with the same expression both times, you can simply move the Define upstream of the Filter and then do the rest of the analysis in two branches of the graph, one with the filter and one without. The Defined quantity is in any case computed at most once per event, and only when something actually needs it.
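
To make the "at most once per event" point concrete, here is a toy model in plain Python. It is not ROOT code and not how RDataFrame is implemented; the class `CachedColumn`, the function `run`, and the event field `Q` are made up for illustration. It only mimics the behaviour described above: two branches read the same defined quantity, and the expression is evaluated once per event.

```python
import math


class CachedColumn:
    """Toy stand-in for a Defined column: evaluated lazily, cached per event."""

    def __init__(self, func):
        self.func = func
        self.n_evaluations = 0   # how many times the expression really ran
        self._event_id = None
        self._value = None

    def value(self, event_id, event):
        # Re-evaluate only when we move on to a new event.
        if event_id != self._event_id:
            self._value = self.func(event)
            self._event_id = event_id
            self.n_evaluations += 1
        return self._value


def run(events, column, selection):
    """Loop over events once, feeding an unfiltered and a filtered branch."""
    all_values, selected_values = [], []
    for i, event in enumerate(events):
        all_values.append(column.value(i, event))           # unfiltered branch
        if selection(event):
            selected_values.append(column.value(i, event))  # reuses cached value
    return all_values, selected_values


# Three fake events; 'Q' is a hypothetical field used only for the selection.
events = [
    {'X_PX': 3.0, 'X_PY': 4.0, 'Q': 1},
    {'X_PX': 6.0, 'X_PY': 8.0, 'Q': -1},
    {'X_PX': 0.0, 'X_PY': 1.0, 'Q': 1},
]
pt_test = CachedColumn(lambda e: math.sqrt(e['X_PX'] ** 2 + e['X_PY'] ** 2))
all_pt, sel_pt = run(events, pt_test, lambda e: e['Q'] > 0)
```

Here `all_pt` holds three values and `sel_pt` two, while `pt_test.n_evaluations` stays at three: the filtered branch never triggers a second evaluation for an event the unfiltered branch has already seen.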

Cheers,
Enrico
