RDataFrame Define does not create a new column

I do not seem to be able to define a new column with RDataFrame:

In [1]: import ROOT

In [2]: f = ROOT.TFile.Open('root://path/to/file.root')

In [3]: t = f.Get('tree name')

In [4]: df = ROOT.RDataFrame(t)

In [5]: df.Define('pt_test', 'sqrt(X_PX*X_PX + X_PY*X_PY)')
Out[5]: <ROOT.ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> object at 0x7fb4004ab840>

In [6]: 'pt_test' in df.GetColumnNames()
Out[6]: False

In [7]: h1 = df.Histo1D('X_PX')

In [8]: h2 = df.Histo1D('pt_test')
TypeError                                 Traceback (most recent call last)
<ipython-input-8-0985558c03c3> in <module>()
----> 1 h2 = df.Histo1D('pt_test')

TypeError: can not resolve method template call for 'Histo1D'

What am I doing wrong?

ROOT Version: 6.15/01
Platform: MacOS
Compiler: Not Provided

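Define does not modify df in place. It returns a new dataframe node, and the defined column is only visible from that node onwards, so you have to store the return value and use it:

df_with_define = df.Define(...)

df_with_define will show the Defined column; df itself will not.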

This way RDF lets you build a computation graph with complex dependencies, and you have fine-grained control over the visibility of columns.
If needed, you could even define the same column differently in different branches of the computation graph.
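
To make the branching idea concrete, here is a toy model in plain Python. It is not how ROOT implements RDataFrame; ToyNode, define and column_names are hypothetical names invented for this sketch, and the second pt_test expression is an arbitrary example. The only point is that each node carries its own set of columns and a define call returns a new node instead of changing the one it was called on.

```python
class ToyNode:
    """Toy stand-in for a dataframe node; NOT ROOT's implementation."""

    def __init__(self, columns):
        # Map of column name -> expression string, private to this node.
        self._columns = dict(columns)

    def define(self, name, expression):
        # Copy the parent's columns, add the new one, return a new node.
        # The node this was called on is left untouched.
        new_columns = dict(self._columns)
        new_columns[name] = expression
        return ToyNode(new_columns)

    def column_names(self):
        return sorted(self._columns)

    def expression(self, name):
        return self._columns[name]


# The root node only knows the columns that come from the tree.
root = ToyNode({"X_PX": "<branch>", "X_PY": "<branch>"})

# Two branches of the graph define the same column name differently.
branch_a = root.define("pt_test", "sqrt(X_PX*X_PX + X_PY*X_PY)")
branch_b = root.define("pt_test", "X_PX + X_PY")

print(root.column_names())      # ['X_PX', 'X_PY']
print(branch_a.column_names())  # ['X_PX', 'X_PY', 'pt_test']
print(branch_b.expression("pt_test"))  # X_PX + X_PY
```

In this picture, the df in the original question plays the role of root: it never acquires pt_test, no matter how many nodes are derived from it.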

See the user guide for a more detailed explanation, or ask here if I was not clear enough!


Oh I see. Thank you. This usage is clear from the Define documentation, but it is sort of buried in the Crash Course, where the custom columns explanation includes a code snippet that led me to believe I was using Define correctly. Your reply makes clear what is intended by “from the point of definition onwards”.

Thanks again for the illuminating answer.
