I do not seem to be able to define a new column with RDataFrame:

In [1]: import ROOT

In [2]: f = ROOT.TFile.Open('root://path/to/file.root')

In [3]: t = f.Get('tree name')

In [4]: df = ROOT.RDataFrame(t)

In [5]: df.Define('pt_test', 'sqrt(X_PX*X_PX + X_PY*X_PY)')
Out[5]: <ROOT.ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> object at 0x7fb4004ab840>

In [6]: 'pt_test' in df.GetColumnNames()
Out[6]: False

In [7]: h1 = df.Histo1D('X_PX')

In [8]: h2 = df.Histo1D('pt_test')
TypeError                                 Traceback (most recent call last)
<ipython-input-8-0985558c03c3> in <module>()
----> 1 h2 = df.Histo1D('pt_test')

TypeError: can not resolve method template call for 'Histo1D'

What am I doing wrong?

ROOT Version: 6.15/01
Platform: MacOS
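
Defined columns are only visible from the point of definition onwards. Define does not modify df; it returns a new node, and the column exists only on that node and on anything booked from it:

df_with_define = df.Define(...)

Here df_with_define.GetColumnNames() will show the Defined column, while df.GetColumnNames() will not.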

This way RDF lets you define a computation graph with complex dependencies, and you have fine-grained control over the visibility of columns.
If needed, you could even define the same column differently on different branches of the computation graph.

See the user guide for a more detailed explanation, or ask here if I was not clear enough!


Oh I see. Thank you. This usage is clear from the Define documentation, but it is sort of buried in the Crash Course, where the custom columns explanation includes a code snippet that led me to believe I was using Define correctly. Your reply makes clear what is intended by “from the point of definition onwards”.

Thanks again for the illuminating answer.

