Hello experts,
I am trying to use some of the new distributed RDataFrame features, specifically the Dask backend. I am using the nightly build from /cvmfs/sft.cern.ch/lcg/views/dev3/latest/x86_64-el9-gcc11-opt.
When using the non-distributed RDataFrame, I can do things like
```python
df_new = df_old.Define("xsquared", "x*x")
cols = df_new.GetColumnNames()
```
However, when I construct an RDataFrame with `RDF.Experimental.Distributed.Dask.RDataFrame` and run something like the above, the call to `GetColumnNames()` throws the following error:
```
Traceback (most recent call last):
  File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Mon/x86_64-el9-gcc11-opt/lib/DistRDF/Proxy.py", line 237, in __getattr__
    return getattr(self.proxied_node, attr)
AttributeError: 'Node' object has no attribute 'GetColumnNames'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev3/Mon/x86_64-el9-gcc11-opt/lib/DistRDF/Proxy.py", line 248, in __getattr__
    raise AttributeError(msg)
AttributeError: 'Define' object has no attribute 'GetColumnNames'
```
Perhaps I’m missing something obvious, but is there an example of how to query a distributed RDataFrame (e.g. call `GetColumnNames()`) after defining or filtering columns?
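In the meantime, one stopgap I could imagine is keeping track of the column names on the client side. The wrapper below is only a hypothetical sketch, not part of ROOT's API: `ColumnTrackingDF` and its `known_columns` argument are names made up for illustration. It assumes that the distributed proxy's `Define(name, expression)` and `Filter(...)` return new dataframe nodes, as they do for the local RDataFrame, and that the input dataset's columns are supplied by hand (for example, copied from a local `ROOT.RDataFrame` opened on the same tree). Only `Define` and `Filter` are tracked; columns added in other ways (e.g. `Alias`) are not.

```python
class ColumnTrackingDF:
    """Hypothetical wrapper that records column names on the client side,
    so they can be listed without calling GetColumnNames() on the
    distributed proxy (which raises AttributeError in the build above)."""

    def __init__(self, df, known_columns=()):
        self._df = df                        # underlying (distributed) dataframe node
        self._columns = list(known_columns)  # input columns, supplied by the user

    def Define(self, name, expression):
        # Forward the Define and remember the new column name.
        return ColumnTrackingDF(self._df.Define(name, expression),
                                self._columns + [name])

    def Filter(self, *args, **kwargs):
        # Filters do not add columns; just forward and keep the same list.
        return ColumnTrackingDF(self._df.Filter(*args, **kwargs), self._columns)

    def GetColumnNames(self):
        # Input columns first, then Define'd columns in the order they were added.
        return list(self._columns)

    def __getattr__(self, attr):
        # Anything else (Count, Histo1D, ...) goes straight to the wrapped node;
        # note that results of other transformations are returned unwrapped.
        return getattr(self._df, attr)
```

Usage would look like `df = ColumnTrackingDF(dask_df, known_columns=["x"])` followed by the same `Define`/`GetColumnNames` calls as in the local example; the obvious downside is that the list can drift out of sync with what the backend actually knows.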
ROOT Version: 6.33/01 From heads/master@v6-31-01-1852-gdb8f2ef07c
Platform: x86_64-el9-gcc11-opt
Compiler: g++ (GCC) 11.3.0