Removing entire columns from RDataFrame object

Hello,

I have an RDataFrame object df and I am trying to get more information about it by calling

df.Describe().Print()

However, this yields the following error:

File "/Users/yannickburkard/Documents/framework/PyRoot_analysis/root_to_histogram_RDataFrame.py", line 229, in <module>
    df.Describe().Print()
cppyy.gbl.std.runtime_error: ROOT::RDF::RDFDescription ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Describe() =>
    runtime_error: TTree leaf EFlowNeutralHadron.Edges[4] has both a leaf count and a static length. This is not supported.

I understand that the branch EFlowNeutralHadron.Edges is the problem, so I would like to remove it from the RDataFrame, but I cannot find a function that removes a column from such an object. Does anyone know how to do this?

Hi @yburkard ,

sorry for the high latency. Ah, this is annoying indeed.

There is no way to "hide" a column from RDataFrame. However, Display and Snapshot, which by default act on all available columns, also accept a list of the columns you actually care about.
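For example, here is a minimal PyROOT sketch that prints only a chosen subset of columns, so the problematic branch is never touched. The helper display_columns is hypothetical, and the tree name "Delphes", the file "input.root" and the columns "Jet.PT" and "Jet.Eta" are placeholders to replace with whatever exists in your file:

```python
def display_columns(df, columns, n_rows=5):
    """Print the first n_rows entries of only the given columns.

    `df` is expected to be a ROOT.RDataFrame (or a node derived from it).
    Display is lazy; calling Print() runs the event loop and writes the
    table to stdout.
    """
    display = df.Display(list(columns), n_rows)
    display.Print()
    return display


if __name__ == "__main__":
    import ROOT

    # Placeholder tree/file/column names, for illustration only.
    df = ROOT.RDataFrame("Delphes", "input.root")
    display_columns(df, ["Jet.PT", "Jet.Eta"])
```

Note that Describe() itself has no column-list argument, which is why this goes through Display instead.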

The only workaround here is probably to Snapshot all columns except the problematic ones to a new file, and then work from that file.
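A minimal PyROOT sketch of that workaround follows. The helpers columns_without and snapshot_without are hypothetical, and the tree and file names are placeholders. It assumes that GetColumnNames() still works on the original tree (only the type lookup for the bad leaf is expected to fail) and that all remaining columns can be written out; if other branches cause similar errors, add their prefixes to the exclusion list:

```python
def columns_without(all_columns, excluded_prefixes):
    """Return the column names that do not start with any excluded prefix."""
    names = [str(c) for c in all_columns]  # std::string -> Python str
    return [n for n in names
            if not any(n.startswith(p) for p in excluded_prefixes)]


def snapshot_without(df, tree_name, file_name, excluded_prefixes):
    """Snapshot every column of `df` except those matching excluded_prefixes.

    Returns whatever Snapshot returns; with ROOT this is a result pointer to
    a new RDataFrame that reads the freshly written file.
    """
    keep = columns_without(df.GetColumnNames(), excluded_prefixes)
    return df.Snapshot(tree_name, file_name, keep)


if __name__ == "__main__":
    import ROOT

    # Placeholder tree and file names, for illustration only.
    df = ROOT.RDataFrame("Delphes", "input.root")
    clean = snapshot_without(df, "Delphes", "input_clean.root",
                             ["EFlowNeutralHadron.Edges"])
    clean.Describe().Print()
```

Matching by prefix rather than exact name also catches variants such as "EFlowNeutralHadron.Edges[4]" if they show up as separate columns.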

cc: @Axel, this is another instance of [ROOT-9509] [DF] Add proper support for multidimensional arrays - SFTJIRA.

Cheers,
Enrico

