"Unknown column" when attempting to Snapshot an RDataFrame with a new, Defined column

I’ve written the following code, which reads two files (test1.root and test2.root) into an RDataFrame. Each contains a TNtuple with the columns/branches x and y.

I then define a new column, z, produced by taking the element-wise product of the two columns.

Finally, I attempt to store the RDataFrame as a TTree in a .root file.

Here’s my code:

vector<string> file_list = {"test1.root", "test2.root"};
ROOT::RDataFrame df("N", file_list);

df.Define("z", [](int a, int b){ return a * b; },{"x","y"});
df.Snapshot("N", "test4.root", {"x","y","z"});

Unfortunately, this gives the error: "Unknown column: z". Where have I erred?

When I try storing only the x and y columns, all goes well.

Try like this:

auto df_z = df.Define("z", [](int a, int b){ return a * b; },{"x","y"});
df_z.Snapshot("N", "test4.root", {"x","y","z"});

Cheers,
Jakob


You may also find it useful to use RNode:

ROOT::RDF::RNode myNode(df);

myNode = myNode.Define(…
myNode.Snapshot(

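The issue is that your df.Define call creates a new node in the processing graph and returns it; df itself is left unchanged. Because that returned node was discarded, the node you then call Snapshot on (df) knows nothing about z.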
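
To make that behaviour concrete, here is a small toy model in plain C++. It is only a sketch of the "Define returns a new node" idea: ToyNode, HasColumn and the two helper functions are made-up names for illustration, not part of ROOT, and the real RDataFrame nodes do far more than track column names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a processing-graph node (NOT a ROOT class).
// It only remembers which column names it knows about.
class ToyNode {
public:
    explicit ToyNode(std::set<std::string> columns) : columns_(std::move(columns)) {}

    // Like Define in the thread: leaves *this untouched and returns a NEW
    // node that additionally knows the column `name`.
    ToyNode Define(const std::string& name) const {
        std::set<std::string> extended = columns_;
        extended.insert(name);
        return ToyNode(std::move(extended));
    }

    bool HasColumn(const std::string& name) const { return columns_.count(name) > 0; }

private:
    std::set<std::string> columns_;
};

// Mirrors the code in the question: the node returned by Define is dropped.
inline bool columnVisibleWhenResultDiscarded() {
    ToyNode df({"x", "y"});
    df.Define("z");            // returned node is thrown away
    return df.HasColumn("z");  // df never learned about z
}

// Mirrors the fix: keep the returned node and query that one.
inline bool columnVisibleWhenResultKept() {
    ToyNode df({"x", "y"});
    ToyNode df_z = df.Define("z");
    return df_z.HasColumn("z");
}
```

In this model, columnVisibleWhenResultDiscarded() returns false and columnVisibleWhenResultKept() returns true, which is the same pattern as df versus df_z above; they can be checked with assert from any main function.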


Thanks! This works great.