RDataFrame Define columns with same name

RENATO_QUAGLIANI · December 17, 2020, 11:18am

Dear experts,
I would like to understand one aspect of RDataFrame::Define.
My use case is the following:
I have already processed an ntuple with Define("X", aFunctor, {"inputneeded"}).Snapshot().
Now I load the snapshotted ntuple, but my functor has been updated.

What I want to achieve is to “re-define” X under exactly the same name in a dataframe and use the updated “X” branch.
I have not seen any errors or warnings from RDataFrame when doing this, but I am not sure whether the new definition actually replaces the existing branch:

Tuple *tuple = GetTupleFromSomewhere();
// this tuple already has a branch "X"
ROOT::RDataFrame df(*tuple);

df.Define("X", myFunctor, {"inputs"});
// is the new "X" now overwriting the existing "X"?

Also, would tuple->SetBranchAddress("X", 0) prevent the branch “X” from being passed to the dataframe?

Thanks in advance
Renato

ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided

RENATO_QUAGLIANI · December 17, 2020, 1:33pm

The dirty workaround I used is to give the column a different name in df.Define, for example X_v1 instead of “X”, and then the code works. However, I wonder whether there could be a mechanism for “blacklisting” columns when the DataFrame is created from the TTree.

etejedor · December 17, 2020, 2:39pm

Hello,

Redefinition of columns is not supported yet, but it was mentioned as one of the possibilities for the PoW in 2021 (although it’s not top-priority). For now I guess the definition with another name is the way to go.

@eguiraud ?

eguiraud · December 17, 2020, 3:00pm

Ugly hack until df.Redefine is implemented: df.Filter([](double &x) { x = ...; return true; }) should work (note that the input is passed by non-const reference).

(not recommended, renaming is better)
