How do you pass a templated/overloaded function to RDataFrame define/filter?
The alternative would be to use strings, or lambdas chosen according to the column type, neither of which is convenient.
template <typename T>
RVec<T> square(RVec<T> vec)
{
    RVec<T> squared_vec; // element type follows T instead of being fixed to double
    for (auto v : vec)
    {
        squared_vec.push_back(v * v);
    }
    return squared_vec;
}
int main()
{
    ROOT::RDataFrame df(10);
    auto df_defines = df.Define("x", get_random_vector)
                        .Define("y", get_random_int_vector)
                        .Define("x_squared", square, {"x"})
                        .Define("y_squared", square, {"y"});
    df_defines.Display({"x", "y"})->Print();
    return 0;
}
RVecD get_random_vector()
{
    RVecD vec;
    for (int i = 0; i < 10; i++)
    {
        vec.push_back(gRandom->Gaus());
    }
    return vec;
}
RVecI get_random_int_vector()
{
    RVecI vec;
    for (int i = 0; i < 10; i++)
    {
        vec.push_back(gRandom->Integer(10));
    }
    return vec;
}
Pass collections such as RVec by const reference (e.g. myfunc(const RVec<T>& v)).
VecOps can greatly simplify operations on RVecs: for example, sqrt(myRvec) returns an RVec of the square roots, and myRvec*myRvec returns an RVec of the squares.
I have a non-trivial modification that has to be applied to each column. The above was just a reproducer.
Passing the type explicitly is not ideal for this situation, but I suppose it has to work.
Just checking if you have a better recommendation.
for (const auto& col : column_list)
{
    auto type = rdf.GetColumnType(col);
    if (type.find("double") != std::string::npos)
        rdf = rdf.Define(col + "_mod", square<double>, {col});
    else if (type.find("int") != std::string::npos)
        rdf = rdf.Define(col + "_mod", square<int>, {col});
    ...
}
This has to be done for each unique column type, and that list is long: it includes unsigned int and various other type modifiers. Do let me know if there is a better way to do this.
The column type is something that can be known only at run time, while the template type is something that has to be known at compile time: reconciling these two is always going to require some work.
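If you stay with templates, the run-time-to-compile-time bridge can at least be written once and reused for every templated operation. Below is a minimal sketch of that idea, not a ROOT facility: `TypeTag`, `dispatch_on_type` and `sum_of_squares_for` are hypothetical names, `std::vector` stands in for `RVec` so that the snippet compiles without ROOT, and the list of supported types is an assumption you would extend yourself.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the templated column operation (std::vector replaces RVec here).
template <typename T>
std::vector<T> square(const std::vector<T>& vec)
{
    std::vector<T> out;
    out.reserve(vec.size());
    for (const auto& v : vec)
        out.push_back(v * v);
    return out;
}

// Hypothetical helper: an empty tag that carries a C++ type as a value.
template <typename T>
struct TypeTag { using type = T; };

// Hypothetical helper: maps a run-time type name to a compile-time type and
// calls f with the matching tag. The supported types are an assumed list.
template <typename F>
void dispatch_on_type(const std::string& type_name, F&& f)
{
    // "unsigned int" is tested before "int" because it contains "int".
    if (type_name.find("double") != std::string::npos)
        f(TypeTag<double>{});
    else if (type_name.find("unsigned int") != std::string::npos)
        f(TypeTag<unsigned int>{});
    else if (type_name.find("int") != std::string::npos)
        f(TypeTag<int>{});
    else
        throw std::runtime_error("unsupported column type: " + type_name);
}

// Illustration only: instantiates square<T> for the type named at run time.
// In the RDataFrame loop the lambda body would instead be something like
//   rdf = rdf.Define(col + "_mod", square<T>, {col});
double sum_of_squares_for(const std::string& type_name)
{
    double total = 0.0;
    dispatch_on_type(type_name, [&total](auto tag) {
        using T = typename decltype(tag)::type;
        const std::vector<T> input{static_cast<T>(1.5), static_cast<T>(2)};
        for (const auto& v : square<T>(input))
            total += static_cast<double>(v);
    });
    return total;
}
```

The generic lambda needs C++14 or later. The if/else chain still has to list every type, but it lives in one place, so adding a second templated function does not mean writing a second chain.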
One option could be to write all of your functions as overloads rather than templates, JIT-compile them with cling, and then invoke them in the Define. Depending on your functions, the performance penalty may be small to negligible:
...
gInterpreter->Declare("#include \"myFunctions.h\"");
for (const auto& col : column_list) {
    rdf = rdf.Define(col + "_mod", "foo(" + col + ")"); // <-- this invokes the right overload
}
...