Add to design a way to include non-column arguments when defining slots using C++ functions?


When defining RDataFrame columns with, for example, DefineSlot, one can pass either a lambda or a C++ function. When passing a C++ function, there is no way (as far as I am aware) in RDataFrame to pass arguments to that function that are not columns of the dataframe. This is a point of frustration for me and many people around me: we have to resort to lambdas with variable capture, which makes the code more cumbersome.

I wonder what the argument is for not implementing the ability to pass additional variables to C++ functions when defining slots. I can imagine a design where DefineSlot takes an extra argument (e.g. ExtraArgs) and the user is required to define their C++ functions such that any of these non-column arguments come at the end of the argument list. The ExtraArgs array can then be “appended” to the end of the column argument list when executing the function.

Of course this is a very simplistic design, and I am sure it would need a lot of safety mechanisms to ensure correct usage, but it’s the first thing that comes to mind.
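
To make the “append at the end” idea concrete, here is a minimal sketch in plain C++ (C++14 or later), independent of ROOT. The helper name AppendArgs is hypothetical and not part of RDataFrame. The column types are spelled out as template arguments so that the resulting callable has a fixed, non-template signature. Whether RDataFrame would accept such a wrapper as-is has not been checked here.

```cpp
#include <cassert>

// Hypothetical helper (not a ROOT API): returns a callable that takes the
// column arguments `Cols...` and calls `f` with `extras...` appended after them.
// The extras are copied into the wrapper when it is created.
template <typename... Cols, typename F, typename... Extras>
auto AppendArgs(F f, Extras... extras)
{
   return [f, extras...](Cols... cols) { return f(cols..., extras...); };
}

// Example user function: the non-column argument `extra` comes last.
double other_func(unsigned int slot, double c1, double c2, double extra)
{
   return slot + c1 + c2 * extra;
}

// Small self-check of the wrapper outside of any dataframe.
void Demo()
{
   const double extra = 10.0;
   // The column types are listed once here instead of in a hand-written lambda.
   auto fn = AppendArgs<unsigned int, double, double>(other_func, extra);
   assert(fn(1u, 2.0, 3.0) == 33.0); // 1 + 2 + 3 * 10
}
```

The argument types still have to be listed once, but the forwarding body no longer has to be written by hand for every function.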

If I have missed a way to do this with the current setup, please do let me know :slight_smile: If not, I would also like to hear what the ROOT team thinks of possible design solutions to resolve this.

Hi @MoAly98,

AFAIK, there’s currently no way of passing additional parameters to the specified callable object. Therefore, I assume that you are doing something of the form:

df.DefineSlot("x", [&extra] (unsigned int i, double c1, double c2) {
      return other_func(i, c1, c2, extra);
   }, {"c1", "c2"});

As for why this is not implemented: probably because users have not requested it. Could you elaborate more on why your use case needs to pass extra arguments? I’m also inviting @vpadulan and @eguiraud to the topic, as they will benefit from the discussion here.



Hi @jalopezg !

Thanks a lot for the quick reply :slight_smile:

Yes, I currently implement the functions the way you showed. I personally think even the simple example you provide demonstrates the point. You had to define the arguments to the function twice (once in defining other_func and again in defining the lambda), which is not optimal. A few other situations I ran into:

  • I want to have some dummy value to be used under some conditions inside the function. At the moment I define this helper variable as a column and pass it.
  • A function that needs a file path passed to it.
  • I have a function that looks through a vector of objects and finds an object with a name that the user has given to the function. This name is often a const string that needs to be passed when defining the slot that will store the object. In this case I don’t even want to capture something from global scope, I just want to pass a string to the method.

As I said, I do use lambdas to get around these situations, but that’s not ideal because it requires repeating the method’s arguments and their types, and it takes up a lot more space. So it is not critical, but if it’s not hard to implement, I think it would also be useful for other users, for example those who want to use this when evaluating machine learning algorithms.
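
For the name-lookup and dummy-value cases, one alternative to a capturing lambda is a small function object that stores the extra arguments as data members, so the call operator only takes the column. The sketch below is plain C++ and independent of ROOT; NamedObject, FindByName and the fallback value are hypothetical names chosen for illustration, and using such a functor in DefineSlot has not been verified here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object type stored in a vector-valued column.
struct NamedObject {
   std::string name;
   double value;
};

// Functor carrying the non-column arguments (the name to search for and a
// dummy fallback value) as state, fixed at construction time.
class FindByName {
public:
   explicit FindByName(std::string name, double fallback = -999.0)
      : fName(std::move(name)), fFallback(fallback)
   {
   }

   // Only the column argument appears in the call signature.
   // Returns the value of the first object whose name matches, else the fallback.
   double operator()(const std::vector<NamedObject> &objects) const
   {
      for (const auto &obj : objects) {
         if (obj.name == fName)
            return obj.value;
      }
      return fFallback;
   }

private:
   std::string fName;
   double fFallback;
};

// Small self-check outside of any dataframe.
void DemoFindByName()
{
   const std::vector<NamedObject> objects{{"muon", 1.5}, {"jet", 42.0}};
   assert(FindByName("jet")(objects) == 42.0);
   assert(FindByName("tau")(objects) == -999.0); // no match: fallback is returned
}
```

The arguments are written once, in the functor definition, at the cost of a small class per function.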

FWIW, a generic lambda should mitigate that, but I agree it’s not ideal.

              [&extra] (auto... args) {return other_func(args..., extra); },
              {"c1", "c2"});

For the rest, I’ll let @vpadulan reply. :slight_smile:


Hi @MoAly98 ,
Thanks for reaching out with your comments!
In fact, this idea is already circulating within the team, but in the context of the Python API of RDataFrame. In future versions, we would like users to be able to do:

def myfun():
    return 42
df.Define("mycol", myfun)

This would replace having to write C++ in strings, as is done currently. In that context, the possibility of adding extra args for the Python function becomes quite important to ensure transparency and make the API feel 100% Pythonic (**kwargs is a basic language feature after all).

So far, we have never discussed the possibility of adding this functionality to the C++ interface too. It would definitely be trickier, and I personally don’t see the practical need for it (yet): capturing objects in the lambda is an idiomatic and easy way to solve this, although of course it involves a few extra characters. I have taken note of your comments and will think about how much work this would involve.