It would require a full parser, but my impression was that with libclang that’s not really a barrier.
cling makes something like this possible, but not necessarily straightforward (for the reasons you list and more).
Also the double compilation pass is a bit clunky imo.
If you misspell a branch name in the branch list – runtime error. Get the order wrong (very easy if you define a function instead of using an inline lambda) – runtime error. Use the wrong type – runtime error.
I hear you. I don’t have a non-jitted solution (we need the compiler to see the branch types in the signatures and RDataFrame to have the names of the branches), but I am aware that’s a pain point.
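To make it concrete why all three mistakes can only surface at runtime, here is a toy sketch in plain C++17. It is not how RDataFrame is implemented; `ToyEvent`, `ReadFails`, `MakeToyEvent` and `DemoFailures` are names I made up for illustration. The point is only that the column name is a runtime string while the type is a template parameter, so the compiler has no way to cross-check the two.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for one event: values are looked up by a runtime string,
// and the caller states the expected type as a template argument.
class ToyEvent {
   std::map<std::string, std::any> fColumns;

public:
   template <typename T>
   void Set(const std::string &name, T value)
   {
      fColumns[name] = std::move(value);
   }

   template <typename T>
   T Get(const std::string &name) const
   {
      auto it = fColumns.find(name);
      if (it == fColumns.end())
         throw std::runtime_error("unknown column: " + name);
      // any_cast on a pointer returns nullptr when the stored type differs.
      const T *value = std::any_cast<T>(&it->second);
      if (!value)
         throw std::runtime_error("wrong type requested for column: " + name);
      return *value;
   }
};

// Returns true if reading column `name` as type T fails at runtime.
template <typename T>
bool ReadFails(const ToyEvent &ev, const std::string &name)
{
   try {
      ev.Get<T>(name);
      return false;
   } catch (const std::runtime_error &) {
      return true;
   }
}

// Hypothetical event with one float column and one int column.
ToyEvent MakeToyEvent()
{
   ToyEvent ev;
   ev.Set<float>("pt", 42.f);
   ev.Set<int>("nJets", 3);
   return ev;
}

// All of these compile fine; the failures are only detected when they run.
void DemoFailures()
{
   const ToyEvent ev = MakeToyEvent();
   assert(!ReadFails<float>(ev, "pt"));    // correct name and type
   assert(ReadFails<float>(ev, "pT"));     // misspelt name
   assert(ReadFails<double>(ev, "pt"));    // wrong type
   assert(ReadFails<float>(ev, "nJets")); // names swapped relative to the signature
}
```

Calling `DemoFailures()` from a `main()` exercises the one good read and the three bad ones.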
My current solution is to define one column at the beginning containing a struct and put all the information I need in that struct. This doesn’t seem very efficient though.
If you know that you will always read all the branches in the struct for each event, this costs one extra copy of those values, which is probably not a performance bottleneck. If it does slow you down, refactoring later is fairly straightforward. If you don’t want to read all the branches in the struct for each event, this method causes unnecessary reads (which might have a noticeable runtime cost).
it might be easier to just extend the JIT mode (with optimization and the ability to call functions)
The cost of jitting is a startup delay before the event loop begins (during which things get compiled) plus a virtual call per jitted node. Depending on the use case, this might be reasonably low or unbearably high.
However, in JIT mode you can do anything that you could do in the ROOT interpreter, including calling functions, as long as cling knows about them; you might have to call gInterpreter->Declare("#include ...") and/or load the corresponding libraries via the interpreter.
The design does give me the impression that the JIT mode was meant to be the primary way RDataFrame is used
The intended split is JIT for quick, possibly interactive exploration, and the more verbose native C++ code for a performant implementation that you write once and run many times.
In any case, thank you for your great feedback, we should definitely think about ways to mitigate the verbosity/redundancy of the native C++ interface.
Cheers,
Enrico