Generalizing TDataFrame::Reduce

Currently, TDataFrame::Reduce expects a function with signature T(T, T), where T is the type of the column being operated on. IMHO, this can be relaxed to at least U(U, T), where the new type U can be almost anything, or, even better, to the BinaryOperation requirement of std::accumulate.

If in addition you replace the column type T with a tuple of types, Reduce can completely replace Foreach{Slot}.

What are your thoughts?


Hi,
thanks for looking into TDF!

Regarding letting Reduce take a callable with signature U(U,T) and do something similar to std::accumulate: the plan was in fact to add a separate Accumulate action (in order to keep names as expressive as possible). Accumulate is not there (yet?), though, because it is awkward to manage in the multi-thread case: users would need to supply both the U(U,T) lambda and a U(U,U) lambda that TDF could use to merge the results of each thread, or threads would have to share the accumulator, which has performance implications. Since an elegant solution did not come to mind, Accumulate stayed at the back of the todo list.
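To make the two-callable design concrete, here is a minimal sketch in plain C++ (it does not use TDF at all). The names Stats, Fold, Merge and AccumulateInSlots are hypothetical and only serve the illustration. Each slice of the input is folded into its own private accumulator with a U(U,T) callable, and the partial results are then combined with a U(U,U) callable. The slices are processed in a plain loop to keep the sketch short; in a multi-thread run each slice would belong to one thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical accumulator type U: running count and sum of the column values.
struct Stats {
   std::size_t count = 0;
   double sum = 0.;
};

// U(U, T): folds one column value into an accumulator.
Stats Fold(Stats acc, double x)
{
   ++acc.count;
   acc.sum += x;
   return acc;
}

// U(U, U): merges two partial accumulators.
Stats Merge(Stats a, const Stats &b)
{
   a.count += b.count;
   a.sum += b.sum;
   return a;
}

// Folds each slice into a private accumulator, then merges the partial results.
// Assumes nSlots > 0.
Stats AccumulateInSlots(const std::vector<double> &values, std::size_t nSlots)
{
   std::vector<Stats> partials(nSlots);
   const std::size_t chunk = (values.size() + nSlots - 1) / nSlots;
   for (std::size_t slot = 0; slot < nSlots; ++slot) {
      const std::size_t begin = std::min(values.size(), slot * chunk);
      const std::size_t end = std::min(values.size(), begin + chunk);
      // No accumulator is shared between slots, so no synchronisation is needed here.
      partials[slot] = std::accumulate(values.begin() + begin, values.begin() + end, Stats{}, Fold);
   }
   return std::accumulate(partials.begin(), partials.end(), Stats{}, Merge);
}
```

Calling AccumulateInSlots(values, 3) on the numbers 1 to 10 gives a count of 10 and a sum of 55, the same as a single sequential fold, but only because Fold and Merge are consistent with each other. That consistency is exactly what the user would have to guarantee.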

And of course, since Accumulate is not there, we cannot implement Reduce and/or Foreach in terms of it.

If/when Accumulate is there, I think we would want to keep all actions (Foreach, Reduce and Accumulate) in the interface so that users can express intent via the method they use, even though we might implement Reduce/Foreach in terms of Accumulate (it would be harder to do the same with ForeachSlot). Whether this is at all desirable would come down to performance considerations, I guess: Accumulate might add some overhead with respect to Foreach that we might not want... but this is utter speculation at this point :)

Hope this makes sense.
Cheers,
Enrico

Hi Enrico,

Thanks for the comprehensive response!

After some thought, I actually think that std::reduce is a better reference in the standard library, since there you also need commutativity and associativity, unlike with std::accumulate. The binary operation there is a FunctionObject with the exact property that you described and more. So basically this is a solved problem: a functor would be the way to manage the different signatures in the complex cases.
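As a rough sketch of what I mean (plain C++17, no TDF involved; Stats, CombineStats and ReduceColumn are hypothetical names for this illustration): std::reduce may apply the operation to any mix of accumulator values and elements, so a single functor with overloaded call operators covers all four argument combinations.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical accumulator type U: running count and sum.
struct Stats {
   std::size_t count = 0;
   double sum = 0.;
};

// One function object covering every argument combination std::reduce may use.
struct CombineStats {
   // U(U, U): merge two partial results.
   Stats operator()(Stats a, const Stats &b) const
   {
      a.count += b.count;
      a.sum += b.sum;
      return a;
   }
   // U(U, T): fold one element into a partial result.
   Stats operator()(Stats acc, double x) const
   {
      ++acc.count;
      acc.sum += x;
      return acc;
   }
   // U(T, U): same as above with the arguments swapped.
   Stats operator()(double x, Stats acc) const { return (*this)(acc, x); }
   // U(T, T): start a partial result from two elements.
   Stats operator()(double x, double y) const { return Stats{2, x + y}; }
};

Stats ReduceColumn(const std::vector<double> &values)
{
   // Without an execution policy this runs sequentially, but the grouping of
   // operands is still unspecified, hence the need for all four overloads.
   return std::reduce(values.begin(), values.end(), Stats{}, CombineStats{});
}
```

The result only makes sense because counting and summing are associative and commutative; with an operation that is not, the grouping chosen by std::reduce would change the answer.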

Additionally, it seems to me that Accumulate should be reserved for the most general binary ops, where you can guarantee neither associativity nor commutativity. This of course implies that you need to process in a single thread and also preserve order.
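A small sketch of why order and grouping matter in that general case, using subtraction as a deliberately non-associative operation (SequentialFold and ChunkedFold are hypothetical names for this illustration):

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Strict left-to-right fold, as std::accumulate guarantees.
int SequentialFold(const std::vector<int> &v)
{
   return std::accumulate(v.begin(), v.end(), 0, std::minus<int>{});
}

// Folds the two halves independently and combines them with the same
// operation, mimicking what two threads plus a merge step would do.
int ChunkedFold(const std::vector<int> &v)
{
   const auto mid = v.begin() + v.size() / 2;
   const int left = std::accumulate(v.begin(), mid, 0, std::minus<int>{});
   const int right = std::accumulate(mid, v.end(), 0, std::minus<int>{});
   return std::minus<int>{}(left, right);
}
```

For the input {1, 2, 3, 4} the sequential fold gives -10, while the chunked version gives 4, so for such an operation only the single-threaded, ordered evaluation is well defined.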

Hi,
if I understand correctly, what you are suggesting now is that we make the requirements on the callable passed to Reduce equal to those on the callable passed to std::reduce, correct?

I am not sure what the advantages would be. Could you elaborate a bit on the motivation?

Thanks for your interest!
Enrico
