Is a GroupBy operation natively implemented in RDataFrame?



ROOT Version: 6.20
Platform: Mac OS
Compiler: Clang

Dear ROOT developers,

I am currently analyzing some TTrees using RDataFrame. Their structure is as follows:

particle  info_1  info_2  event
1                         1
2                         1
3                         1
10                        2
11                        2

My goal is to perform a GroupBy operation on the event column. Please keep in mind that rows are unordered (e.g. particle 3 could belong to event 2 and particle 4 to event 1).

My current approach is to create a std::map where the key is the event, and the value is a user-defined struct. This map is updated, after applying some filters and definitions, using the Foreach function of RDataFrame. Is it possible to perform a GroupBy natively in RDataFrame?
Moreover, I am considering exploiting multithreading in my analysis, so having such a function in the RDF API would allow me to take advantage of the implicit MT capability of ROOT.
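For reference, the map-based approach described above could look roughly like the following. This is only a minimal standalone sketch that uses the standard library and no ROOT headers. The names `Row`, `EventSummary`, and `group_by_event`, as well as the choice of what to accumulate per event, are hypothetical and stand in for the actual user-defined struct. The loop body corresponds to what would be passed to Foreach.

```cpp
#include <map>
#include <vector>

// Hypothetical row layout: one particle per row, tagged with its event number.
struct Row {
    int particle;
    double info_1;
    int event;
};

// Hypothetical per-event summary, standing in for the user-defined struct.
struct EventSummary {
    std::vector<int> particles;  // particle ids belonging to this event
    double sum_info_1 = 0.0;     // example aggregate over the group
};

// Group unordered rows by event number.
std::map<int, EventSummary> group_by_event(const std::vector<Row>& rows) {
    std::map<int, EventSummary> groups;
    for (const Row& r : rows) {
        // operator[] default-constructs the summary the first time an event is seen,
        // so the input does not need to be sorted by event.
        EventSummary& s = groups[r.event];
        s.particles.push_back(r.particle);
        s.sum_info_1 += r.info_1;
    }
    return groups;
}
```

Because a single shared map is updated for every row, this form is only safe in a single-threaded event loop.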

Best regards,
Loris


Hi Loris,
unfortunately GroupBy is not implemented in RDataFrame, but you can implement it as a custom action, also for the MT case. Returning a std::unordered_map seems like a good way to do it at first glance.

You could even propose your implementation for inclusion in ROOT inside ROOT/RDFHelpers.hxx.

Here is a tutorial that shows how to implement a custom action in practice.
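To give an idea of the shape such a custom action might take, here is a rough standalone sketch of the per-slot pattern: each processing slot fills its own std::unordered_map, and the partial maps are merged once at the end. It is written against the standard library only so that it compiles without ROOT. The class name `GroupByEventHelper`, the `Groups` alias, and the choice of collecting particle ids are all hypothetical. A real helper would, as far as I can tell, additionally derive from ROOT::Detail::RDF::RActionImpl and provide the remaining methods shown in the tutorial before being passed to Book.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical grouped result: event number -> particle ids seen for that event.
using Groups = std::unordered_map<int, std::vector<int>>;

// Standalone sketch of a group-by helper (no ROOT dependency, names are illustrative).
class GroupByEventHelper {
public:
    using Result_t = Groups;

    explicit GroupByEventHelper(unsigned int nSlots)
        : result_(std::make_shared<Groups>()), partial_(nSlots) {}

    std::shared_ptr<Groups> GetResultPtr() const { return result_; }

    // Called once per row. Each slot has its own map, so concurrent
    // calls with different slot numbers never touch the same container.
    void Exec(unsigned int slot, int event, int particle) {
        assert(slot < partial_.size());
        partial_[slot][event].push_back(particle);
    }

    // Called once after all rows are processed: merge the per-slot maps.
    void Finalize() {
        for (Groups& slotGroups : partial_) {
            for (auto& kv : slotGroups) {
                std::vector<int>& dst = (*result_)[kv.first];
                dst.insert(dst.end(), kv.second.begin(), kv.second.end());
            }
            slotGroups.clear();
        }
    }

private:
    std::shared_ptr<Groups> result_;  // final merged result
    std::vector<Groups> partial_;     // one partial map per slot
};
```

Keeping one map per slot avoids locking in the hot loop; the cost is a single merge pass in Finalize, whose order within each group depends on how rows were distributed across slots.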

I hope this helps!
Cheers,
Enrico

Hi Enrico,

I see, thank you very much. I will try to polish my code a bit and have a look at it!

Cheers,
Loris
