Another use case for RDataFrame "groupby" like feature

Hi @arizzi ,

Yes, threads take a contiguous chunk of events and process them in order. Moreover, the chunks always begin and end at TTree cluster boundaries, so in principle, if the K correlated events never (or only vanishingly rarely) cross TTree cluster boundaries, you are good.
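To make the cluster-boundary condition concrete, here is a minimal sketch in plain Python (no ROOT needed) that checks which groups of K consecutive events would straddle a cluster boundary. The function name, the list of cluster start entries and the example layout are all hypothetical and only for illustration; for a real file the boundaries could be read with something like `TTree::GetClusterIterator`. It also assumes the groups start at entries 0, K, 2K, and so on.

```python
from bisect import bisect_right


def groups_crossing_clusters(cluster_starts, n_entries, k):
    """Return the first entry of every K-event group that spans two clusters.

    cluster_starts: sorted first-entry numbers of each cluster (hypothetical input).
    Groups are assumed to start at entries 0, k, 2k, ...
    """
    crossing = []
    for first in range(0, n_entries, k):
        last = min(first + k, n_entries) - 1
        # bisect_right counts how many clusters start at or before an entry,
        # so two entries are in the same cluster iff the counts are equal.
        if bisect_right(cluster_starts, first) != bisect_right(cluster_starts, last):
            crossing.append(first)
    return crossing


# Hypothetical layout: 300 entries in 3 clusters starting at 0, 100 and 200.
print(groups_crossing_clusters([0, 100, 200], 300, 4))  # [] : K=4 divides the cluster size
print(groups_crossing_clusters([0, 100, 200], 300, 3))  # [99, 198]
```

If the returned list is empty, every group of K events stays inside a single cluster and therefore inside a single thread's chunk.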

This is "Have rdfentry_ represent the global entry number in the TChain even in MT runs" (Issue #12190 in root-project/root on GitHub). It requires some work, but it’s in our plans (the last comment in the conversation proposes a development roadmap).

I see how in general this would be solved well by a group-by operation. There was some more discussion of that at "Groupby in RDataFrame". The problem, of course, is making it efficient for larger-than-memory datasets. I do not think ROOT has that in the plan of work at the moment.

Cheers,
Enrico