TDataFrame group by operation example

Hi ROOT,

The new TDataFrame class looks great!

One question I have: is there functionality, or an example, showing how to do a “group by” operation with it? I didn’t see anything in the documentation. I mean this in analogy with the SQL operation, or the same thing in Python’s pandas or Apache Spark (e.g. http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframegroupby-retains-grouping-columns)

Thanks,
Ryan

Hi Ryan,

glad you like it!
As you can see from the fact that it lives in the ROOT::Experimental namespace, the class has only just landed in ROOT. Although we do not foresee major changes to its interface, which should be fairly stable, we have a rich work plan ahead of us in terms of new functionality.
At present, TDataFrame does not offer a “group by” in the form found in tools like pandas. That said, you may already be able to obtain analogous results with the current features: filtering, for instance, is already quite powerful and performant.
Perhaps we can help you express a solution to your problem with the present version of TDataFrame?

Cheers,
D

Hi,
I think we need a GroupBy in TDataFrame too.

I completely agree with dpiparo’s answer; I will just add a bit of context:
Unlike Filter and AddColumn, which operate on a per-data-point basis, GroupBy requires a full sweep of the data before it can return a meaningful result that can be used downstream in the functional chain/graph. This means it requires extra care and a slightly different approach to be performant and to play nicely with the rest of the interface.

Cheers,
Enrico
