Value_counts in RDataFrame


Dear experts,

I am trying to implement, with RDataFrame, something analogous to what value_counts() does in pandas. I found a previous entry that asks for precisely this:

I have one minor question: Is there a function which is able to count the occurrences of values within a specific column? Let's say I apply a filter and have a column called ‘event_nr’.

And the answer pointed to

For example you can Aggregate column values into a std::map value → count.

I have tried to follow this suggestion to obtain the value counts, but I haven’t been able to make it work. Could someone provide an example in PyROOT?

ROOT Version: 6.36.04

Best,

Lidia.


Hi @lidia,

Thank you for your post. Adding @vpadulan and @mczurylo to this thread who can help with your question.

Best,
Lukas

Hi Lidia,

I do not think RDF has a native way to provide what you ask. If you want the value counts of an RDF column in Python, you could do:

import ROOT
from collections import Counter # <-- this is the python tool to create the counts of values

# A dummy Data Frame
df = ROOT.RDataFrame(24).Define("myVals", "gRandom->Integer(10)")

# Take out a column of the DF (Take is lazy: the event loop runs when the result is first accessed)
myVals = df.Take["unsigned int"]("myVals")

# Pure Python now, starting from a ROOT object, the column of the DF
myCounts = Counter(myVals)

# Inspect the result!
print(myCounts)
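
If you want the output to look more like what pandas value_counts() gives you (sorted by count, optionally normalized), a minimal sketch in pure Python could be the following. The list `values` is a hypothetical stand-in for the content of the "myVals" column so that the snippet runs without ROOT; in practice you would pass the result of Take to Counter, as above.

```python
from collections import Counter

# Hypothetical stand-in for the values taken from the "myVals" column
# (assumption: in the real case this comes from df.Take, as shown above)
values = [3, 7, 3, 1, 7, 3, 9, 1, 3]

counts = Counter(values)

# Similar to value_counts(): (value, count) pairs, most frequent first
for value, count in counts.most_common():
    print(value, count)

# Similar to value_counts(normalize=True): relative frequencies
total = sum(counts.values())
fractions = {value: count / total for value, count in counts.most_common()}
print(fractions)
```

most_common() sorts by decreasing count, which is also the default ordering of value_counts() in pandas.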

I hope that helps!

D