How to efficiently remove zero entries from DataFrame rows

Terklton · September 23, 2024, 7:26am

Hi all,

I have a dataframe with columns “pp” and “pt”. Each row of these columns is a vector, and some of the values are 0. I’m trying to find an efficient way to remove the zeros. I’m using Jupyter notebooks with Python.
Example of the structure (where 1 stands for just one row of the dataframe):

1 | 5.40445f |
| | 0.339785f |
| | 0.0130158f |
| | 0.00000f |

And I want it to be:
1 | 5.40445f |
| | 0.339785f |
| | 0.0130158f |

Ultimately I just want a histogram of pp vs pt but with all the zeros removed.

Thank you!
ROOT Version: JupyROOT 6.30/04
Platform: Jupyter

vpadulan · September 23, 2024, 7:40am

Dear @Terklton ,

Thanks for reaching out to the forum! This is easily achieved in RDataFrame:

import ROOT

# Simply create an example dataset with one row
# using the same values from your post
df = (
    ROOT.RDataFrame(1)
        .Define("vec_withzeros",
                "ROOT::RVecF{5.40445f, 0.339785f, 0.0130158f, 0.00000f}")
)

# Use boolean-mask indexing on ROOT::RVec, which exposes
# collection data with a familiar numpy-like syntax:
# the comparison builds a mask, and indexing keeps the matching elements
df = df.Define("vec_nozeros", "vec_withzeros[vec_withzeros != 0.f]")

df.Display(("vec_withzeros", "vec_nozeros")).Print()
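
For the end goal of a pp vs pt histogram, the zeros probably need to be dropped pairwise so that the two vectors stay aligned. Below is a minimal NumPy sketch of that idea, using the same boolean-mask indexing that RVec mimics. The pt values, the variable names, and the assumption that pp and pt have the same length in each row are illustrative and not taken from the original post.

```python
import numpy as np

# Hypothetical contents of one row; names and pt values are illustrative only.
pp = np.array([5.40445, 0.339785, 0.0130158, 0.0])
pt = np.array([1.2, 0.0, 0.7, 0.0])

# Keep an entry only if both pp and pt are non-zero at that index,
# so the two arrays keep the same length and stay paired.
keep = (pp != 0) & (pt != 0)
pp_clean = pp[keep]
pt_clean = pt[keep]

# 2D histogram of the surviving (pp, pt) pairs; the bin count is arbitrary.
counts, pp_edges, pt_edges = np.histogram2d(pp_clean, pt_clean, bins=4)
```

RVec also overloads `&&` for element-wise masks, so inside a `Define` the analogous expression would be along the lines of `pp[pp != 0.f && pt != 0.f]`. Treat that as a sketch to check against your actual columns.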

Cheers,
Vincenzo

Terklton · September 23, 2024, 7:48am

Thank you, exactly what I was looking for!