Basic operations on RDataFrame

Hi, I am working with an RDataFrame in pyROOT that is too big to convert into pandas. In general I am fine working with RDataFrames, but there are some simple operations that I cannot figure out how to perform, and I could not find a tutorial or other information on them. They are:

  1. I occasionally want to convert to NumPy just to see how the data looks and visualise it better. For this I want to convert only a single row of my RDataFrame. How do I access it? Neither RDataFrame[0] nor RDataFrame.Filter("Row == 0") works.

  2. I have some std::array<int,n> columns in my dataframe. When using the Sum() command, I get a single int as the sum. I want the output of GetValue() to be a vector that is the sum of each individual entry in the array. Can I pass a user-defined function to Sum()?

  3. Can I drop some columns? I only need a few of them at a time, so being able to drop the others would be useful. Searching for “Drop” in the RDataFrame documentation gives no hits.

Any advice is appreciated, thanks!


ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided


Hi @lgolino ,

Thanks for reaching out. Let me try to answer your questions:

  1. Use df.AsNumpy(), which returns a Python dictionary mapping each column name to a NumPy array.

  2. Sum doesn’t help you in this case. Even if you could pass a user-defined function (which, by the way, you can: that is what the Reduce method does), you would still get a single return value, not one value per entry. What you really want is a custom object that initializes a std::vector and, for each entry, appends to it the sum of all the values of the std::array at that entry. You could build this helper class and then use the Fill method to hook it into the RDF event loop.

  3. Snapshot only the columns that you want.
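For point 2, a lighter-weight alternative to the Fill helper (a swapped-in technique, not the approach described above) is to Define a new column holding each entry's sum and collect it with Take, whose GetValue() returns a std::vector. The sketch below assumes an array-valued column whose name you pass in; the C++ helper sum_elements, the Python function per_entry_sums and the default output column name arr_sum are hypothetical choices for this example.

```python
"""Per-entry sums of an array-valued RDataFrame column (sketch)."""

# C++ helper declared once through the ROOT interpreter. It works for any
# iterable container of numbers (std::array, std::vector, ROOT::RVec, ...).
# The name sum_elements is hypothetical; it accumulates into long long to
# reduce the risk of int overflow.
SUM_HELPER_CPP = r"""
template <typename Container>
long long sum_elements(const Container &c)
{
    long long total = 0;
    for (const auto &v : c)
        total += v;
    return total;
}
"""

_helper_declared = False


def per_entry_sums(df, array_col, out_col="arr_sum"):
    """Return a lazy RResultPtr<std::vector<long long>>, one sum per entry.

    Calling GetValue() on the result triggers the event loop and yields a
    std::vector whose elements are the sums of array_col, one per entry.
    """
    global _helper_declared
    import ROOT  # imported lazily so this module can be loaded without ROOT

    if not _helper_declared:
        # Declaring the same template twice would be a redefinition error.
        ROOT.gInterpreter.Declare(SUM_HELPER_CPP)
        _helper_declared = True

    # Define a scalar column holding the sum of the array at each entry,
    # then collect all of its values into a std::vector with Take.
    with_sum = df.Define(out_col, f"sum_elements({array_col})")
    return with_sum.Take["long long"](out_col)
```

For example, per_entry_sums(df, "my_array").GetValue() would run the event loop and return the vector of sums, with "my_array" standing in for your own column name. With ImplicitMT enabled, the order of the values collected by Take may not follow the entry order.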

Cheers,
Vincenzo

Thank you for your responses, they are very useful. I do have some further questions, though.

  1. The dataframe is too big for AsNumpy, which is why I want to select just one or a few rows before converting with AsNumpy. How would I select just the first 1 to 10 rows?

  2. Okay, this is good, thanks.

  3. But what if I have many columns and want to snapshot all but 1 or 2?
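For follow-up question 1, one option is Range, which restricts the dataframe to its first n entries, followed by AsNumpy limited to the columns of interest. A minimal sketch; head_as_numpy is a hypothetical helper name, and the Filter on the special column rdfentry_ is included as a fallback because Range cannot be used when ImplicitMT is enabled.

```python
"""Convert only the first few entries of an RDataFrame to NumPy (sketch)."""


def head_as_numpy(df, n=10, columns=None, use_filter=False):
    """Return {column name: numpy array} for the first n entries of df.

    By default this uses Range(n), which restricts the event loop to the
    first n entries and lets it stop early. Range is not supported when
    ImplicitMT is enabled; in that case use_filter=True selects entries via
    the special column rdfentry_ instead. That Filter still loops over the
    whole dataset, and in multi-threaded runs rdfentry_ values may not match
    the entry numbers of a single-threaded run.
    """
    if use_filter:
        subset = df.Filter(f"rdfentry_ < {n}")
    else:
        subset = df.Range(n)
    # Listing columns limits the conversion to those columns, which keeps
    # memory usage small; columns=None converts every column.
    return subset.AsNumpy(columns=columns)
```

For example, head_as_numpy(df, 10, columns=["x", "y"]), with "x" and "y" standing in for your own column names.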
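For follow-up question 3, Snapshot accepts an explicit list of column names, so one way to keep all but a few columns is to build that list from GetColumnNames and leave out the unwanted names. A minimal sketch; snapshot_all_but and columns_except are hypothetical helper names, and the tree and file names are whatever you pass in.

```python
"""Snapshot all columns of an RDataFrame except a few (sketch)."""


def columns_except(all_columns, excluded):
    """Return the names in all_columns that are not in excluded, in order."""
    excluded = set(excluded)
    return [name for name in all_columns if name not in excluded]


def snapshot_all_but(df, tree_name, file_name, excluded):
    """Write every column of df except those in excluded to a new file.

    Returns whatever Snapshot returns, which gives access to a dataframe
    reading the newly written file.
    """
    import ROOT  # imported lazily so columns_except works without ROOT

    # GetColumnNames returns a std::vector<std::string>; convert to str.
    names = [str(name) for name in df.GetColumnNames()]
    keep = ROOT.std.vector["std::string"]()
    for name in columns_except(names, excluded):
        keep.push_back(name)
    return df.Snapshot(tree_name, file_name, keep)
```

Snapshot runs eagerly by default, so the file is written as soon as snapshot_all_but is called.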

Cheers,
Lukas
