Creating an RDataFrame with a reduced number of columns

I would like to create an RDataFrame instance that contains only the “columns” (branches) that I specify, rather than all the branches found in my “AnalysisTree”.

I tried the following, passing the third argument of the RDataFrame constructor. However, it doesn't have the effect I expected: I expected GetColumnNames to return only the name of the column “optics_efficiency” that I passed as the third argument.

root [2] ROOT::RDataFrame d("AnalysisTree", "BabyIAXO_TrueWolterMicromegasTest_00368.root", {"optics_efficiency"} )
(ROOT::RDataFrame &) A data frame built on top of the AnalysisTree dataset.
Default branch: optics_efficiency
root [3] d.GetColumnNames()
(ROOT::RDF::ColumnNames_t) { "afterOptics_R", "afterOptics_posX", "afterOptics_posY", "afterOptics_posZ", "axionPhoton_coherenceLength", "axionPhoton_fieldAverage", "axionPhoton_probability", "axionPhoton_transmission", "boreExitGate_transmission", "eventID", "final_R", "final_energy", "final_phiAngle", "final_posX", "final_posY", "final_posZ", "final_thetaAngle", "initial_R", "initial_energy", "initial_phiAngle", "initial_posX", "initial_posY", "initial_posZ", "initial_thetaAngle", "magnetEntrance_R", "magnetEntrance_posX", "magnetEntrance_posY", "magnetEntrance_posZ", "magnetExit_R", "magnetExit_posX", "magnetExit_posY", "magnetExit_posZ", "offset_R", "offset_posX", "offset_posY", "offset_posZ", "optics_efficiency", "runOrigin", "subEventID", "subEventTag", "subRunOrigin", "timeStamp", "window_transmission" }
root [4] 

Am I guessing right, or should this be done some other way?

Thanks!

Hi @Javier_Galan ,

the third argument is a list of default columns (see ROOT: ROOT::RDataFrame Class Reference), so indeed it does not do what you expected.

There is no feature to do what you ask other than creating another dataset with just the columns you want, e.g. via df.Snapshot("tree", "file.root", listOfColumnsYouWantToKeep).

Cheers,
Enrico

P.S.
If you are worried about the runtime cost of having more columns in the dataset than you need: a column is only read if you actually use it, so unused columns cost nothing.

If you need to redefine the values of some columns, you can do so with Redefine, even if they already exist.

Ok, thanks! Exactly what I was looking for.

I was also looking for a way to save the RDataFrame back into a standard TTree. Is there a way to do that?

Thanks again!

I'm not worried about that; I just want to simplify the tree so it is easier for humans to read.

Yes, that’s exactly what Snapshot does :smiley: See the user's guide here: ROOT: ROOT::RDataFrame Class Reference

