Hello,
I am trying to use RDataFrame for an analysis where I need to apply a selection to the RDF through a function, apply_selection. The selection is complex enough that I have to define some intermediate columns.
However, I do not want these columns to be present in the returned rdf_baseline: different selections call for different columns to be defined, which pollutes the column list and can cause errors from redefinitions down the line.
def apply_selection(rdf):
    # Chain each step on the previous node so the defined columns are visible to the filters
    rdf_baseline = rdf.Define("Track_st1", "Track_z<1200")
    rdf_baseline = rdf_baseline.Define("Track_st2", "Track_z>1200")
    # Note: this is x^2 + y^2, no square root is taken
    rdf_baseline = rdf_baseline.Define("Track_r", "Track_x*Track_x + Track_y*Track_y")
    rdf_baseline = rdf_baseline.Filter("Sum(Track_st1)>=2 && Sum(Track_r[Track_st1] < 100)>=2", "NTrackSt1>=2")
    rdf_baseline = rdf_baseline.Filter("Sum(Track_st1)>=2", "NTrackSt1>=2")
    # More conditions based on nontrivial defines...
    report = rdf_baseline.Report()
    return rdf_baseline, report
The alternatives I can see, without dropping columns, are to store the new definitions as strings and pass them around via f-strings, or to do everything in a single function with an exceptionally large signature. Are there any other ways around this?
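For concreteness, here is a minimal sketch of what I mean by the string-based alternative. The names DEFS and build_filter are hypothetical, made up just for this illustration; the snippet makes no ROOT calls and only builds the expression string that would then be handed to Filter.

```python
# Hypothetical sketch: keep the would-be column definitions as plain strings
# and inline them into the filter expression, so that no Define is needed.
DEFS = {
    "st1": "(Track_z<1200)",
    "r": "(Track_x*Track_x + Track_y*Track_y)",
}

def build_filter(defs):
    # Substitute the definition strings into the cut via an f-string.
    return f"Sum({defs['st1']})>=2 && Sum({defs['r']}[{defs['st1']}] < 100)>=2"

expr = build_filter(DEFS)
# The intent is to pass this on as: rdf.Filter(expr, "NTrackSt1>=2")
print(expr)
```

This keeps the column list clean, but the expressions quickly become hard to read and the same sub-expression is repeated inside every cut that needs it.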
Thanks.