Does the order of Filter and Define in RDataFrame matter?

I am applying some filters and defining some variables. Does the order in which I do this affect the result?

For example, is
rdf.Filter("good_pt.size() > 100", "event_cut").Define("good_pt", "pt[cuts]")
the same as
rdf.Define("good_pt", "pt[cuts]").Filter("good_pt.size() > 100", "event_cut")?





Hi @imanol097 ,
since the Filter uses the good_pt column, you have to Define it before the Filter call, so the two orderings are not equivalent. With a recent enough ROOT version you will get an “unknown column” error otherwise.

Cheers,
Enrico

And if I do rdf.Define("good_pt", "pt[cuts]").Filter("good_pt.size() > 100", "event_cut"), would the good_pt variable then have the Filter applied?

I am not sure I understand the question. Filter does not filter variables, it filters events. After the Filter call, only events that satisfy good_pt.size() > 100 will be processed.

My question is whether good_pt would only be filled for events with good_pt.size() > 100.

good_pt will be calculated for every event, as it’s needed by the Filter. Then the Filter will be evaluated: if good_pt.size() > 100, whatever you put after the Filter will execute for that event; otherwise it will not.

EDIT: in more detail, Defines are evaluated at most once per event, and only if something needs that value (in this case, the Filter).
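
To make the evaluation order concrete, here is a small pure-Python toy model of the behaviour described above. It is not ROOT code: ToyFrame, select_good_pt, count_good and the pt > 20 selection are hypothetical names and values made up for illustration, and the size cut is lowered from 100 to 1 so that three tiny events are enough. It only mimics the semantics discussed in this thread: columns must be known before a Filter uses them, a Define is computed lazily and at most once per event, and anything booked after a Filter only runs for events that pass it.

```python
class ToyFrame:
    """Toy stand-in for a lazy Define/Filter chain (hypothetical, not ROOT)."""

    def __init__(self, events, columns, defines=None, filters=None):
        self.events = events                  # list of dicts, one dict per event
        self.columns = set(columns)           # column names visible at this node
        self.defines = dict(defines or {})    # name -> (function, input columns)
        self.filters = list(filters or [])    # ordered (predicate, input columns)

    def _require(self, cols):
        # Booking-time check: a column must exist before it can be used.
        for col in cols:
            if col not in self.columns:
                raise ValueError(f"unknown column: {col}")

    def Define(self, name, func, cols):
        self._require(cols)
        defines = {**self.defines, name: (func, cols)}  # nothing is computed yet
        return ToyFrame(self.events, self.columns | {name}, defines, self.filters)

    def Filter(self, func, cols):
        self._require(cols)
        return ToyFrame(self.events, self.columns, self.defines,
                        self.filters + [(func, cols)])

    def Count(self):
        passed = 0
        for event in self.events:
            cache = dict(event)  # per-event cache of column values

            def get(name):
                # Compute a defined column on first use, then reuse it.
                if name not in cache:
                    func, cols = self.defines[name]
                    cache[name] = func(*[get(c) for c in cols])
                return cache[name]

            # all() stops at the first failing filter, so later steps are skipped.
            if all(func(*[get(c) for c in cols]) for func, cols in self.filters):
                passed += 1
        return passed


calls = {"good_pt": 0, "n_good": 0}  # how often each Define body actually runs


def select_good_pt(pt):
    calls["good_pt"] += 1
    return [p for p in pt if p > 20.0]


def count_good(good_pt):
    calls["n_good"] += 1
    return len(good_pt)


events = [{"pt": [10.0, 25.0, 40.0]}, {"pt": [5.0, 8.0]}, {"pt": [30.0, 50.0, 60.0]}]
frame = ToyFrame(events, columns=["pt"])

# Define first, then Filter; n_good is booked after the first Filter.
chain = (frame.Define("good_pt", select_good_pt, ["pt"])
              .Filter(lambda good_pt: len(good_pt) > 1, ["good_pt"])
              .Define("n_good", count_good, ["good_pt"])
              .Filter(lambda n_good: n_good > 2, ["n_good"]))
n_passed = chain.Count()

# Filter first: good_pt is not known yet, so booking fails.
try:
    frame.Filter(lambda good_pt: len(good_pt) > 1, ["good_pt"])
    order_error = None
except ValueError as err:
    order_error = str(err)
```

In this toy run, select_good_pt is called once for each of the three events because the first Filter needs its value, while count_good is called only for the two events that pass that Filter, even though good_pt is used twice in those events.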

What should I do to get good_pt evaluated only in the events with good_pt.size() > 100?

I’m probably missing something: leaving aside Define and Filter for a second, how can you evaluate good_pt.size() > 100 without first evaluating good_pt? That seems impossible, independently of RDataFrame.

What am I missing?

You are not missing anything, don’t worry :)
Maybe the best option is to define another variable. Thanks for your help.
