TTree filtering in PyROOT with RDataFrame

I am moving from root_numpy to RDataFrame and I am having some difficulties with TTree filtering.

I have a lot of variables in my trees: some of them are event variables and others are track variables, and I need to filter on both kinds of variables at the same time.

My problem is filtering tracks within an event.

How can I easily filter on track variables such as eta or pt in PyROOT? For example, how do I select tracks with pT > 700 MeV?

Currently I am using C++ lambda functions like:

            "for (auto x :" + str(variable) + "){ if (x<=" + str(max) +  "&& x>" + str(min) + ") return true; } return false;"

But this is not as readable as I would like. Is there another solution?


ROOT Version: 6.24


Welcome to the ROOT Forum! @eguiraud, our RDataFrame expert, is currently on vacation, but maybe @etejedor has an idea, since it’s PyROOT…

Hello,

One thing you can do, if expressing everything in a string in Filter is less readable, is to define a function beforehand:

    ROOT.gInterpreter.Declare("""
    bool my_filter_function(branch_type1 branch_name1, ...) {
       // your code here
    }
    """)

and then, in Filter, you do:

    rdf.Filter("my_filter_function(branch_name1, ...)")

Alternatively, you can put your C++ functions in a compiled library and load it, as explained here:

Cheers,
Enric

Hi,

You can also use RVec in the JITted C++ (see the example in the documentation). For instance, Track_pt > 700. && Track_pt < X constructs a per-track mask (an RVec of 0s and 1s, technically RVec<int>) that you can use to index any other Track_XYZ column of the same length. For how to scale this to a large number of branches, the discussion "Is there better way to filter array branches than defining new columns in RDataFrame?" is a nice overview of the possibilities and limitations.

Cheers,
Pieter

@pieterdavid, could you please give an example of the PyROOT syntax?
I’ve tried it but it is not working.

I have seen some examples where syntax like .Filter("eta>2") works. How is this possible?

If you define your cut as a string and pass it to Filter, there is no difference between Python and C++: in both cases the expression in the string is just-in-time compiled as C++ code.

What example are you referring to?

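Filter("eta>0") works as-is only if eta is a scalar branch. If eta is an array/vector branch, eta>0 is not a single boolean but a per-element mask (an RVec of 0s and 1s, as explained by @pieterdavid), so it has to be reduced to one value per event, e.g. Filter("All(eta>0)"). That keeps only the events in which all etas are > 0, i.e. it filters out events with at least one eta <= 0; use Any(eta>0) instead to keep events with at least one eta > 0. Here’s a tutorial that shows the logical operations you can do: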

https://root.cern/doc/master/vo003__LogicalOperations_8C.html

Thank you for your time!
