Problems with Filter in RDataFrame

Dear Experts,

I want to apply some selections on the PV positions (x, y, z) using Filter in RDataFrame.
The variable in the DecayTree is called PVX[nPV]/F.
However, when I do:

h=df.Filter("PVX>0.3").Histo1D(var)
h.GetEntries()

I get this error

error: static_assert failed due to requirement 'std::is_convertible<ROOT::VecOps::RVec,bool>::value' "filter expression returns a type that is not convertible to bool"

I tried to fix it by using:

sel_pvx ="(for (auto x :PVX){if(x>0.3) return true;}return false;)"
or 
sel_pvx ="(for (int i=0;i<PVX;i++){ if (PVX[i]>0.3)return true;}return false;)"
h=df.Filter(sel_pvx).Histo1D(var)
h.GetEntries()

but it didn't work.
Is there anything else I can do to fix it?

Thanks

Sara

Our RDataFrame expert @eguiraud is on vacation, but I’m sure he will give you some hints when he’s back

Hi Sara,

PVX[nPV]/F means that your variable is an array of size nPV,
while df.Filter() acts on whole events. Thus it is unclear whether you want:

  • to keep an event if ANY element of the array has PVX[i]>0.3?
  • to keep an event if ALL elements of the array have PVX[i]>0.3?

What I assume you want is to:

  • not discard any events, but keep only the array elements with PVX[i]>0.3?

Then you can do something like this:

h=df.Define("good_PVX", "PVX[PVX>0.3]") \
       .Histo1D("good_PVX")

This creates a new array column in your RDataFrame called good_PVX, containing only the elements with PVX[i]>0.3.
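
To make the masking semantics concrete, here is a plain-Python sketch (no ROOT needed) of what an expression like PVX[PVX>0.3] computes for a single event. The helper names greater_than and apply_mask and the sample values are hypothetical; they only mimic the element-wise behaviour of the array column.

```python
def greater_than(values, threshold):
    """Element-wise comparison: one boolean per element (like PVX > 0.3)."""
    return [v > threshold for v in values]


def apply_mask(values, mask):
    """Keep only the elements whose mask entry is True (like PVX[mask])."""
    return [v for v, keep in zip(values, mask) if keep]


# Hypothetical PVX values for one event with nPV = 4.
pvx = [0.1, 0.5, 0.25, 0.9]

mask = greater_than(pvx, 0.3)     # one bool per primary vertex
good_pvx = apply_mask(pvx, mask)  # only the elements above the threshold

print(mask)
print(good_pvx)
```

The comparison yields one boolean per element rather than a single bool, which is why the same expression cannot be passed directly to Filter().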

If you plan to use more variables related to the same physical object, like PVY and PVZ, be careful to apply the same selection to all of them. More general code would look something like this:

h=df.Define("selection", "PVX>0.3 && PVY > 0.5 && PVZ < 0.4") \
       .Define("goodPVX", "PVX[selection]") \
       .Define("goodPVY", "PVY[selection]") \
       .Define("goodPVZ", "PVZ[selection]") \
       .Define("radius", "sqrt(goodPVX*goodPVX + goodPVY*goodPVY + goodPVZ*goodPVZ)") \
       .Histo1D("radius")
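
As a rough illustration of why one shared mask matters, the following plain-Python sketch (no ROOT needed) applies a single per-event selection to all three coordinate arrays before computing the radius. The sample values and the helper apply_mask are hypothetical.

```python
import math

# Hypothetical coordinates for one event with three primary vertices.
pvx = [0.5, 0.1, 0.9]
pvy = [0.7, 0.8, 0.2]
pvz = [0.1, 0.2, 0.3]

# One mask built from all three coordinates (element-wise AND),
# mirroring "PVX>0.3 && PVY > 0.5 && PVZ < 0.4".
selection = [x > 0.3 and y > 0.5 and z < 0.4 for x, y, z in zip(pvx, pvy, pvz)]


def apply_mask(values, mask):
    """Keep only the elements whose mask entry is True."""
    return [v for v, keep in zip(values, mask) if keep]


# Applying the same mask keeps the three arrays aligned element by element.
good_pvx = apply_mask(pvx, selection)
good_pvy = apply_mask(pvy, selection)
good_pvz = apply_mask(pvz, selection)

radius = [
    math.sqrt(x * x + y * y + z * z)
    for x, y, z in zip(good_pvx, good_pvy, good_pvz)
]
print(selection, radius)
```

If each array were masked with its own condition instead, the filtered arrays could end up with different lengths and the element-wise radius would mix unrelated vertices.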

You can find tutorials on how RDataFrame works with arrays e.g. here or here.

It may be a bit clumsy to rename all the columns related to the same physical object if you have a lot of them, but the ROOT team is currently looking into how to improve this in recent versions…

cheers,
Bohdan

Dear Bohdan,
Thank you very much for your reply; what you suggested is exactly what I need.
Do you have any idea what to do if I need a histogram of pT or p with the selections above?
I understand that

h=df.Define("good_PVX", "PVX[PVX>0.3]").Histo1D("good_PVX")

gives you a histogram of PVX with those selections.
What should I do if I want Histo1D("ple_PT") instead, with the same selections?
PS: ple_PT is not an array.

I am not sure what that selection would mean for ple_PT.
Could you give more context: what do these variables mean and how are they connected?

Sorry for the confusion.
ple_PT is the transverse momentum of the particle; in my case, pi_PT.

What I want to do is draw a histogram of the transverse momentum distribution of the events that only have a PVX > 0.3.

I know it will be something like:

h=df.Filter(Cuts).Histo1D("pi_PT")

My only issue is how to implement the selections that you suggested in Filter(Cuts).

Thanks

If you have per event:

  • only one ple_PT
  • many PVX

you need to be more specific about what you mean by
“the events that only have a PVX >0.3”, as your events have many PVX values.
Do you want to select events in which:

  • at least one PVX > 0.3?
  • all PVX > 0.3?

For both cases you could define custom C++ functions that do the selection.
Here is an example:

import ROOT

ROOT.gInterpreter.Declare('''
using namespace ROOT::VecOps;

// PVX is stored as float (PVX[nPV]/F); pass by const reference to avoid a copy.
bool selection1(const RVec<float> &PVX){
    // if at least one element is larger than 0.3 - pass the event
    for (auto x : PVX){
        if( x > 0.3 ) return true;
    }
    return false;
}

bool selection2(const RVec<float> &PVX){
    // if all elements are larger than 0.3 - pass the event
    for (auto x : PVX){
        if( x <= 0.3 ) return false;
    }
    return true;
}
''')
h1=df.Filter("selection1(PVX)").Histo1D("ple_PT")
h2=df.Filter("selection2(PVX)").Histo1D("ple_PT")

I believe you could also do it in one line using a C++ lambda function directly inside Filter().

Hope it helps

cheers,
Bohdan

Sorry, I saw your fix attempt only now.

Does this work for you?

sel_pvx ="[](){ for (auto x :PVX){ if(x>0.3) return true; } return false;}"
h=df.Filter(sel_pvx).Histo1D(var)
h.GetEntries()

if not try:

sel_pvx ="[&PVX](){ for (auto x :PVX){ if(x>0.3) return true; } return false;}" 

Thank you @FoxWise for these detailed replies!

@sara_sellam note that you also have logical operators that you can apply on collections, i.e. All and Any to obtain a single boolean value, see e.g.:

https://root.cern/doc/master/vo003__LogicalOperations_8C.html

So if you want to keep only the events that have at least one PVX>0.3, you can do:

h=df.Filter("Any(PVX>0.3)").Histo1D(var)
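
To spell out the difference between the "any" and "all" readings, here is a plain-Python sketch (no ROOT needed) of an event-level selection that keeps a scalar pi_PT based on the per-event PVX array. The event values and the helpers any_above and all_above are hypothetical; they only mirror what Any(PVX>0.3) and All(PVX>0.3) reduce to for each event.

```python
# Hypothetical events: one scalar pi_PT and a variable-length PVX array each.
events = [
    {"pi_PT": 1200.0, "PVX": [0.1, 0.5]},
    {"pi_PT": 800.0, "PVX": [0.1, 0.2]},
    {"pi_PT": 1500.0, "PVX": [0.4, 0.9]},
]


def any_above(values, threshold):
    """True if at least one element exceeds the threshold (like Any(PVX > t))."""
    return any(v > threshold for v in values)


def all_above(values, threshold):
    """True if every element exceeds the threshold (like All(PVX > t))."""
    return all(v > threshold for v in values)


# The array is reduced to a single bool per event, so whole events are
# kept or dropped and the scalar pi_PT can be histogrammed afterwards.
pt_any = [ev["pi_PT"] for ev in events if any_above(ev["PVX"], 0.3)]
pt_all = [ev["pi_PT"] for ev in events if all_above(ev["PVX"], 0.3)]

print(pt_any)
print(pt_all)
```

In both cases the reduction produces exactly one boolean per event, which is the kind of value Filter() expects.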

Thank you all for your help.

It seems that in my case all I need is Any(), but what @FoxWise suggested is nevertheless very useful and I can use it at some point. :)

Sara
