SegBreak when Filter() on certain element for variable-size vector branch

Hi,
I just want to add some background about what’s going on. RDataFrame is fundamentally different from TTree::Draw by design: the former uses only pure C++ for Filter conditions and Define expressions, while the latter employs a domain-specific language that performs several under-the-hood transformations on behalf of the user.

When you write "lepton_pt[0] > 10" in TTree::Draw, you’re not writing C++: that’s a TTree::Draw condition that is parsed and translated into code that also adds a check for lepton_pt.size() > 0 for you.

When you write the same string in an RDF Filter, that’s exactly the C++ that is executed. In fact, as per the user’s guide, df.Filter("lepton_pt[0] > 10") is functionally equivalent to df.Filter([](const RVec<float> &lepton_pt) { return lepton_pt[0] > 10; }, {"lepton_pt"}), and if lepton_pt does not have a 0-th entry, that will typically result in a segfault.

@Pnine suggested two ways to make the code safe: adding a Filter so that the computation only proceeds if lepton_pt.size() > 0, or taking the full array and relying on the conditional operator, which evaluates only the selected branch, to avoid accessing non-existing elements: lepton_pt.size() > 0 ? lepton_pt[0] : 0 (if 0 makes sense as a fallback value). The latter should be slightly faster than the former, as there is one less Filter to invoke, but the difference should be minimal.
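
To make the two options concrete, here is a minimal sketch of the guards in plain C++. It uses std::vector<float> as a stand-in for RVec<float> so that it compiles without ROOT (both provide size(), empty() and operator[]). The function names and the fallback/threshold parameters are hypothetical and only there for illustration.

```cpp
#include <vector>

// Stand-in for ROOT's RVec<float>; an assumption made only to keep this sketch ROOT-free.
using FloatVec = std::vector<float>;

// Option 1: predicate for a separate, earlier Filter.
// Events with no leptons are rejected before anything indexes the vector.
inline bool has_leading_lepton(const FloatVec &lepton_pt) {
    return !lepton_pt.empty();
}

// Option 2: guarded access with a fallback value.
// The conditional operator evaluates only the selected branch,
// so lepton_pt[0] is never touched when the vector is empty.
inline float leading_pt_or(const FloatVec &lepton_pt, float fallback) {
    return lepton_pt.size() > 0 ? lepton_pt[0] : fallback;
}

// Option 2 folded into a single cut: an empty vector falls back to 0
// and therefore fails any positive threshold.
inline bool passes_leading_pt_cut(const FloatVec &lepton_pt, float threshold) {
    return leading_pt_or(lepton_pt, 0.f) > threshold;
}
```

In an actual RDataFrame chain, the same logic would presumably go into Filter (or Define) either as a lambda taking const RVec<float> & with {"lepton_pt"} as the column list, or directly as a string such as "lepton_pt.size() > 0 && lepton_pt[0] > 10".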

Cheers,
Enrico
