I’m a big fan of the special functions like MinIf$ and Sum$ of TTreeFormula that let me compute things for the iterations of a single TTree entry. E.g. if I had a vector branch representing “pdgId” of a particle I could Draw Sum$(pdgId==11) to histogram how many electrons there are in each of my events.

Hi @will_cern,
RDataFrame does not support TTree::Draw’s domain-specific language, but it instead supports all C++. This, combined with the fact that per-event arrays are read as RVecs (a vector-like object with several useful helper functions), makes RDF even more flexible than TTree::Draw’s syntax, although sometimes a bit more verbose.

For your particular example, you’d write something like this:

or the (more performant but more verbose) fully typed version:

// I'm using a C++11 lambda function, but a normal C++ function would work as well
auto count11 = [](const RVec<int> &id) { return Sum(id == 11); };
df.Define("count", count11, {"pdgId"}).Histo1D<int>("count");

Ok, this is promising. I certainly need to get my head round using these RVecs then.

Can you help me a bit further by saying how you would do the following: histogram the value of some branch (let's call it pt) for the first electron in each event? In the TTree DSL I would do:

One cool thing is that since RVecs are actual C++ classes, you can play around with them in the ROOT prompt or unit-test little functions that perform the actions you need, outside of the actual event loop or analysis code.

If I interpret the request correctly (probably not, but this should give you an idea anyway), you want something like this:

Explanation: pdgId==11 returns a “mask”, i.e. an RVec that contains only ones and zeros and that can be used to index the other RVec, pt, to extract only the elements for which pdgId==11 (i.e. for which the mask contains a one). Max(...) then returns the maximum value among the selected pt values.
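To make the mask semantics concrete without needing ROOT, here is a minimal plain-C++ sketch that uses std::vector as a stand-in for RVec. The helper names `select_by_id` and `max_of` are hypothetical and not part of ROOT; they only mimic what `pt[pdgId==11]` and `Max(...)` are described as computing above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for pt[pdgId == id]: keep the pt values whose
// corresponding pdgId entry equals `id`. Iterates over the common length
// of the two vectors (they are expected to have the same size).
std::vector<float> select_by_id(const std::vector<float> &pt,
                                const std::vector<int> &pdgId, int id)
{
    std::vector<float> selected;
    for (std::size_t i = 0; i < pt.size() && i < pdgId.size(); ++i) {
        if (pdgId[i] == id)
            selected.push_back(pt[i]);
    }
    return selected;
}

// Hypothetical stand-in for Max(...). The caller must pass a non-empty
// vector; dereferencing max_element on an empty range is undefined.
float max_of(const std::vector<float> &values)
{
    return *std::max_element(values.begin(), values.end());
}
```

With RVec the same two steps are a single expression, which is the main attraction; the sketch only spells out the loop that the mask indexing replaces.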

You can play with these at the ROOT prompt, as I showed above.

Nonzero(pdgId==11).front() returns the index of the first element of pdgId that is equal to 11.
If you want the pt at that index, and you are sure there is always at least one, you can do

pt[pdgId==11][0]

(the first pair of square brackets selects the pts where pdgId==11, then the [0] takes the first one).

If you are not sure there is always at least one particle with pdgId==11 (in RDF you can be sure, you just have to Filter on that) you can use at(0, 0) instead of [0], i.e.

pt[pdgId==11].at(0, 0)

at(i, x) takes the i-th element if it exists, otherwise it returns x.
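As a rough illustration of the fallback behaviour, the following plain-C++ sketch computes the same thing as `pt[pdgId==11].at(0, 0)` is described as doing, with std::vector standing in for RVec. The function name `first_pt_with_id` is hypothetical and not a ROOT helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for pt[pdgId == id].at(0, fallback):
// return the pt of the first particle whose pdgId equals `id`,
// or `fallback` if the event contains no such particle.
float first_pt_with_id(const std::vector<float> &pt,
                       const std::vector<int> &pdgId,
                       int id, float fallback)
{
    for (std::size_t i = 0; i < pt.size() && i < pdgId.size(); ++i) {
        if (pdgId[i] == id)
            return pt[i];
    }
    return fallback;
}
```

Note that a fallback of 0 will fill the histogram's zero bin for events without a match, so filtering those events out beforehand may be preferable depending on the analysis.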

Thanks for the help! I think I see now what I did wrong: Take(x,y) will not return the yth element of x if y is just a number, only if it's a vector of indices, so I should have done Take(pt,Take(Nonzero(pdgId==11),1)). But indeed this isn’t as pretty or as safe (it assumes there’s always >0 electrons) as pt[pdgId==11].at(0,0).
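The index-based route can be sketched in plain C++ as three small steps, again with std::vector standing in for RVec. The names `indices_with_id`, `first_n` and `take_at` are hypothetical; they are meant to mirror the roles of Nonzero(pdgId==11), Take with a number, and Take with a vector of indices as described in this post, not to reproduce ROOT's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Nonzero(pdgId == id): positions where pdgId equals id.
std::vector<std::size_t> indices_with_id(const std::vector<int> &pdgId, int id)
{
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < pdgId.size(); ++i) {
        if (pdgId[i] == id)
            indices.push_back(i);
    }
    return indices;
}

// Hypothetical stand-in for taking the first n entries of a vector
// (fewer if the vector is shorter than n).
std::vector<std::size_t> first_n(const std::vector<std::size_t> &v, std::size_t n)
{
    const std::size_t count = std::min(n, v.size());
    return std::vector<std::size_t>(v.begin(),
                                    v.begin() + static_cast<std::ptrdiff_t>(count));
}

// Hypothetical stand-in for taking elements by a vector of indices.
// Assumes every index is within range of `values`.
std::vector<float> take_at(const std::vector<float> &values,
                           const std::vector<std::size_t> &indices)
{
    std::vector<float> result;
    for (std::size_t index : indices)
        result.push_back(values[index]);
    return result;
}
```

Chaining them as `take_at(pt, first_n(indices_with_id(pdgId, 11), 1))` yields a vector with one element, or an empty vector when there is no electron, so the caller still has to handle the empty case.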

Thanks for all the help, I’ll continue my learning tomorrow!

No problem! I hope it will take much less time to learn to use RVec than the time it took to learn to use TTree::Draw so well.

Also, we are always looking to expand the set of RVec helpers we have; some interesting ones are DeltaPhi, DeltaR, InvariantMass – feel free to suggest more with a GitHub issue (or even better, a pull request).