Does RDataFrame support functionality of TTree special functions

will_cern · January 26, 2021, 4:36pm

I’m a big fan of the special functions like MinIf$ and Sum$ of TTreeFormula that let me compute things for the iterations of a single TTree entry. E.g. if I had a vector branch representing “pdgId” of a particle I could Draw Sum$(pdgId==11) to histogram how many electrons there are in each of my events.

How can I do the same with RDataFrame?

Thanks
Will

eguiraud · January 26, 2021, 4:47pm

Hi @will_cern ,
RDataFrame does not support TTree::Draw’s domain-specific language; instead it supports all of C++. This, combined with the fact that per-event arrays are read as RVecs (vector-like objects with several useful helper functions), makes RDF even more flexible than TTree::Draw’s syntax, although sometimes a bit more verbose.

For your particular example, you’d write something like this:

df.Define("count", "Sum(pdgId == 11)").Histo1D("count");

or the (more performant but more verbose) fully typed version:

// I'm using a C++11 lambda function, but a normal C++ function would work as well
auto count11 = [](const RVec<int> &id) { return Sum(id == 11); };
df.Define("count", count11, {"pdgId"}).Histo1D<int>("count");
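For readers without ROOT at hand, here is a minimal standard-library sketch of what Sum(pdgId == 11) computes for one event: the number of entries equal to 11. The name countEqual is hypothetical and only for illustration; it is not part of ROOT or RVec.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a ROOT function): counts how many entries of
// `ids` equal `wanted`, which is the per-event value that
// Sum(pdgId == 11) yields when wanted is 11.
std::size_t countEqual(const std::vector<int> &ids, int wanted)
{
   return static_cast<std::size_t>(std::count(ids.begin(), ids.end(), wanted));
}
```

Calling countEqual on one event's pdgId values with wanted set to 11 gives that event's electron count, the quantity being histogrammed above.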

Cheers,
Enrico

will_cern · January 26, 2021, 4:58pm

Ok, this is promising. I certainly need to get my head round using these RVecs then.

Can you help me a bit further by saying how you would do the following: histogram the value of some branch (let’s call it pt) for the first electron in each event? In the TTree DSL I would do:

tree->Draw("pt","pdgId==11&&Iteration$==MinIf$(Iteration$,pdgId==11)")

Thanks for the help!

eguiraud · January 26, 2021, 5:05pm

One cool thing is that since RVecs are actual C++ classes, you can play around with them in the ROOT prompt or unit-test little functions that perform the actions you need, outside of the actual event loop or analysis code.

If I interpret the request correctly (probably not, but this should give you an idea anyway), you want something like this:

root [0] using namespace ROOT;
root [1] RVec<float> pt{1.,2.}
root [2] RVec<int> pdgId{0, 11}
root [3] Max(pt[pdgId==11])
(float) 2.00000f

Explanation: pdgId==11 returns a “mask”, i.e. an RVec that contains only ones and zeros and that can be used to index the other RVec, pt, to extract only the elements for which pdgId==11 (i.e. for which the mask contains a one). Max(...) then returns the maximum value among the selected pt values.
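To make the mask idea concrete outside ROOT, the sketch below spells out the same two steps with std::vector: build a 0/1 mask, then keep the elements where the mask is nonzero. The helper names equalsMask and selectByMask are hypothetical and are not RVec functions; RVec does all of this for you through operator== and operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the 0/1 mask for "ids[i] == wanted",
// i.e. what the expression pdgId == 11 evaluates to.
std::vector<int> equalsMask(const std::vector<int> &ids, int wanted)
{
   std::vector<int> mask;
   mask.reserve(ids.size());
   for (int id : ids)
      mask.push_back(id == wanted ? 1 : 0);
   return mask;
}

// Hypothetical helper: keeps values[i] wherever mask[i] is nonzero,
// i.e. what indexing with a mask, as in pt[mask], does.
std::vector<float> selectByMask(const std::vector<float> &values, const std::vector<int> &mask)
{
   assert(values.size() == mask.size()); // mask must line up with the values
   std::vector<float> selected;
   for (std::size_t i = 0; i < values.size(); ++i)
      if (mask[i] != 0)
         selected.push_back(values[i]);
   return selected;
}
```

Taking the maximum of the selected values (for instance with std::max_element, after checking the selection is not empty) then corresponds to the Max(...) step.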

So in RDataFrame you would write:

df.Define("maxElPt", "Max(pt[pdgId==11])").Histo1D("maxElPt");

will_cern · January 26, 2021, 5:20pm

Thanks. I see I’ve a lot to learn, but perhaps I’ve already begun… I think in fact what I want is:

df.Define("firstElPt","Take(pt, Take(Nonzero(pdgId==11),1))").Histo1D("firstElPt")

Did I do it right? Can I do it better?

will_cern · January 26, 2021, 5:22pm

Oh no, I think I did it wrong, because it will throw an exception if there are no “iterations” with pdgId==11. I think this is better:

df.Define("firstElPt","Take(pt, Nonzero(pdgId==11).front())").Histo1D("firstElPt")

Do you agree?

eguiraud · January 26, 2021, 5:30pm

You can play with these at the ROOT prompt, as I showed above.

Nonzero(pdgId==11).front() returns the index of the first element of pdgId that is equal to 11.
If you want the pt at that index, and you are sure there is always at least one, you can do

pt[pdgId==11][0]

(the first pair of square brackets selects the pts where pdgId==11, then the [0] takes the first).

If you are not sure there is always at least one particle with pdgId==11 (in RDF you can be sure, you just have to Filter on that) you can use at(0, 0) instead of [0], i.e.

pt[pdgId==11].at(0, 0)

at(i, x) takes the i-th element if it exists, otherwise it returns x.
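As a plain C++ illustration of the same "first match or fallback" logic, here is a sketch that does not need ROOT. The function name firstMatchOr is hypothetical; it is meant to mirror what pt[pdgId==11].at(0, 0) returns for one event, not how RVec implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns pt[i] for the first i with pdgId[i] == wanted,
// or `fallback` when no entry matches.
float firstMatchOr(const std::vector<float> &pt, const std::vector<int> &pdgId, int wanted, float fallback)
{
   assert(pt.size() == pdgId.size()); // one pdgId per pt value
   for (std::size_t i = 0; i < pt.size(); ++i)
      if (pdgId[i] == wanted)
         return pt[i];
   return fallback;
}
```

With a fallback of 0, events without electrons fill the histogram at 0, which may or may not be what you want; filtering those events out beforehand avoids that.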

will_cern · January 26, 2021, 5:40pm

Thanks for the help! I think I see now that what I did was wrong, because Take(x,y) will not return the yth element of x if y is just a number, only if it’s a vector of indices, so I should have done Take(pt,Take(Nonzero(pdgId==11),1)). But indeed this isn’t as pretty or as safe (it assumes there’s always >0 electrons) as pt[pdgId==11].at(0,0).

Thanks for all the help, I’ll continue my learning tomorrow!

eguiraud · January 26, 2021, 5:43pm

No problem! I hope it will take much less time to learn to use RVec than the time it took to learn to use TTree::Draw so well.

Also, we are always looking to expand the set of RVec helpers we have; some interesting ones are DeltaPhi, DeltaR and InvariantMass. Feel free to suggest more with a GitHub issue (or even better, a pull request).

Cheers,
Enrico
