RDataFrame with split and unsplit objects

lindsey · April 8, 2023, 9:58pm

Hello,

I have been trying to learn RDataFrame using a simple test case with a tree of simple Event objects, each of which contains a vector of simple Hit objects.

Using the TTree::Draw interface, I can easily histogram and cut on hit members like so:

t->Draw("events.hits.val", "events.hits.id==5");

This works when the tree is saved at splitlevel=0 or 1 or 2.

Using RDataFrame, I can achieve the same thing like so:

auto df2 = df.Define("mask", "hits.id==5").Define("valAtID", "hits.val[mask]");
df2.Histo1D("valAtID")->Draw();

However, this only works when the tree is saved with splitlevel=2, where all members are visible in branches. For smaller splitlevels, an error is thrown:

input_line_66:2:55: error: no member named 'id' in 'ROOT::VecOps::RVec<Hit>'
auto func0(ROOT::VecOps::RVec<Hit>& var0){return var0.id==5

What would be the appropriate RDataFrame constructions to use to achieve the same thing I can do with TTree::Draw for my example tree in splitlevel 1 or 0?

Thanks!

bellenot · April 12, 2023, 6:23am

I’m sure @eguiraud or @vpadulan or maybe also @pcanal can help.

eguiraud · April 13, 2023, 10:55pm

Hi @lindsey ,

sorry for the high latency, both Vincenzo and I have been off.

You can check which columns are available through RDF with df.GetColumnNames(), their type with df.GetColumnType(...) and go from there. It looks like the type of hits for smaller splitlevels is RVec<Hit>, which of course does not have a data member id. In that case you might have to use one of the RVec helper functions to perform the transformation you want, e.g. (not tested, but it should give you an idea)

df.Define("mask", "return Map(hits, [](Hit &h) { return h.id == 5; })")

See also the RVec reference guide.
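
To make the element-wise idea concrete, here is a minimal standalone sketch of what Map plus a boolean mask amount to. It uses std::vector as a stand-in for RVec so that it compiles without ROOT. The Hit layout (members id and val) is assumed from the names used in this thread, and the helpers idMask, hitVals and selectByMask are hypothetical names for illustration, not ROOT API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical Hit layout, assumed from the member names used in the thread.
struct Hit {
  int id;
  double val;
};

// Element-wise comparison: 1 where hit.id == wanted, 0 otherwise.
// This plays the role of Map(hits, [](Hit &h) { return h.id == 5; }).
std::vector<int> idMask(const std::vector<Hit> &hits, int wanted) {
  std::vector<int> mask;
  mask.reserve(hits.size());
  for (const Hit &h : hits) mask.push_back(h.id == wanted ? 1 : 0);
  return mask;
}

// Extract one member from every hit, like Map(hits, [](Hit &h) { return h.val; }).
std::vector<double> hitVals(const std::vector<Hit> &hits) {
  std::vector<double> vals;
  vals.reserve(hits.size());
  for (const Hit &h : hits) vals.push_back(h.val);
  return vals;
}

// Keep vals[i] wherever mask[i] is non-zero, mirroring the expression vals[mask].
std::vector<double> selectByMask(const std::vector<double> &vals,
                                 const std::vector<int> &mask) {
  std::vector<double> out;
  for (std::size_t i = 0; i < vals.size() && i < mask.size(); ++i)
    if (mask[i]) out.push_back(vals[i]);
  return out;
}
```

The point is that the container itself has no id member; the comparison has to be applied to each Hit, and the resulting mask is then used to pick entries from a second vector of the same length.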

Cheers,
Enrico

lindsey · April 18, 2023, 1:38am

Yes, you are correct that the smaller splitlevels have type RVec<Hit>. Here is what I was able to come up with for splitlevel=1 and 0:

  // splitlevel=1: "hits" is read as RVec<Hit>, so members are reached per element via Map
  auto df2 = df.Define("mask", "return Map(hits, [](Hit &h) { return h.id == 5; })");
  auto df3 = df2.Define("hitVals", "return Map(hits, [](Hit &h) { return h.val; })");
  auto df4 = df3.Define("sigAtID", "return hitVals[mask]");
  auto h = df4.Histo1D("sigAtID");

  // splitlevel=0 (alternative to the block above, use one or the other):
  // first copy the hits out of the "events" column into an RVec<Hit> column
  auto getHits = [](Event &e) {
    ROOT::RVec<Hit> out;
    for (const auto &h : e.hits) out.push_back(h);
    return out;
  };
  auto df2 = df.Define("hits", getHits, {"events"});
  auto df3 = df2.Define("mask", "return Map(hits, [](Hit &h) { return h.id == 5; })");
  auto df4 = df3.Define("hitVals", "return Map(hits, [](Hit &h) { return h.val; })");
  auto df5 = df4.Define("sigAtID", "return hitVals[mask]");
  auto h = df5.Histo1D("sigAtID");
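
The mask and hitVals steps could probably be folded into a single pass over the hits. The sketch below shows only the selection logic in plain C++, with std::vector standing in for RVec so it compiles without ROOT. The Hit and Event layouts are assumed from the names used above, and valsWithId is a hypothetical helper, not part of ROOT.

```cpp
#include <vector>

// Hypothetical layouts, assumed from the member names used in the thread.
struct Hit {
  int id;
  double val;
};

struct Event {
  std::vector<Hit> hits;
};

// Collect the val of every hit in the event whose id equals wanted.
std::vector<double> valsWithId(const Event &e, int wanted) {
  std::vector<double> out;
  for (const Hit &h : e.hits)
    if (h.id == wanted) out.push_back(h.val);
  return out;
}
```

A lambda of this shape, taking the Event and returning an RVec<double>, might be passed to Define with {"events"} in the same way getHits is passed above, which would replace the intermediate columns with one. This is untested against the actual tree.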

Thanks for the pointers.
