RDataFrame String Filter Question

Dear Expert:

There is an error when I process the following macro. Could you please take a look? Thanks.

using namespace ROOT;

void test() {
  auto file = TFile::Open("ot/test.root", "READ");
  auto dtr = file->Get<TTree>("Detector/Det");
  auto vtr = file->Get<TTree>("Ntp/vDet");
  dtr->BuildIndex("EventID", "TrackID");
  vtr->AddFriend(dtr, "dtr");
  RDataFrame df(*vtr);
  std::cout << df.Filter("(PDGid==-11) && (dtr.PDGid==-11)").Count().GetValue() << std::endl;
}

error: use of undeclared identifier 'dtr'
auto func0(const Float_t var0, const Float_t var1){return (var0==-11) && (dtr.var0==-11)
                                                                          ^
terminate called after throwing an instance of 'std::runtime_error'
  what():  
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
 All RDF objects that have not run an event loop yet should be considered in an invalid state.


ROOT Version: 6.27/01
Platform: CentOS7.9
Compiler: gcc9.3.0


Hi @Yeung ,

it looks like RDataFrame does not recognize dtr.PDGid as a valid column name. What’s the content of df.GetColumnNames()?

Cheers,
Enrico

  auto colNames = df.GetColumnNames();
  for (const auto& colName : colNames) std::cout << colName << ", ";
  std::cout << std::endl;

gives
Bx, By, Bz, EventID, Ex, Ey, Ez, InitKE, InitT, InitX, InitY, InitZ, PDGid, ParentID, PathLength, PolX, PolY, PolZ, ProperTime, Px, Py, Pz, TrackID, Weight, dtr.Edep, dtr.EventID, dtr.Ntracks, dtr.PDGid, dtr.ParentID, dtr.Px, dtr.Py, dtr.Pz, dtr.TrackID, dtr.VisibleEdep, dtr.Weight, dtr.t, dtr.x, dtr.y, dtr.z, t, x, y, z,

I just switched the filter from Filter("(PDGid==-11) && (dtr.PDGid==-11)") to Filter("(dtr.PDGid==-11) && (PDGid==-11)") and it works. :sweat_smile:

Oooh I know what this is :man_facepalming: it’s a regex issue.

This is now tracked as "[DF] Wrong regex substitution when generating code to jit" (issue #11002 in root-project/root on GitHub). For now the workaround is exactly the one you found, i.e. putting the longest column name first when the col part of friend.col is also the name of another column used in that same string expression.
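
To illustrate the kind of substitution problem described here, below is a minimal sketch in plain standard C++ (no ROOT needed). It is not ROOT's actual implementation: `naiveSubstitute` and `selfCheck` are hypothetical names, and the assumption is simply that each column name is replaced by `var0`, `var1`, ... in the order given, using a word-boundary regex. Because `.` is not a word character, `\bPDGid\b` also matches the tail of `dtr.PDGid`, which reproduces the `dtr.var0` seen in the error message above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, NOT ROOT's code: replace every column name in `expr`
// with var0, var1, ... in the order the names appear in `cols`.
std::string naiveSubstitute(std::string expr, const std::vector<std::string> &cols)
{
   for (std::size_t i = 0; i < cols.size(); ++i) {
      // Escape '.' so that "dtr.PDGid" is matched literally.
      const std::string escaped = std::regex_replace(cols[i], std::regex("\\."), "\\.");
      // '.' is a non-word character, so \bPDGid\b also matches inside "dtr.PDGid".
      const std::regex re("\\b" + escaped + "\\b");
      expr = std::regex_replace(expr, re, "var" + std::to_string(i));
   }
   return expr;
}

// Small usage example covering the two orderings discussed in this thread.
void selfCheck()
{
   // Short name first: the friend column is mangled into "dtr.var0".
   assert(naiveSubstitute("(PDGid==-11) && (dtr.PDGid==-11)", {"PDGid", "dtr.PDGid"}) ==
          "(var0==-11) && (dtr.var0==-11)");
   // Long name first: both columns are substituted correctly.
   assert(naiveSubstitute("(dtr.PDGid==-11) && (PDGid==-11)", {"dtr.PDGid", "PDGid"}) ==
          "(var0==-11) && (var1==-11)");
}
```

Under these assumptions, handling the longer name first means `dtr.PDGid` has already been replaced by the time the shorter `PDGid` is processed, which is why swapping the two conditions avoids the problem.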

Thank you for the report, we’ll fix this as soon as possible.

Cheers,
Enrico

Thank you Enrico.
In the process, I first tried using Alias, i.e.,
std::cout << df.Alias("dtr.PDGid","detPDGid").Count().GetValue() << std::endl;
but still there is an error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  RDataFrame::Alias: cannot define variation "dtr.PDGid". Not a valid C++ variable name.

Are they the same issue?

I think that’s just because the arguments are inverted. Try Alias("detPDGid", "dtr.PDGid") instead (same ordering as with a Define: first the defined column name, then its value).
