RDataFrame: Filter TChain with TTree containing std::vector<struct>

Hello Experts,

I am trying out RDataFrame to possibly benefit from implicit multi-threading. Either I am struggling to get the syntax right, or this is not yet possible with RDF.

Input:
A bunch of *.root files containing a TTree named “qc” with a single branch named “trackQC” of type std::vector<o2::trd::TrackQC>. (The type is probably irrelevant here; suffice it to say it is a custom struct with a proper LinkDef. See footnote 1 below.)

I can of course process this with a plain TChain loop like this:

  // Chain and Branch
  TChain chain("qc");
  chain.Add("trdQC*.root");
  std::vector<o2::trd::TrackQC> qc, *qcPtr{&qc};
  chain.SetBranchAddress("trackQC", &qcPtr);

  // Loop
  for (int iEntry = 0; iEntry < chain.GetEntries(); ++iEntry) {
    chain.GetEntry(iEntry);
    for (const auto& q : qc) {
      // Some cuts omitted
      if (q.type != TRACK_TYPE)
        continue; // type 0 = TPC-TRD, type 1 = ITS-TPC-TRD

      // do some stuff ...
    }
  }

I tried to convert this to use RDF, like this:


using namespace ROOT;
using QCVec = std::vector<o2::trd::TrackQC>;

void dEdxTPCDF() {
  // Ingest
  ROOT::EnableImplicitMT();
  RDataFrame df{"qc", "trdQC*.root"}; // this works fine

  // Filter
  auto cutType = [](const QCVec &qc) {
    int i = 0;
    std::vector<bool> good(qc.size());
    for (auto &q : qc) {
      if (q.type == 0) {
        good[i] = true;
      } else {
        good[i] = false;
      }
      ++i;
    }
    return good;
  };
  auto filterType = df.Filter(cutType, {"trackQC"}, "Type"); // this does not compile

  // Report...
}

I realize that ‘cutType’ should actually return a single bool, not a std::vector<bool>; the compiler already gave that much away.
However, I am not sure how to do this otherwise.
Is this even possible with RDF?

Any help is appreciated.


  1. The struct is defined in O2/Detectors/TRD/qc/include/TRDQC/Tracking.h in AliceO2 (cannot post links).

Hi @f3sch,

and welcome to the ROOT forum!

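Filter selects whole events: its callable must return a single bool per entry. To select some elements of a vector within each event, you want Define instead, e.g. Define("good_tracks", [](const std::vector<...> &tracks) { return ...; }, {"trackQC"}), which creates a new column holding only the selected tracks.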
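
As a rough sketch of that split between per-track selection and per-event filtering (not tested against the real O2 class): `TrackQCStub` below is a hypothetical stand-in for o2::trd::TrackQC that models only the `type` member shown in the question, and `selectTpcTrd` and `hasTracks` are illustrative names, not part of ROOT or O2.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical stand-in for o2::trd::TrackQC; only `type` is modelled.
struct TrackQCStub {
  int type; // 0 = TPC-TRD, 1 = ITS-TPC-TRD (as in the question)
};
using QCVecStub = std::vector<TrackQCStub>;

// Per-event selection for Define: returns a new vector with the type-0 tracks.
QCVecStub selectTpcTrd(const QCVecStub &tracks) {
  QCVecStub good;
  std::copy_if(tracks.begin(), tracks.end(), std::back_inserter(good),
               [](const TrackQCStub &t) { return t.type == 0; });
  return good;
}

// Event-level predicate for Filter: returns one bool per event.
bool hasTracks(const QCVecStub &tracks) { return !tracks.empty(); }

// Possible wiring (needs ROOT and the real type, so it is left as a comment):
//   auto selected = df.Define("good_tracks", selectTpcTrd, {"trackQC"})
//                     .Filter(hasTracks, {"good_tracks"}, "Type");
```

With this arrangement the Filter only drops events in which no track survives the cut, while downstream actions read the already reduced "good_tracks" column.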

In order to avoid copying the inner objects, you can also Define a column of indices (or a boolean mask) that you then use to index your original vector. This should be simpler if you read the vector<T> columns as RVec<T> (the conversion should be transparent).
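
A minimal sketch of the index-based variant, under the same assumptions: `TrackQCStub` is a hypothetical stand-in for o2::trd::TrackQC, and `goodIndices` and `countSelected` are illustrative helpers, not ROOT functions. It uses plain std::vector so it compiles without ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for o2::trd::TrackQC; only `type` is modelled.
struct TrackQCStub {
  int type; // 0 = TPC-TRD, 1 = ITS-TPC-TRD (as in the question)
};
using QCVecStub = std::vector<TrackQCStub>;

// For Define: positions of the type-0 tracks; the tracks themselves are not copied.
std::vector<std::size_t> goodIndices(const QCVecStub &tracks) {
  std::vector<std::size_t> idx;
  for (std::size_t i = 0; i < tracks.size(); ++i) {
    if (tracks[i].type == 0)
      idx.push_back(i);
  }
  return idx;
}

// Example consumer: reads the selected tracks by reference through their indices.
std::size_t countSelected(const QCVecStub &tracks,
                          const std::vector<std::size_t> &idx) {
  std::size_t n = 0;
  for (std::size_t i : idx) {
    assert(i < tracks.size()); // indices must belong to the same event
    const TrackQCStub &t = tracks[i];
    if (t.type == 0)
      ++n;
  }
  return n;
}

// Possible wiring (needs ROOT and the real type, so it is left as a comment):
//   auto withIdx = df.Define("good_idx", goodIndices, {"trackQC"})
//                    .Define("n_good", countSelected, {"trackQC", "good_idx"});
```

The index column is cheap to store, and any later Define can take both "trackQC" and "good_idx" as inputs to look at the selected tracks in place.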

See also our tutorials: the ones whose names start with df10... show some example analyses.

I hope this helps!
Enrico
