Redefine RDataFrame vector by masking

I’m having some issues when trying to Redefine() a column in an RDataFrame.

At the moment, I have a dataframe with two columns, each a vector of floats of the same length:

RVec<float> jet_eta
RVec<float> jet_pt

I’d like to keep only the entries of those vectors where abs(jet_eta) < 2.5 and drop the rest. Based on a similar post, “How to slim the branch array in RDataFrame”, I figured I could do something like

RDataFrame df = RDataFrame("mytree", "myfile.root");
df.Define("mask", "abs(jet_eta) < 2.5")
  .Redefine("jet_pt", "jet_pt[mask]")
  .Redefine("jet_eta", "jet_eta[mask]");

Unfortunately, this results in a runtime error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  RDataFrame: type mismatch: column "jet_pt" is being used as vector<float> but the Define or Vary node advertises it as ROOT::VecOps::RVec<float>

I’ve also tried doing this with a lambda expression

auto mask_vector = [](const ROOT::RVecF &vector, const ROOT::RVecI &mask) {
  return vector[mask];
};
RDataFrame df = RDataFrame("mytree", "myfile.root");
df.Define("mask", "abs(jet_eta) < 2.5")
  .Redefine("jet_pt", mask_vector, {"jet_pt", "mask"});

but this fails with the same runtime error as above.

Define()-ing new columns works fine – am I stuck with that? Or is there another way to get Redefine() to work in this scenario?

Thanks!


ROOT Version: v6.28.00
Platform: CentOS 7
Compiler: gcc12


Dear @ryan.quinn ,

Thanks for reaching out. Everything works fine for me in the situation you describe. I wrote this reproducer which should be a standalone representation of your situation:

#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <ROOT/RLogger.hxx>
#include <ROOT/RDFHelpers.hxx>
#include <TRandom.h>

#include <algorithm>
#include <cstddef>

// Uncomment this to get verbose logging from RDF
// auto verbosity = ROOT::Experimental::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(),
// ROOT::Experimental::ELogLevel::kInfo);

ROOT::RVecF generate_values(std::size_t size, double lower, double upper)
{
   ROOT::RVecF res(size);
   std::generate(std::begin(res), std::end(res), [lower, upper]() { return gRandom->Uniform(lower, upper); });
   return res;
}

void writetree()
{
   ROOT::RDataFrame df{10};
   df.Define("nevents", []() -> std::size_t { return gRandom->Integer(5) + 1; })
      .Define("jet_pt", [](std::size_t n) { return generate_values(n, 5, 25); }, {"nevents"})
      .Define("jet_eta", [](std::size_t n) { return generate_values(n, 0, 5); }, {"nevents"})
      .Snapshot<std::size_t, ROOT::RVecF, ROOT::RVecF>("test_tree", "test_tree.root", {"nevents", "jet_pt", "jet_eta"});
}

void analysis_lambda()
{
   ROOT::RDataFrame df{"test_tree", "test_tree.root"};
   auto mask_vector = [](const ROOT::RVecF &vector, const ROOT::RVecI &mask) { return vector[mask]; };
   auto df1 = df.Define("mask", "abs(jet_eta) < 2.5").Redefine("jet_pt", mask_vector, {"jet_pt", "mask"});
   df1.Display({"nevents", "jet_pt", "jet_eta"}, 10)->Print();
}

void analysis_nolambda()
{
   ROOT::RDataFrame df{"test_tree", "test_tree.root"};
   auto df1 = df.Define("mask", "abs(jet_eta) < 2.5").Redefine("jet_pt", "jet_pt[mask]");
   df1.Display({"nevents", "jet_pt", "jet_eta"}, 10)->Print();
}

int main()
{
   writetree();
   analysis_nolambda();
   analysis_lambda();
}

It works both with the C++ lambda and with the jitted string. I wonder whether your input dataset stores these columns with a type other than ROOT::RVecF, or whether there is a similar mismatch somewhere else in your code. Let me know whether my snippet works for you.
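As a side note, the selection that jet_pt[mask] performs in both versions can be sketched in plain C++. This is only an illustration under stated assumptions: it uses std::vector instead of ROOT::RVecF so that it compiles without ROOT, and the helper names eta_mask and apply_mask are hypothetical, not part of ROOT.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a ROOT function): build a 0/1 mask that marks
// the entries with |eta| < max_abs_eta, mirroring "abs(jet_eta) < 2.5".
std::vector<int> eta_mask(const std::vector<float> &eta, float max_abs_eta)
{
   std::vector<int> mask;
   mask.reserve(eta.size());
   for (float e : eta)
      mask.push_back(std::abs(e) < max_abs_eta ? 1 : 0);
   return mask;
}

// Hypothetical helper (not a ROOT function): keep values[i] wherever
// mask[i] is nonzero, mirroring "jet_pt[mask]".
std::vector<float> apply_mask(const std::vector<float> &values, const std::vector<int> &mask)
{
   if (values.size() != mask.size())
      throw std::invalid_argument("values and mask must have the same length");
   std::vector<float> kept;
   for (std::size_t i = 0; i < values.size(); ++i)
      if (mask[i])
         kept.push_back(values[i]);
   return kept;
}
```

For example, with eta = {0.5, -3.0, 2.4} and pt = {10, 20, 30}, eta_mask(eta, 2.5f) gives {1, 0, 1} and apply_mask(pt, mask) gives {10, 30}: the selected entries are kept and everything else is dropped.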

Cheers,
Vincenzo

Thanks for your response. Your snippet works as expected!

It turns out that a later Define() call used another function that expected std::vector<float>. I forgot to comment out that call while debugging. Since the redefined jet_pt column is a ROOT::VecOps::RVec<float>, that call must have been where the type mismatch error was raised.

Changing the function definition to expect ROOT::RVecF fixed the problem, and the jitted string method works fine.

Thanks!
