RDataFrame unexpected behavior with Define()

Dear ROOT,

Using RDataFrame from Python, I am getting curious results when a column holds an RVec<bool>. I can reproduce the same issue in C++ at the ROOT prompt:

ROOT::RDataFrame rdf(100);
auto rdf_x = rdf.Define("x", [](){ return gRandom->Rndm(); }).Define("test","ROOT::RVec<bool> out; out.push_back(true); return out;");
auto disp = rdf_x.Display({"x","test"},20);
disp->Print();

This simple code shows the issue: with each successive event, one more true appears in the test column, instead of a single true per row.

+-----+------------+------+
| Row | x          | test | 
+-----+------------+------+
| 0   | 0.99974175 | true | 
+-----+------------+------+
| 1   | 0.16290988 | true | 
|     |            | true | 
+-----+------------+------+
| 2   | 0.28261781 | true | 
|     |            | true | 
|     |            | true | 
+-----+------------+------+
| 3   | 0.94720108 | true | 
|     |            | true | 
|     |            | true | 
|     |            | true | 
+-----+------------+------+
| 4   | 0.23165654 | true | 
|     |            | true | 
|     |            | true | 
|     |            | true | 
|     |            | true | 
+-----+------------+------+
| 5   | 0.48497361 | true | 
|     |            | true | 
|     |            | true | 
|     |            | true | 
|     |            | true | 
|     |            | true | 
+-----+------------+------+

If I replace the RVec<bool> with an RVec<int>, there is no problem: I get only one 1 in the column. The same holds if I replace RVec<bool> with std::vector<int>. However, if I replace the RVec<bool> with std::vector<bool>, ROOT crashes completely, with the error:

/Users/maurik/root/master/include/ROOT/RDF/RDisplay.hxx:132:57: error: no viable conversion from '__bit_iterator<std::vector<bool, std::allocator<bool> >, false>' to 'const void *'
<< ROOT::Internal::RDF::PrettyPrintAddr(&(collection[i])) << ");";
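A note on this error: the failing line takes the address of collection[i]. For std::vector<bool> that expression is not a bool& but a proxy object, because the standard library packs the bits, so there is no element address that could become a const void *. The sketch below checks this with the standard library only (C++17); HasAddressableElements is a hypothetical helper written for illustration, not part of ROOT.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper (not a ROOT API): true when Collection::operator[]
// yields a genuine lvalue reference to the element type, so that
// &(collection[i]) is an ordinary pointer to the element.
template <typename Collection>
constexpr bool HasAddressableElements()
{
   using Element = typename Collection::value_type;
   using Subscript = decltype(std::declval<Collection &>()[0]);
   return std::is_same_v<Subscript, Element &>;
}

// std::vector<int>::operator[] returns int&, whose address converts to const void*.
static_assert(HasAddressableElements<std::vector<int>>());

// std::vector<bool> is a packed specialization: operator[] returns the proxy
// type std::vector<bool>::reference instead of bool&.
static_assert(!HasAddressableElements<std::vector<bool>>());
```

Both static_asserts hold on any conforming standard library; the __bit_iterator in the message is how the libc++ implementation used by clang on macOS represents the address of such a proxy.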

The workaround is easy: just use RVec<int> and avoid bool altogether. However, for compiled libraries that work with bool types and are used through the PyROOT interface, which converts vectors to RVec objects, this is not quite as easy.
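For code that cannot avoid bool, one possible form of this workaround is to copy the flags into an int collection just before displaying them. The following is a minimal sketch using only the standard library; ToIntFlags is a hypothetical name, and the assumption is that the input offers begin() and end() over values convertible to int.

```cpp
#include <vector>

// Hypothetical helper (not a ROOT API): copy a collection of bool-like flags
// into a std::vector<int>, mapping false -> 0 and true -> 1.
// Assumes BoolCollection provides begin()/end().
template <typename BoolCollection>
std::vector<int> ToIntFlags(const BoolCollection &flags)
{
   // The iterator-range constructor converts each element to int.
   return std::vector<int>(flags.begin(), flags.end());
}
```

Since the template only relies on begin() and end(), it should also accept a ROOT::RVec<bool>, so it could presumably be called inside a Define to produce a displayable column. That use is untested here.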

Am I doing something I am not supposed to do?
Is there a fix or better solution?

Thanks,
Maurik

ROOT Version: Git master: 6.29/01
Platform: MacOS M1
Compiler: clang 13.0.0


Hello @maurik ,

thank you very much for your report. This is a bug in the Display of RVec<bool> (the actual data processing is fine, it’s just the representation of the data by Display that is broken).

Here’s another reproducer that also shows that the processing is ok:

#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <iostream>

int main() {
  ROOT::RDataFrame df(100);
  auto df2 = df.Define("test", [] { return ROOT::RVec<bool>(1u, true); });

  // works as expected
  const auto count =
      df2.Filter([](const ROOT::RVec<bool> &vb) { return vb.size() == 1; },
                 {"test"})
          .Count()
          .GetValue();
  std::cout << count << '\n';

  // works as expected
  auto x = df2.Take<ROOT::RVecB>("test");
  for (const auto &e : x) {
    if (e.size() != 1u)
      std::cout << "size is greater than 1!\n";
  }
  std::cout << "check 2 done\n";

  // has a bug
  auto disp = df2.Display({"test"}, 3);
  disp->Print();
}

I opened a GitHub issue for this problem and a corresponding pull request with a fix – it will be included in the next ROOT patch release.

Cheers,
Enrico

The fix is now included in the v6.28 branch (upcoming v6.28.04 release) and the master branch (future v6.30). If you get a chance to try the nightly builds from tomorrow onwards, they should also contain the fix.

Please let us know in case you encounter any further issues.

Cheers,
Enrico

Thanks @eguiraud Enrico for the quick fix, and good to know the actual data is okay.
I pulled the master branch and built it, which confirmed that your fix works.
Best wishes,
Maurik
