How to create a column containing a vector of a custom type?

Hi all,

I’d like to create a column containing a vector of a custom type, but the program fails on execution:

% ./a.out 
input_line_14:4:184: error: use of undeclared identifier 'Product'
namespace __rdf1 {  using products5_type = void /* The type of column "products" (Product) is not known to the interpreter. */; }namespace __rdf1 {  using moreproducts6_type = vector<Product>; }
                                                                                                                                                                                       ^
terminate called after throwing an instance of 'std::runtime_error'
  what():  
An error occurred while jitting. The lines above might indicate the cause of the crash

zsh: abort      ./a.out

If I remove the last Define, i.e. create a column of just the custom type, it works. What am I doing wrong?

Cheers,

Jonas

#include <ROOT/RDataFrame.hxx>
#include <iostream>
#include <vector>

class Product
{
public:
   Product() : _x(0), _y(0) {}
   Product(double x, double y) : _x(x), _y(y) {}
   ~Product() {}

   void PrintProduct() {
      std::cout << _x*_y << std::endl;
   }
private:
   double _x, _y;
};
   
int main(int argc, char* argv[])
{
   auto df = ROOT::RDataFrame(10)
      .Define("x", "1.")
      .Define("y", "2.")
      .Define("products", [](double x, double y) {return Product(x, y);}, {"x", "y"})
      .Define("moreproducts", [](double x, double y) {return std::vector<Product>(10, Product(x, y));}, {"x", "y"});

   df.Foreach([](Product& p) {p.PrintProduct();}, {"products"});

   return 0;
}

ROOT Version: 6.18.00
Platform: Scientific Linux 6, x86_64
Compiler: g++ 8.2.0


Hi,
this is tricky. The issue is that your dataframe has just-in-time-compiled columns (“x” and “y”) as well as Defined columns that return a type unknown to the interpreter (Product and vector<Product>). We correctly recognize Product as a type unknown to the interpreter, but we fail to do so for vector<Product>.

Possible solutions and workarounds:

  • declare Product to the interpreter, e.g. by copy-pasting its definition in a gInterpreter->Declare call
  • change “x”, “y” to not jit, e.g. Define("x", [] { return 1.; })
  • fix RDF’s logic so that vector<Product> is recognized as an unknown type just as Product is (this last one would be on us of course)

EDIT:
The reason we don’t recognize vector<Product> as a type unknown to the interpreter is that TClass::GetClass(typeid(vector<Product>)) returns a valid TClass object, whereas TClass::GetClass(typeid(Product)) returns a nullptr (since the interpreter does not know Product). TClass tries to be a bit more helpful for the vector, but in this case that trips us up.

Thank you for the quick reply!

I just had a similar idea: I simply create a class that basically is vector<Product>, i.e.

class ProductVector : public std::vector<Product>
{
public:
   ProductVector() : std::vector<Product>() {}
   ProductVector(int n, Product p) : std::vector<Product>(n, p) {}
   ~ProductVector() {}
};

and use it instead. I’d say both are acceptable workarounds if you just have one or two of these cases (which is true for me), but for larger analyses, it’d be better if the interpreter could handle this on its own.
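A slightly shorter variant of the wrapper, as a sketch: instead of re-declaring each constructor, the class can inherit all of std::vector<Product>'s constructors with a using-declaration. The Value() accessor and the helper functions SumValues and MakeProducts are hypothetical names added here for illustration only. This part is plain C++ and does not depend on ROOT.

```cpp
#include <vector>

class Product
{
public:
   Product() : _x(0), _y(0) {}
   Product(double x, double y) : _x(x), _y(y) {}
   // Hypothetical accessor, added so the contents can be inspected.
   double Value() const { return _x * _y; }
private:
   double _x, _y;
};

// Thin wrapper: a distinct class type that behaves like std::vector<Product>.
class ProductVector : public std::vector<Product>
{
public:
   using std::vector<Product>::vector; // inherit all of the base constructors
};

// Any function written for std::vector<Product> also accepts the wrapper.
double SumValues(const std::vector<Product> &products)
{
   double sum = 0.;
   for (const auto &p : products)
      sum += p.Value();
   return sum;
}

// Same shape as the lambda used for the "moreproducts" column.
ProductVector MakeProducts(double x, double y)
{
   return ProductVector(10, Product(x, y));
}
```

One caveat with this pattern: std::vector has no virtual destructor, so a ProductVector should not be deleted through a pointer to std::vector<Product>. Used by value, as in a Define lambda, that is not a concern.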

Cheers,

Jonas

Agreed. This is now a bug report: https://sft.its.cern.ch/jira/browse/ROOT-10273