Use RDataFrame with nested vectors (like vector<vector<float>>) without creating additional dictionaries



ROOT Version: 6.26/02, 6.28/06
Platform: GNU/Linux (CentOS Linux release 7.9.2009)
Compiler: GCC 11.2.0, GCC 13.2.0


Hi,

My question is simple, yet I cannot find a definitive answer: is it possible to use RDataFrame with branches that hold nested vectors (both to write a TTree and to read it back) without generating any additional dictionaries?

Here’s my example macro testTree.C (also attached below):

void testTree() {
  ROOT::RDataFrame d(10);
  int i(1);
  vector<string> definedNames = {"b1", "b2", "b3"};
  
  d.Define("b1", [&i]() { int j = i; ++i; return j; })
    .Define("b2", [&i]() {
        vector<float> v(i);
        for (auto &k:v)
          k = gRandom->Rndm();
        return v;
      })
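    .Define("b3", [&i]() {
        // Size the outer vector up front: a default-constructed vv is empty,
        // so the loop below would never execute.
        vector< vector<float> > vv(i);
        for (auto &v1:vv) {
          v1.resize(i);
          for (auto &k:v1)
            k = gRandom->Rndm();
        }
        return vv;
      })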
    .Snapshot("tr", "testTree.root", definedNames);
}

testTree.C (571 Bytes)

If I run this macro in ROOT as follows, it works:

gInterpreter->GenerateDictionary("vector<vector<float > >")
.x testTree.C

but if I run just root -l -b -q testTree.C it fails with the following message:

Error in <TTree::Branch>: The class requested (vector<vector<float> >) for the branch "b3" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vector<vector<float> >) to avoid to write corrupted data.
RDataFrame::Run: event loop was interrupted
terminate called after throwing an instance of 'std::logic_error'
  what():  Trying to insert a null branch address.

Yet when I open the resulting file with RDataFrame and inspect it (for example ROOT::RDataFrame d("tr", "testTree.root"); d.Describe()), it works even without an additional dictionary.

Is there a way to modify this code so that it simply runs in the interpreter without requiring additional dictionaries (i.e. just root -l -b -q testTree.C)? Or is generating a dictionary always mandatory when working with nested vectors in RDataFrame?

Cheers,
Peter

Maybe @pcanal has another solution, but can’t you simply add gInterpreter->GenerateDictionary("vector<vector<float>>"); at the beginning of the testTree() function? Something like this:

void testTree()
{
  gInterpreter->GenerateDictionary("vector<vector<float>>");
  ROOT::RDataFrame d(10);
  [...]

It works, but it leaves four additional files (*.cxx, *.so, *.pcm, *.d), which might be a problem when several dictionaries are needed. So I also added a final line that deletes all generated files once the snapshot is done:

[...]
    .Snapshot("tr", "testTree.root", definedNames);
  gSystem->Exec("rm AutoDict_*");
}

Still, I hope there’s a more elegant way to do this, since TTrees containing nested vectors (or just 2- or 3-dimensional arrays) are probably quite common, especially when working with data formats from the experiments.
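
One dictionary-free possibility, which nobody in this thread has confirmed, would be to avoid the nested type altogether and store it as two flat vectors: the concatenated values and the length of each inner vector. The sketch below is plain C++ with no ROOT calls. The names FlatNested, flatten and unflatten are made up for illustration, and it assumes that vector<float> and vector<int> columns can be written without an extra dictionary, as the b2 column in the macro above apparently is.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical flat representation of a vector<vector<float>>.
struct FlatNested {
  std::vector<float> values; // all inner elements, concatenated in order
  std::vector<int> sizes;    // length of each inner vector
};

// Concatenate the inner vectors and remember their lengths.
FlatNested flatten(const std::vector<std::vector<float>> &nested)
{
  FlatNested flat;
  flat.sizes.reserve(nested.size());
  for (const auto &inner : nested) {
    flat.sizes.push_back(static_cast<int>(inner.size()));
    flat.values.insert(flat.values.end(), inner.begin(), inner.end());
  }
  return flat;
}

// Rebuild the nested structure from the two flat vectors.
std::vector<std::vector<float>> unflatten(const std::vector<float> &values,
                                          const std::vector<int> &sizes)
{
  std::vector<std::vector<float>> nested;
  nested.reserve(sizes.size());
  std::size_t offset = 0;
  for (int n : sizes) {
    if (n < 0 || static_cast<std::size_t>(n) > values.size() - offset)
      throw std::out_of_range("sizes do not match the number of values");
    nested.emplace_back(values.begin() + offset, values.begin() + offset + n);
    offset += static_cast<std::size_t>(n);
  }
  return nested;
}
```

With this layout the two members would be stored as two separate columns rather than as one struct, since a struct column would presumably need a dictionary of its own. The cost is that every reader has to call something like unflatten to recover the nested shape.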

OK, so let’s see if @pcanal or @Axel have a more elegant solution