Does RDataFrame support tree branches with 2d arrays?

Dear ROOT-ers,

While exploring the capabilities of RDataFrame I have run into the following question: can I refer to tree branches containing two-dimensional statically-allocated C++ arrays (e.g. Float_t x[2][2]) when defining new columns, applying filters, etc.?

The following script illustrates the problem I have encountered.

#include <ROOT/RDataFrame.hxx>
#include <TFile.h>
#include <TTree.h>

void test(){

  // generate a tree with 2d array branch
  // (initialised so that well-defined values are written to the file)
  Float_t array2d[2][2] = {{1.f, 2.f}, {3.f, 4.f}};
  TFile * f_out = new TFile("test.root", "RECREATE");
  TTree * T = new TTree("h1", "");
  T->Branch("array2d", array2d, "array2d[2][2]/F");
  for(int i=0;i<10;++i){
    T->Fill();
  }
  T->Write();
  f_out->Close();

  // try to process with RDataFrame
  ROOT::RDataFrame d("h1", "test.root");
  d.Filter("array2d[0][0] > 0.");
}

If I try to process a tree with a branch containing a 2d array like in this example, accessing the array’s elements is not recognized when parsing the definitions and filters:

input_line_46:2:18: error: subscripted value is not an array, pointer, or vector
return array2d[0][0] > 0.
       ~~~~~~~~~~^~
terminate called after throwing an instance of 'std::runtime_error'
  what():  Cannot interpret the following expression:
array2d[0][0] > 0.

Make sure it is valid C++.

Trying to understand the problem, I have discovered that TTreeReader also does not handle such branches, so that when trying to generate a TSelector from the tree created by the above script, I got the following warning (unless generating the legacy selector version):

Warning in <AddReader>: Ingored branch array2d because type is unknown.

Am I doing something wrong or is it a limitation of RDataFrame and TTreeReader?

From a quick check it seems that I can access the array as a 1d array, i.e. replacing array2d[i][j] with array2d[i*Ncolumns+j]. Of course this works with statically-allocated multidimensional C++ arrays in memory, but do you know if the contiguous memory layout is preserved when reading from the TTree in a file, so that such a workaround would be reliable?

Many thanks,
Alek


ROOT Version: 6.14/04, 6.15/01 (git master)
Platform: x86_64 Debian GNU/Linux
Compiler: g++ (Debian 7.3.0-27) 7.3.0


Hi,
RDataFrame does not support explicit handling of multidimensional arrays (yet).
You should be able to read all the values as a one-dimensional array, i.e. "array2d[0]" should give you the same value as "array2d[0][0]" and "array2d[3]" should give you the same value as "array2d[1][1]". Reading 2d arrays as 1d arrays is not tested though, so it’s not guaranteed to work.
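
To make the index mapping concrete, here is a minimal plain-C++ sketch of the row-major flattening described above. It does not use ROOT at all: `flatIndex`, `layoutMatches`, `kRows` and `kCols` are hypothetical names introduced only for this illustration, and the sketch only checks the in-memory layout of a C array. It says nothing about whether TTreeReader preserves that layout when reading from a file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Dimensions matching the "array2d[2][2]/F" leaf list in the script above.
constexpr std::size_t kRows = 2;
constexpr std::size_t kCols = 2;

// Hypothetical helper (not a ROOT function): row-major flat index of [i][j].
constexpr std::size_t flatIndex(std::size_t i, std::size_t j) {
  return i * kCols + j;
}

// The two correspondences quoted in the reply above.
static_assert(flatIndex(0, 0) == 0, "array2d[0][0] maps to flat index 0");
static_assert(flatIndex(1, 1) == 3, "array2d[1][1] maps to flat index 3");

// Copy a 2d C array into a flat buffer and check every element against
// the row-major mapping. Returns true if all elements agree.
inline bool layoutMatches() {
  float a[kRows][kCols] = {{0.f, 1.f}, {2.f, 3.f}};
  float flat[kRows * kCols];
  std::memcpy(flat, a, sizeof(a)); // a 2d C array is one contiguous block
  for (std::size_t i = 0; i < kRows; ++i) {
    for (std::size_t j = 0; j < kCols; ++j) {
      if (flat[flatIndex(i, j)] != a[i][j]) return false;
    }
  }
  return true;
}
```

With this mapping, the filter from the original script would be written as "array2d[0] > 0." instead of "array2d[0][0] > 0.", subject to the caveat that this usage is untested.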

The underlying reason for the lack of explicit support of multidimensional arrays is that, as you found out, TTreeReader does not support them and that’s what we use to do I/O.

We meant to address this limitation in the next release, see (and follow, if you’d like) this Jira ticket, although it now looks like we might not make it in time.
The more users complain about it, the higher the priority of the feature though :smiley: So far this is only the second time that anyone has requested the feature, the first being during the ROOT users’ workshop last month.

Sorry I don’t have a better answer (for now).
Cheers,
Enrico

Dear Enrico,

Thank you for the comprehensive explanation and no need to be sorry - I understand RDataFrame is still under development.

This is not a very severe problem, as multidimensional arrays can often be easily replaced with other structures, but it would be great to support them because of certain legacy data structures (e.g. the ones I work with date back to the Fortran days, when multi-dimensional arrays were used more widely in the absence of objects).

I have subscribed to the Jira ticket and will be happy to hear any news, even if it does not make it into the next release!

Cheers,
Alek
