In general, RDataFrame should be able to read any C++ type. But I wonder how the arrays were saved into the tree. Do you know the type of the objects in the branch? Did you try something like
rdf = rdf.Define(..., "myArray[1][2][3]", ...)
?
If you want to do complicated things with these arrays, you can either write full C++ functions (ROOT.gInterpreter.Declare( ..... ); these are usually quite fast) or you need some more pythonic helpers. For something like this, we should wait for @etejedor or @swunsch to be back from vacation. Maybe they have an idea.
A better way to frame my question would be whether it is possible to get three levels of nesting for my 3D array using RDataFrame.
>>> rdf[0][0][0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not subscriptable
Something like the following does not work:
>>> y = rdf.Define("test", "i1[0][0][0]")
input_line_189:2:13: error: subscripted value is not an array, pointer, or vector
return i1[0][0][0]
~~~~~^~
input_line_190:2:13: error: subscripted value is not an array, pointer, or vector
return i1[0][0][0]
~~~~~^~
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: Template method resolution failed:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
Exception: Cannot interpret the following expression:
i1[0][0][0]
Make sure it is valid C++. (C++ exception)
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
Exception: Cannot interpret the following expression:
i1[0][0][0]
Make sure it is valid C++. (C++ exception)
Maybe I am doing it wrong?
Using it as a 1D array like this does not throw an error -
Ok, obviously, the branches in this attempt don’t represent a type that can be indexed in multiple dimensions. Can you use TTree::Print() on the input data so we can find out how the arrays were saved? Is this already a 3D array or is it still the 2D from the first example?
Note that it only makes sense to process the tree with RDataFrame if you have to do work with these arrays that happens “inside each event”. If you just want to obtain a large-dimensional array spanning over all events, RDF is probably not the right tool.
Just to double check with @pcanal: Philippe, it should be possible to read 3D array branches from Python, shouldn’t it?
Reik, my guess is that if you try to read a 3D array from Python (in the Python loop form, just like you did with the 2D array) you will get all the contents in a flat array that you can then reshape with numpy. But I don’t think you can define a 3D branch with RDataFrame just like you tried to do (@eguiraud can correct me if I am wrong).
It is the 2D array from the first example (written using the C code in the top post).
Is it because the array in this case is 2D and I am trying to define a 3D branch? Because defining a 2D array doesn’t work either:
>>> y = rdf.Define("test", "i1[4][5]")
input_line_179:2:13: error: subscripted value is not an array, pointer, or vector
return i1[4][5]
~~~~~^~
input_line_180:2:13: error: subscripted value is not an array, pointer, or vector
return i1[4][5]
~~~~~^~
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: Template method resolution failed:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
Exception: Cannot interpret the following expression:
i1[4][5]
Make sure it is valid C++. (C++ exception)
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
Exception: Cannot interpret the following expression:
i1[4][5]
Make sure it is valid C++. (C++ exception)
Or is it because redefining multidimensional arrays using RDataFrame is not possible?
Ok, I believe we have a clearer idea now what’s going on:
When you create a 2D C-style array, it actually gets written into the tree as a 1D array of size n_x*n_y = 4*5 in your case. That’s why you see 20 numbers when you retrieve it from Python with numpy. You would now have to reshape it to reflect the [4][5] structure.
The same problem happens inside RDataFrame nodes (C++): the 2D array comes back as a 1D array. To read the element at [x][y], you have to access i1[x*5+y] (the equivalent of i1[x][y]), where the constant 5 is the size of the rightmost dimension. That’s because multi-dimensional C-style arrays are stored contiguously, row by row, as one long 1D array.
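To make the index arithmetic concrete, here is a minimal pure-Python sketch of the row-major layout described above. It does not touch ROOT at all. The helper names (flat_index, flat_index_3d) and the test values are made up for illustration; only the [4][5] shape comes from this thread.

```python
# Dimensions of the [4][5] array discussed in this thread.
N_X, N_Y = 4, 5


def flat_index(x, y, n_y=N_Y):
    """Row-major offset of element [x][y]; n_y is the rightmost dimension."""
    return x * n_y + y


def flat_index_3d(x, y, z, n_y, n_z):
    """Same idea for [x][y][z]: multiply by each dimension to the right."""
    return (x * n_y + y) * n_z + z


# Illustrative nested 4x5 data whose values encode their own position.
nested = [[10 * x + y for y in range(N_Y)] for x in range(N_X)]

# Flatten row by row, which is the order the 20 numbers come back in.
flat = [value for row in nested for value in row]

# Rebuild the [4][5] structure by slicing the flat sequence into rows.
reshaped = [flat[x * N_Y:(x + 1) * N_Y] for x in range(N_X)]

# The flat lookup and the nested lookup agree.
print(flat[flat_index(2, 3)], nested[2][3])
```

If the flat data is already a numpy array, `arr.reshape(4, 5)` does the same regrouping, since numpy uses row-major order by default. The 3D helper only shows how the formula would generalise, assuming a 3D C-style array is flattened the same way.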