AsNumpy fails with boolean branches

If I have an RDataFrame with a Bool_t column, AsNumpy fails:

df.AsNumpy()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-e856f5516a02> in <module>()
----> 1 df.AsNumpy()

/Applications/root_build/lib/ROOT.pyc in _RDataFrameAsNumpy(df, columns, exclude)
    429         else:
    430             tmp = numpy.empty(len(cpp_reference), dtype=numpy.object)
--> 431             for i, x in enumerate(cpp_reference):
    432                 tmp[i] = x # This creates only the wrapping of the objects and does not copy.
    433             py_arrays[column] = ndarray(tmp, result_ptrs[column])

AttributeError: 'vector<bool>' object has no attribute 'data'

Reproducer below.


First, we create an RDataFrame with an integer column and a boolean column:

In [1]: import ROOT

In [2]: df = ROOT.RDataFrame(10).Define('e', 'rdfentry_').Define('b', 'rdfentry_ == 1')
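For readers without ROOT at hand, the plain-Python sketch below models what the two Define calls produce for the 10 entries. It is an illustration only: RDataFrame evaluates `rdfentry_` and `rdfentry_ == 1` as C++ expressions. `e` is the entry number, and `b` is true only for entry 1.

```python
# Plain-Python model of the two columns defined above (illustration only;
# RDataFrame itself evaluates 'rdfentry_' and 'rdfentry_ == 1' in C++).
n_entries = 10
e = list(range(n_entries))       # 'e': the entry number, rdfentry_
b = [entry == 1 for entry in e]  # 'b': true only for entry 1

for entry, flag in zip(e, b):
    print(entry, flag)
```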

Then we verify the column types by writing the data frame to a file with Snapshot and inspecting the leaves:

In [6]: df.Snapshot('temp', 'temp.root')
Out[6]: <ROOT.ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > object at 0x7fe77bdf3fb0>

In [7]: f = ROOT.TFile.Open('temp.root')

In [8]: t = f.Get('temp')

In [9]: for b in t.GetListOfBranches():
   ...:     print b.GetName(), t.GetLeaf(b.GetName()).GetTypeName()
   ...:     
e ULong64_t
b Bool_t
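Both leaf types have a natural fixed-width NumPy counterpart, so the failure is not caused by `Bool_t` lacking a NumPy equivalent. The lookup table below is hypothetical (it is not part of ROOT's API) and only records the correspondence for the two types printed above: `ULong64_t` is an unsigned 64-bit integer and `Bool_t` is a C++ `bool`.

```python
# Hypothetical mapping from ROOT leaf type names to NumPy dtype names
# (illustration only; this table is not part of ROOT's API).
ROOT_TO_NUMPY_DTYPE = {
    "ULong64_t": "uint64",  # unsigned 64-bit integer
    "Bool_t": "bool",       # C++ bool
}

def numpy_dtype_for(leaf_type):
    """Return the NumPy dtype name for a ROOT leaf type, or raise KeyError."""
    return ROOT_TO_NUMPY_DTYPE[leaf_type]

for name, leaf_type in [("e", "ULong64_t"), ("b", "Bool_t")]:
    print(name, leaf_type, "->", numpy_dtype_for(leaf_type))
```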

Then we try to convert to NumPy and notice that it works for the ULong64_t column but not for the Bool_t column:

In [10]: df.AsNumpy()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-e856f5516a02> in <module>()
----> 1 df.AsNumpy()

/Applications/root_build/lib/ROOT.pyc in _RDataFrameAsNumpy(df, columns, exclude)
    429         else:
    430             tmp = numpy.empty(len(cpp_reference), dtype=numpy.object)
--> 431             for i, x in enumerate(cpp_reference):
    432                 tmp[i] = x # This creates only the wrapping of the objects and does not copy.
    433             py_arrays[column] = ndarray(tmp, result_ptrs[column])

AttributeError: 'vector<bool>' object has no attribute 'data'

In [11]: df.AsNumpy(columns=['e'])
Out[11]: {'e': ndarray([0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L], dtype=object)}
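The error message suggests a likely cause: in C++, `std::vector<bool>` is a bit-packed specialization that does not provide a `data()` member, so there is no contiguous buffer of `bool`s to hand over, whereas `std::vector<ULong64_t>` does have one. A converter could detect this and fall back to copying element by element. The sketch below is hypothetical and is not ROOT's implementation: `to_list`, `ContiguousVector` and `PackedBoolVector` are invented stand-ins, and it assumes the wrapped container supports `len()` and integer indexing.

```python
class ContiguousVector:
    """Stand-in for a C++ vector that exposes its buffer through data()."""
    def __init__(self, values):
        self._values = list(values)
    def data(self):
        return self._values  # plays the role of a pointer to the buffer
    def __len__(self):
        return len(self._values)
    def __getitem__(self, i):
        return self._values[i]

class PackedBoolVector:
    """Stand-in for std::vector<bool>: indexable, but has no data() method."""
    def __init__(self, values):
        self._values = [bool(v) for v in values]
    def __len__(self):
        return len(self._values)
    def __getitem__(self, i):
        return self._values[i]

def to_list(vec):
    """Hypothetical converter: use the buffer if present, else copy per element."""
    if hasattr(vec, "data"):
        return list(vec.data())
    # No contiguous buffer (as with std::vector<bool>): index one element at a time.
    return [vec[i] for i in range(len(vec))]

flags = to_list(PackedBoolVector([i == 1 for i in range(10)]))
print(flags)
```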

ROOT Version: master
Platform: macOS
Compiler: Not Provided


I went ahead and reported it as a bug in JIRA.

In the meantime, you should be able to convert that ROOT file to NumPy using go-hep/root2npy.

Binaries are available for the latest Go-HEP release (v0.22.0 at the time of writing).

There is also root_pandas, or you can work around the issue by converting the boolean column to an integer:

df = df.Define('b_int', 'int(b)')  # copy the Bool_t column 'b' as an int (0 or 1)
df.AsNumpy(columns=['e', 'b_int'])
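With this workaround, `b_int` holds 0 or 1 for each entry, so the boolean meaning can be restored on the Python side (on a NumPy array, `arr.astype(bool)` does this). The plain-Python sketch below uses a hypothetical helper, `restore_bool_column`, and a hand-built dictionary standing in for the result of `df.AsNumpy(columns=['e', 'b_int'])` on the 10-entry example; it assumes that result maps column names to sequences of integers.

```python
def restore_bool_column(columns, int_name, bool_name):
    """Hypothetical helper: rebuild a boolean column from its 0/1 integer copy.

    `columns` stands in for the dict returned by AsNumpy (column name -> sequence).
    """
    restored = dict(columns)
    restored[bool_name] = [bool(x) for x in columns[int_name]]
    return restored

# Stand-in for df.AsNumpy(columns=['e', 'b_int']) on the 10-entry example.
result = {"e": list(range(10)), "b_int": [int(i == 1) for i in range(10)]}
result = restore_bool_column(result, "b_int", "b")
print(result["b"])
```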
