Cannot pickle AsNumpy output

<class '_rdf_utils.numpy.array'> cannot be pickled using shelve. This is surprising, since other ROOT types, e.g. TH1F, can be pickled. It would be convenient if this were supported. A demonstration follows.


import ROOT
df = ROOT.RDataFrame(tchain)
ar = df.AsNumpy(columns=['P', 'PT'])
import shelve
shelf = shelve.open('test', 'c')

ar cannot be pickled directly:

shelf['ar'] = ar

results in:

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-10-26cba91d0728> in <module>()
----> 1 shelf['ar'] = ar

//anaconda2/lib/python2.7/shelve.pyc in __setitem__(self, key, value)
    130         f = StringIO()
    131         p = Pickler(f, self._protocol)
--> 132         p.dump(value)
    133         self.dict[key] = f.getvalue()
    134 

PicklingError: Can't pickle <class '_rdf_utils.numpy.array'>: attribute lookup _rdf_utils.numpy.array failed

If, however, I wrap _rdf_utils.numpy.array into a true numpy array, it works just fine:

import numpy as np
arreal = {key: np.array(val) for key, val in ar.iteritems()}
shelf['ar'] = arreal
shelf.close()

ROOT Version: 6.19/01
Platform: macOS
Compiler: Not Provided


Hi!

Some technical details up front: we had to wrap the numpy array because we attach a reference to the corresponding C++ backend object to the Python object (see obj.result_ptr). If you do [np.array(val) for _, val in ar.iteritems()], you lose this connection, and the underlying C++ object goes out of scope when ar goes out of scope. So in your workaround, take care not to lose ar!

Regarding the direct pickling issue: I’ll have a look at this tomorrow! I think we can simply forward the pickling of the underlying numpy array through the wrapper, since we don’t want to pickle the reference to the C++ object anyway.
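
To illustrate what forwarding the pickling through a wrapper could look like, here is a minimal sketch in plain Python 3. It is not the ROOT implementation: WrappedArray and BackendHandle are hypothetical names, a list stands in for the numpy array, and a threading.Lock stands in for the unpicklable reference to the C++ object.

```python
import pickle
import threading


class BackendHandle:
    """Hypothetical stand-in for the reference to the C++ result object."""

    def __init__(self):
        # A lock cannot be pickled, much like a live C++ object reference.
        self._lock = threading.Lock()


class WrappedArray(list):
    """Hypothetical wrapper: array data plus a reference to its backend."""

    def __init__(self, data, result_ptr):
        super().__init__(data)
        self.result_ptr = result_ptr  # keeps the backend object alive

    def __reduce__(self):
        # Forward pickling to the plain base type and leave out result_ptr,
        # so that only the data is serialized.
        return (list, (list(self),))


wrapped = WrappedArray([1.0, 2.5, 4.0], BackendHandle())
restored = pickle.loads(pickle.dumps(wrapped))
```

In this sketch the restored object is a plain list without a result_ptr attribute, which matches the idea that the backend reference should not be part of the pickle.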

I’ll get back to you!

Best
Stefan

I’ve opened a Jira issue to track the progress: https://sft.its.cern.ch/jira/browse/ROOT-10268

Thank you for looking into this. It’s a pleasure to interact with such a responsive team.

You are welcome! A possible solution is now implemented here:

The issue is actually that pickle breaks if you rename the class object in Python. We renamed the class from ndarray to numpy.array to hide the wrapping as well as possible. We will probably just remove the renaming so that pickle works again. We will discuss the best solution in the linked GitHub PR.
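
The effect of renaming a class on pickle can be reproduced without ROOT. The following is a minimal sketch in Python 3, where the ndarray class is a hypothetical stand-in for the wrapper. Pickle stores classes by name and checks that the name can be looked up in the defining module. Python 3 uses __qualname__ for that lookup, so the sketch sets both __name__ and __qualname__ (in Python 2.7, as in the traceback above, __name__ is the one that matters).

```python
import pickle


class ndarray(list):
    """Hypothetical stand-in for the wrapper class."""


data = ndarray([1, 2, 3])

# Before the rename, the class is found under its own name and pickling works.
roundtrip = pickle.loads(pickle.dumps(data))

# Rename the class so that it reports itself as "numpy.array".
ndarray.__name__ = "numpy.array"
ndarray.__qualname__ = "numpy.array"

try:
    pickle.dumps(data)
    rename_breaks_pickle = False
except pickle.PicklingError as err:
    # The lookup of "numpy.array" in the defining module fails.
    rename_breaks_pickle = True
    print(err)
```

Restoring the original name, or not renaming in the first place, makes the lookup succeed again.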

Thanks for reporting this issue!

Best
Stefan