Hi @malexand ,
Thank you for reporting your issue. Let me add a bit more context. Your workaround is indeed the right way to implement __reduce__ for any Python object you want to pickle: it has to return a callable (plus its arguments) that in turn returns an object of your class for the machinery to work.
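For reference, a minimal sketch of that pattern, using a plain Python class as a stand-in for the ROOT-derived one so that it runs without ROOT. The names `Histo`, `Derived` and `make_derived` are hypothetical; in the real case `Histo` would be a ROOT class such as `ROOT.TH1F`.

```python
import pickle


class Histo:
    """Stand-in for a C++ base class such as ROOT.TH1F (assumption)."""

    def __init__(self, name, nbins):
        self.name = name
        self.nbins = nbins


def make_derived(name, nbins):
    """Module-level factory; pickle stores a reference to this function."""
    return Derived(name, nbins)


class Derived(Histo):
    def __reduce__(self):
        # Return (callable, args); unpickling calls make_derived(*args).
        return (make_derived, (self.name, self.nbins))


obj = Derived("h", 10)
clone = pickle.loads(pickle.dumps(obj))
print(type(clone).__name__, clone.name, clone.nbins)
```

The factory is defined at module level because pickle stores the callable by reference (module plus name) and has to be able to import it again when unpickling.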
That being said, it shouldn’t be needed in general imho. If I try to remove the __reduce__ method from the class I always get a segfault:
vpadulan@fedorathinkpad-T550 [~]: python
Python 3.8.9 (default, Apr 6 2021, 00:00:00)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> import pickle
>>> class Derived(ROOT.TH1F):
...     pass
...
>>> obj = Derived()
>>> pickle.dumps(obj)
*** Break *** segmentation violation
#7 0x00007f6aeb43a8ac in TClass::StreamerTObject(TClass const*, void*, TBuffer&, TClass const*) () from /home/vpadulan/Programs/rootproject/rootinstall/v6-24-00/lib/libCore.so
#8 0x00007f6aeaaba5a6 in TBufferFile::WriteObjectClass(void const*, TClass const*, bool) () from /home/vpadulan/Programs/rootproject/rootinstall/v6-24-00/lib/libRIO.so
#9 0x00007f6aeaac1b74 in TBufferIO::WriteObjectAny(void const*, TClass const*, bool) () from /home/vpadulan/Programs/rootproject/rootinstall/v6-24-00/lib/libRIO.so
#10 0x00007f6add1ca55f in op_reduce(CPyCppyy::CPPInstance*, _object*) () from /home/vpadulan/Programs/rootproject/rootinstall/v6-24-00/lib/libROOTPythonizations3_8.so
This highlights a different issue. By the way, which Python version are you using?
Let me clarify further. With simple Python classes, your initial version of __reduce__ might work, since the built-in __class__ attribute refers to the class itself, which can be used as a callable. But in this case the class is more complex and depends on cppyy (as the error suggests) and ROOT. So __class__ alone is not enough to provide the minimal information needed to construct an object of that kind (as the pickle documentation states).
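To illustrate the simple case only, here is a sketch with a plain Python class (the `Point` class is hypothetical and has nothing to do with ROOT or cppyy), where returning `self.__class__` as the callable is sufficient:

```python
import pickle


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # self.__class__ is the class object itself, which is callable:
        # unpickling ends up calling Point(x, y).
        return (self.__class__, (self.x, self.y))


p = pickle.loads(pickle.dumps(Point(1, 2)))
print(type(p).__name__, p.x, p.y)
```

This works here because calling the class with the stored arguments is all that is needed to rebuild the object.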
I still believe it should not be needed in this case though.
Thanks for the explanation! I’m using Python 3.8.6. I’ve also tested on my laptop (OSX 10.14.6, ROOT 6.22/06, Python 3.9.1) with the same results. I also see the segfault (on both machines) when trying to pickle without defining __reduce__. In ROOT <= v6.20, I saw that if I didn’t define __reduce__, it would just pickle the base class (TH1F in this example), presumably using the definition of __reduce__ from the base class.
Normally, when I want to pickle something in a non-standard way I would define __getstate__ and __setstate__, but I’ve found that these are ignored for classes deriving from ROOT classes, and only defining __reduce__ works. Again, maybe this is because the base class defines __reduce__ and that takes precedence over any definition of __getstate__/__setstate__.
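That precedence can be reproduced in pure Python. The sketch below is only an analogy under the assumption that the ROOT base class behaves like `Base` here, i.e. it defines its own __reduce__; the class names and the `getstate_calls` counter are hypothetical.

```python
import pickle


class Base:
    def __init__(self, value=0):
        self.value = value

    def __reduce__(self):
        # Always rebuilds a Base, mimicking a base class that ships
        # its own __reduce__.
        return (Base, (self.value,))


class Derived(Base):
    getstate_calls = 0  # counts how often pickle asks for the state

    def __getstate__(self):
        Derived.getstate_calls += 1
        return {"value": self.value, "extra": "derived-only"}

    def __setstate__(self, state):
        self.__dict__.update(state)


restored = pickle.loads(pickle.dumps(Derived(5)))
print(type(restored).__name__, restored.value, Derived.getstate_calls)
```

Because an overridden __reduce__ is found on the class, pickle uses its result directly and never consults __getstate__, so the round trip yields an object of the base class.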
After some investigation, this has to do with how cross inheritance is handled in cppyy (and therefore in the new PyROOT).
In cppyy, when a Python class inherits from a C++ base class, a C++ wrapper class that inherits from the C++ base class is jitted. The proxied C++ object that is created for the derived Python class is an object of the jitted wrapper class. The trouble comes when trying to serialize an object of that jitted class: it crashes.
This explains why it’s possible to pickle an object of a ROOT class, but not an object whose class inherits from a ROOT class. By redefining __reduce__ in the derived Python class, you prevent the default __reduce__ from kicking in and attempting ROOT serialization, so it’s a good workaround for now for cross-inheritance classes.
Also, I found out that the unpickling of derived Python-C++ objects in the old PyROOT didn’t really work, since what you got back when unpickling was an object of the base (C++) class, not of the derived (Python) class.
I made this PR to provide a better error message in case of pickling cross-inheritance objects:
The lack of I/O support for jitted classes in ROOT makes it hard to provide a reasonable generic solution (dictionaries would need to be generated before serializing and made available for deserialization too). Therefore, the message suggests implementing a custom __reduce__ method, i.e. what Michael already did. Returning a callable (plus some arguments) from __reduce__ prevents any attempt to serialize the cross-inheritance object, which is what we want (the object will be constructed, and its C++ wrapper jitted, by the callable during deserialization).