Retroactively handling multiple Class versions in ROOT files

I am surprised that they are. See (roottest/root/io/datamodelevolution/stl/readFile.C at be467f624cb2ba613baf78d67475fe7b898bae56 · root-project/roottest · GitHub).

And if they are, they can be left as an ‘empty’ string.

You can detect the layout by looking at:

// The class name is passed as a string; FindObject returns nullptr if it is not found.
auto si = (TStreamerInfo*)file->GetStreamerInfoCache()->FindObject("MyClass");
if ( si && si->GetCheckSum() == value_for_2015) {
    is_2015 = true;
...

The CheckSum is a unique identifier of the layout.
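
As a rough sketch of how this check could be extended to several old layouts, the lookup below maps a checksum to a layout tag. It is plain C++ with no ROOT dependency; the `Layout` enum, the table `kKnownLayouts`, the function `layoutForCheckSum` and both checksum values are hypothetical placeholders. In real code the values would be the ones returned by GetCheckSum() for each old file.

```cpp
#include <cstdint>
#include <map>

// Hypothetical layout tags; the names are illustrative only.
enum class Layout { Unknown, Y2015, Y2017 };

// Hypothetical checksum values. In practice each one would be recorded by
// opening an old file and calling GetCheckSum() on its streamer info.
const std::map<std::uint32_t, Layout> kKnownLayouts = {
    {0x1234abcdu, Layout::Y2015},
    {0x9f00beefu, Layout::Y2017},
};

// Return the layout recorded for a checksum, or Layout::Unknown.
Layout layoutForCheckSum(std::uint32_t checksum) {
    auto it = kKnownLayouts.find(checksum);
    return it == kKnownLayouts.end() ? Layout::Unknown : it->second;
}
```

Returning an explicit `Unknown` tag rather than a default layout makes an unrecognised file fail loudly instead of being read with the wrong class.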

Ah ok, if I specify a version in the rule (which isn’t necessary for me) I get the following output:

WARNING: IO rule for class MyClass_new - required parameter is missing: target
The following rule has been omitted:
   read sourceClass="MyClass" versions="1" targetClass="MyClass_new"

Omitting the version works, however. :slight_smile: Great!

I gave the checksum comparison a try: I generated a file with the current library, then rebuilt a new library from exactly the same files, changing only the name of MyClass to MyClass_new and adding the additional mapping line to the linkdef file.

However, it seems that when I do
gROOT->GetClass("MyClass_new")->GetCheckSum()
and compare with

TStreamerInfo* sis=(TStreamerInfo*)_file0->GetStreamerInfoCache()->FindObject("MyClass");
sis->GetCheckSum()

the results are different, even though the content (members and methods) has not changed?

Right … we are indeed not supporting ‘partial’ renaming …

only I changed the name

Right … the name of the class is part of the checksum … but the good news is that this is not what you need.

Instead you simply need to open an old file (for each of the old layouts) and get the checksum for ‘MyClass’ there.
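
To illustrate why a pure rename changes the checksum, here is a toy model in plain C++. It is not the algorithm ROOT uses: `toyCheckSum` and `fnv1a` are hypothetical helpers built on a generic FNV-1a hash, and the member strings are made up. The only point is that once the class name is one of the hashed inputs, identical members under a different class name give a different value.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy FNV-1a hash step; NOT the algorithm ROOT uses for class checksums.
std::uint64_t fnv1a(std::uint64_t h, const std::string& s) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;  // 64-bit FNV prime
    }
    return h;
}

// Toy layout checksum: the class name is hashed first, then each member
// declaration, so renaming the class alone changes the result.
std::uint64_t toyCheckSum(const std::string& className,
                          const std::vector<std::string>& members) {
    std::uint64_t h = 14695981039346656037ull;  // 64-bit FNV offset basis
    h = fnv1a(h, className);
    for (const auto& m : members) h = fnv1a(h, m);
    return h;
}
```

This is why the value to compare against has to be read from an old file under the old name, rather than computed from the renamed in-memory class.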

Of course, that would work.
Brilliant, thanks so much for the help, Philippe.

This took some time to implement, for a variety of reasons - making a class hierarchy, wrappers to handle the missing methods etc.
Unfortunately, now that I’ve tried it, I’ve run into the same issue as this thread: Renamed class and error "trying to read an emulated class"
but only for classes which contain TObjArrays of other renamed classes.
I have the corresponding lines for all renamed classes in my linkdef file, and everything works without any problems for renamed classes that don’t contain TObjArrays.
But for renamed classes that contain TObjArrays of another class, when I try to retrieve an entry from the branch, I get messages like the following:

Error in <TBufferFile::ReadObject>: trying to read an emulated class (MyOtherClass) to store in a compiled pointer (TObject)

To clarify, the situation is like this:

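class MyClassB;  // forward declaration, needed by MyClassA::GetEvent

class MyClassA{
private:
TObjArray* theEvents;
...
public:
// At(i) returns the i-th TObject*; "*theEvents[i]" would index the pointer, not the array.
MyClassB* GetEvent(int i){ return (MyClassB*)theEvents->At(i); }
};

class MyClassB{
...
};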

Now I have renamed MyClassA to MyClassA_ver0 and MyClassB to MyClassB_ver0.
I can instantiate and use both new classes, and I can set the address of a tree branch holding a MyClassA object to the address of a pointer to a MyClassA_ver0 object. So far so good. But on trying to retrieve an entry, I receive ReadObject errors about MyClassB.

Is there something extra that needs to be done to support TObjArrays?

Keeping the thread alive: any ideas on this? :bowing_man:

If having the class MyClassB still around and with a dictionary is not a good option (and it sounds like it isn’t in your case) then the next ‘best’ solution is to hijack TObjArray, i.e. create a new class TObjArraySilentStreaming that inherits from TObjArray and has a custom streamer. In the custom streamer you would copy-paste the content of TObjArray::Streamer and replace the call to

obj = (TObject*) b.ReadObjectAny(TObject::Class());

with

static TClassRef MyClassB_ver0_cl("MyClassB_ver0");
obj = (TObject*) b.ReadObjectAny(MyClassB_ver0_cl);

and add an alias/renaming rule from TObjArray to TObjArraySilentStreaming.

If you need access to the TFile to determine which version to use, you can use b.GetParent().

Cheers,
Philippe.

Thanks for the suggestion Philippe. Before I commit further work to implement this, I’d like to double check if this will work. Sadly the situation gets more complicated. Without wanting to keep moving the goalposts, there are multiple layers of nested classes here. So I should have been more thorough in explaining that I have, for example:

  • MyClassA that contains a TObjArray of MyClassB
  • MyClassB which itself contains five TClonesArrays of different classes (MyClassC, MyClassD… each of which is at least simple).

So my implementation would then need a custom MyObjArrayClassA_v0 class, with a streamer that specifies that the objects contained in the TObjArray member of MyClass_v0 are of type MyClassB_v0.
Of course, I would also have files where the contained objects are of version MyClassB_ver1, etc. So I could have several such MyObjArrayClassA_vx classes, each with a
#pragma read sourceClass="TObjArray" targetClass="MyObjArrayClassA_vx"
line in the linkdef.

Similarly for the nested classes within MyClassB, I would have a MyTClonesArrayB_v0 class with a streamer that specified the TClonesArray in MyClassB_v0 should contain objects of type MyClassC_v0.

And, through the magic of ROOT, the appropriate versions in each case would be used? At the top level, for example, I set the address of a branch containing a MyClassA object to the address of a pointer to a MyClassA_v0 object. The class MyClassA is itself not defined, but there are multiple #pragma lines that link it to the various MyClassA_v0, MyClassA_v1 classes, and somehow the correct streamer is used to read the object from disk.
Can I suppose this same magic is possible with the TObjArray/TClonesArrays? :confounded: I hope very much so, otherwise I fear this may have been a long-winded exercise in futility!

I don’t think that is necessary. A single one (per class, supporting multiple versions) should do: it can rely on b.GetParent() to detect which file (version) you are reading and then pass the ‘right’ TClass.

Deriving from TClonesArray may not be necessary, depending on whether they are split or not.

If they are split, then setting the TClonesArray inner type after the creation of the outer-object but before calling SetBranchAddress might be enough.

If they are not split, then a single derived TClonesArray should be enough, since the TClonesArray knows its content type and can then use the right conversion/alternate.

Cheers,
Philippe.

OK, this is sounding much more positive. The TClonesArrays are not split. So in this case I would only need:

  • a single MyObjArrayMyClassA class, with an overridden streamer. The overridden streamer would then need to determine which specific class was contained in the TObjArray, and pass an appropriate TClassRef. For determining the class contained, it could use b.GetParent() to obtain a pointer to the TObject … which you say gives access to the TFile… I can’t see how I can go from TObject to TFile, but if the returned TObject is a MyClassA_vx object, then yes I can use that to determine the correct type.

  • a single overridden MyTClonesArray class? I’m not sure what you mean by this; if the TClonesArray knows its content type, what derived class is needed? I presume if I need one derived class, I need to override the streamer; does it also need to determine its contained class via its parent TObject?

I can’t see how I can go from TObject to TFile,

dynamic_cast<TFile*>( b.GetParent() );

The TClonesArray’s Streamer reads the ‘content’ type from the file and writes that into the in-memory TClonesArray (i.e. it will always go back to being ‘MyClassB’), so you would need a class derived from TClonesArray that overrides ::Streamer and changes that behavior.

Now, there is likely another alternative … maybe … instead of using a class derived from TClonesArray, it might be enough to replace the type of the data members (that are currently TClonesArray) with std::vector<MyClassB_ver0>.

Cheers,
Philippe.

Ah so the returned Parent is a TFile pointer, that works too.

If the TClonesArray streamer is able to read its contents as the appropriate contained class, it seems like that could work without any extra effort? After all, when retrieving elements from the TClonesArray the parent class is casting them to the correct pointer type anyway.
i.e. in MyClassA_v1 there is a getter which does:
MyClassB_v1* GetB(Int_t i){ return (MyClassB_v1*)fClonesArray[i]; }

So the TClonesArray::Streamer populates fClonesArray with ‘MyClassB’ objects, but of course ‘MyClassB’ is in fact ‘MyClassB_v1’ (simply by a different name) so the cast will work fine. So the question is simply whether ROOT will complain about creating a TClonesArray of objects of a class with no in-memory definition…

If that isn’t okay and an overridden Streamer is needed… I suppose I could have a Streamer which modifies the line TClass *cl = TClass::GetClass(classv) and instead determines what type of object it should be reading, in the same manner as the modified TObjArray.

If I understand correctly, the difference here is that TObjArray only allows access to its owning parent file to determine the contained class, while knowing nothing about what class type it contains. The TClonesArray knows what type it should be storing (presumably classv is the name of the class).
I don’t see how this leads to needing one derived TObjArray per class, and only one TClonesArray, though. In both cases I’d be bypassing the TClass reference derivation and determining the type myself. And especially in the case of TObjArray, the array itself has no information about what type of object it contains - so why would I define multiple TObjArray overrides?

The vector solution seems much simpler, so I’ll certainly give that a try first.

, but of course ‘MyClassB’ is in fact ‘MyClassB_v1’ (simply by a different name)

Is it always? I thought the original issue was that you had different class layouts with the same version number … (if there wasn’t any class layout change then we might be able to employ different tricks).

ROOT will complain about creating a TClonesArray of objects of a class with no in-memory definition…

It probably won’t … but the ‘type’ of the instances of MyClassB (in particular the virtual table) will not be the same as MyClassB_vers0; it will ‘just’ be the one from the closest compiled base.

TObjArray only allows access to its owning parent file to determine the contained class

It won’t enable you to determine the content, just the version numbers. A TObjArray can contain a heterogeneous set of objects and thus it knows the actual type only after reading it (and/or it could peek at the byte stream).

A TClonesArray, on the other hand, can only contain one kind of object and records that information in one of its data members, and thus you can determine at run-time, before reading the objects, which kind of object needs to be read.

Cheers,
Philippe.
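
The difference between the two containers described above can be sketched with a toy text stream. This is plain C++ and has nothing to do with ROOT's actual on-file format: the stream layouts, the `Remap` callback and both reader functions are assumptions made up for illustration. The heterogeneous reader only learns each element's type from that element's own header, while the homogeneous reader sees a single type name before any element is read and can pick the replacement class once.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (target class name, payload) for each element that was read.
using Elements = std::vector<std::pair<std::string, int>>;
// Maps an on-file class name to the in-memory class name to use.
using Remap = std::function<std::string(const std::string&)>;

// Heterogeneous container: "count  name payload  name payload ...".
// The element type is only known once each element's own header is read.
Elements readHeterogeneous(std::istream& in, const Remap& remap) {
    Elements out;
    std::size_t n = 0;
    in >> n;
    for (std::size_t i = 0; i < n; ++i) {
        std::string name;
        int payload = 0;
        in >> name >> payload;
        out.emplace_back(remap(name), payload);  // decided per element
    }
    return out;
}

// Homogeneous container: "name count payload payload ...".
// The single element type is known before any element is read.
Elements readHomogeneous(std::istream& in, const Remap& remap) {
    Elements out;
    std::string name;
    std::size_t n = 0;
    in >> name >> n;
    const std::string target = remap(name);  // decided once, up front
    for (std::size_t i = 0; i < n; ++i) {
        int payload = 0;
        in >> payload;
        out.emplace_back(target, payload);
    }
    return out;
}
```

For example, feeding "2 MyClassB 7 MyClassC 9" to readHeterogeneous calls the remap twice with two different names, whereas "MyClassC 2 3 4" through readHomogeneous calls it once.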

Is it always? I thought the original issue was that you had different class layouts with the same version number

Sorry, that was merely an example; MyClassB could be a MyClassB_v1, or MyClassB_v2 etc. I merely meant that when MyClassA_v1 is loaded and the TObjArray is populated, there is an appropriate definition of the contained class in memory (differing by name, e.g. MyClassB_v1 instead of MyClassB … and vtable, which I hadn’t thought of), and that the parent MyClassA::GetTObjArrayElement method involves a pointer cast from TObject to the specific MyClassB_vx anyway, so I wondered if it could ‘just work’.

Thanks for the clarification on the TObjArray/TClonesArray, but I’m not sure it answers my confusion on why multiple TObjArray derivations are needed, implementation-wise. In both cases I’m overriding the Streamer methods to manually determine the contained type, so whether the normal automatic type detection is active before or after reading is irrelevant; in neither case is it being used.

Only if the in-memory layouts are indeed the same and the virtual tables are essentially the same.

For the TObjArray, you can’t know at run-time which set of classes to use (whether it is MyClassB_ver_* or MyClassC_ver_*).

Cheers,
Philippe.

PS. If the on-file layout of all the versions is the same, I am still confused as to why those complications are needed.

Hi Philippe,
I gave this a go; I have defined a TObjArray_wrapper class which inherits from TObjArray and overrides the Streamer method with one that determines the class type and passes a suitable TClassRef as you described. I changed the TObjArray member of MyClassA_v0 to be a TObjArray_wrapper member. Finally I also added

#pragma link C++ class TObjArray_wrapper-;
#pragma read sourceClass="TObjArray" targetClass="TObjArray_wrapper";

to my LinkDef file.
However, I get the following errors when compiling the ROOT dictionary:

./include/TObjArray_wrapper.hh:21:10: error: class member cannot be redeclared
    void Streamer(TBuffer &b);
         ^
./include/TObjArray_wrapper.hh:20:5: note: previous declaration is here
    ClassDef(TObjArray_wrapper,1)
    ^
.../root-6.06.08/build/include/Rtypes.h:255:4: note: expanded from macro 'ClassDef'
   _ClassDef_(name,id,virtual,)   \
   ^
.../root-6.06.08/build/include/Rtypes.h:248:25: note: expanded from macro '_ClassDef_'
   virtual_keyword void Streamer(TBuffer&) overrd; \

If I omit the ClassDef line in TObjArray_wrapper, I instead get

Error in <TObjArray_wrapper>: TObjArray_wrapper inherits from TObject but does not have its own ClassDef

The ClassDef is necessary. The Streamer is already declared by the ClassDef, so you should omit your own declaration from the class body (i.e. delete line 21).

Cheers,
Philippe.
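
The redeclaration error and its fix can be reproduced in miniature without ROOT. In the sketch below, `TOY_CLASSDEF`, `ToyBuffer`, `ToyArray` and `ToyArrayWrapper` are hypothetical stand-ins, and the macro is a heavily simplified analogue of ClassDef that keeps only the one relevant behaviour: it declares the Streamer override itself. The class body therefore carries no Streamer declaration of its own, and the custom streamer is supplied purely as an out-of-class definition.

```cpp
#include <string>

// Stand-in for ROOT's TBuffer; it only records which Streamer ran.
struct ToyBuffer {
    std::string log;
};

// Stand-in for TObjArray, with a virtual Streamer.
class ToyArray {
public:
    virtual ~ToyArray() = default;
    virtual void Streamer(ToyBuffer& b) { b.log += "ToyArray;"; }
};

// Hypothetical, heavily simplified analogue of ClassDef: the macro itself
// declares the Streamer override, so the class body must not repeat it.
#define TOY_CLASSDEF(name) \
public:                    \
    void Streamer(ToyBuffer& b) override;

class ToyArrayWrapper : public ToyArray {
    TOY_CLASSDEF(ToyArrayWrapper)
    // No second "void Streamer(ToyBuffer&);" here: that would redeclare it.
};

// The custom streamer is supplied as an out-of-class definition only.
void ToyArrayWrapper::Streamer(ToyBuffer& b) { b.log += "ToyArrayWrapper;"; }
```

Calling Streamer through a ToyArray pointer to a ToyArrayWrapper runs the wrapper's version, which is the behaviour the real wrapper relies on.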

Ah right, I see, thank you.
That compiles without any errors, but unfortunately I still get the same errors as before:

Error in <TBufferFile::ReadObject>: trying to read an emulated class (MyClassB) to store in a compiled pointer (TObject)

I’ve put some print statements in the TObjArray_wrapper::Streamer method, but they don’t seem to be printed, so it looks as if my custom streamer isn’t being used.

Did you use

#pragma link C++ class  TObjArray_wrapper-; // The trailing - requests a custom streamer.

?