Manual schema evolution with I/O rules and branches containing vector<T>

Dear experts,

(The code below is abbreviated for keeping the post readable. I have attached a simple CMake project that has a working and a failing example below)

I am trying to do some non-trivial schema evolution via I/O rules and applying that to branches that store an std::vector<T>, where we reset the branch address for every event, because we want to create independent buffers for every event. Specifically what I want to do is the following. Suppose we have the following two definitions of an MCParticle

namespace v1 {
struct Vector3D { double x, y, z; };

struct ExampleMC {
  double energy;
  v1::Vector3D spin;
};
} // namespace v1

namespace v2 {
struct ExampleMC {
  double energy;
  double helicity;
};
} // namespace v2

I want the schema evolution to “simply” take the value that is stored in v1::ExampleMC::spin::z and put that into v2::ExampleMC::helicity.

To do that I have written the following I/O Rule

  <ioread sourceClass="v1::ExampleMC" 
          targetClass="v2::ExampleMC" 
          target="helicity" 
          source="v1::Vector3D spin" include="iostream">
    <![CDATA[
          std::cout << "I am now evolving " << onfile.spin.z << '\n';
          newObj->helicity = onfile.spin.z;
    ]]>
  </ioread>

Working example without std::vector

I can get this to work, if I write a branch that contains only v1::ExampleMC and read them back as v2::ExampleMC iff I call SetBranchAddress, aka something along the lines of

// omit all the writing and basic setup of reading

// The following two lines are necessary to actually trigger any schema
// evolution hook. Without this, the code specified in the I/O rule will
// not be called.
auto mcp = new v2::ExampleMC();
tree->SetBranchAddress("MCParticles", &mcp); // <-- returns 1

// in event loop we reset the branch address to a new buffer
v2::ExampleMC mcps;
auto pmcps = &mcps;
branch->SetAddress(&pmcps);
tree->GetEntry(i);

This works as expected and the user defined evolution is correctly applied (as demonstrated by the working_example executable. (There is the small question mark on why the evolution code is called twice).

Failing example with std::vector

On the contrary if I store a std::vector<v1::ExampleMC> and read it back as a std::vector<v2::ExampleMC> it looks like the data that is stored on file is not properly arriving at the user specified evolution code, i.e. effectively simply replacing all occurences of v2::ExampleMC with std::vector<v2::ExampleMC> in the above code snippet.

The value that is available from mc.spin.z is always 0. (If I don’t call SetBranchAddress the result looks like some (not entirely arbitrary) re-interpretation of the data on disk and for this specific example I read back double the number of MCParticles that I originally stored.)

I was under the assumption that reading back a vector<v2::ExampleMC> would simply apply the evolution to all elements of the original vector. However, that doesn’t seem to be the case. Is this expected behavior? If yes, is there a way to apply this evolution to a branch that contains a vector<T>?

Full example

schema_evol_reproducer.tar.gz (2.1 KB)

To get to the examples and the full code quickly download the above tarball and run

tar -xf schema_evol_reproducer.tar.gz
cmake -S . -B build
cmake --build build
LD_LIBRARY_PATH=$(pwd)/build:$LD_LIBRARY_PATH ./build/working_example
LD_LIBRARY_PATH=$(pwd)/build:$LD_LIBRARY_PATH ./build/failing_with_vector

ROOT Version: 6.36.00 (but I suspect this behavior is the same for quite some time)
Platform: Ubuntu 24
Compiler: gcc 13


Thank you for the detailed reproducer! I get the same result locally.

For RNTuple, I submitted a patch to fix this case:

For TTree, I didn’t pin down the issue yet. I get schema evolution of the vector case if I store it as a normal object in the file though, and also if I store the branch with split level 0. I’ll keep you posted.

Thanks for the quick update and fix for RNTuple.

Since the RNTuple fix mentions versions, I also tried to explicitly version the classes differently and use the version tag to try and trigger on that as well for the TTree based solution, i.e.

<lcgdict>
  <selection>
    <class name="v1::Vector3D" ClassVersion="1"/>
    <class name="v1::ExampleMC" ClassVersion="1"/>
    <class name="std::vector<v1::ExampleMC>" ClassVersion="1"/>
    <class name="v2::ExampleMC" ClassVersion="2"/>
    <class name="std::vector<v2::ExampleMC>" ClassVersion="2"/>
  </selection>

  <ioread sourceClass="v1::ExampleMC" targetClass="v2::ExampleMC" version="[1]" target="helicity" source="v1::Vector3D spin" include="iostream">
    <![CDATA[
          std::cout << "I am now evolving " << onfile.spin.z << '\n';
          newObj->helicity = onfile.spin.z;
    ]]>
  </ioread>
</lcgdict>

However, that doesn’t seem to change anything for me.

I created a bug report: [tree] missing staged data in I/O rules of split vector<T> --> vector<U> · Issue #19650 · root-project/root · GitHub

I attached to it a simplified version of the reproducer where I excluded a few possible culprits, including the class versions.

Thanks for confirming and opening the issue. Pity that it happens only with splitLevel > 0. Keeping the objects split in the resulting files makes them really handy to use and I suspect users would not be happy if we take that away :wink: