Share RNTupleReader field addresses across multiple files

I would like to share the field addresses (std::shared_ptr<T>) used by one RNTupleReader across multiple RNTupleReader instances. This would let the framework I’m developing read multiple files in sequence without clunky callbacks that tell each processor to update the address it is looking at.

Context: I am working on updating the LDMX-Software/ldmx-sw Framework to use RNTuple instead of TTree (mock-up framework on Codeberg), and one feature that I would like to carry forward is the ability to read multiple input files while processing. The current TTree Framework does this by maintaining control of all of the branch addresses itself. I would like to avoid implementing this type-erasure myself since it is already implemented in RNTupleModel/REntry.


ROOT Version: 6.38.00
Platform: linuxx8664gcc
Compiler: c++ (Ubuntu 15.2.0-4ubuntu4) 15.2.0 std202302


My naive implementation, where I manage an REntry and RNTupleModel myself, fails: when I Clone the model to give it to the RNTupleReader and then LoadEntry into the REntry created by the original model, the fModelId does not match (although I suspect the fSchemaId would).

  auto reading_model = ROOT::RNTupleModel::CreateBare();
  {
    std::cout << "spy on first file to get model" << std::endl;
    ROOT::RNTupleDescriptor::RCreateModelOptions create_model_options;
    create_model_options.SetCreateBare(true);
    auto reader = ROOT::RNTupleReader::Open(create_model_options, TUPLENAME, FILEPATHS[0]);
    const auto& desc = reader->GetDescriptor();
    for (const auto& field_desc : desc.GetTopLevelFields()) {
      reading_model->AddField(field_desc.CreateField(desc));
    }
  }

  std::cout << "freeze model and create full-run entry" << std::endl;
  reading_model->Freeze();
  auto entry = reading_model->CreateEntry();

  auto ifile_ptr = entry->GetPtr<int>("ifile");
  auto value = entry->GetPtr<int>("value");

  std::cout << "read both files in sequence" << std::endl;
  for (std::size_t ifile{0}; ifile < FILEPATHS.size(); ifile++) {
    auto reader = ROOT::RNTupleReader::Open(reading_model->Clone(), TUPLENAME, FILEPATHS[ifile]);
    for (auto ientry : *reader) {
      reader->LoadEntry(ientry, *entry);
      std::cout << *ifile_ptr << " -> " << *value << std::endl;
    }
  }

produces

writing
spy on first file to get model
freeze model and create full-run entry
read both files in sequence
terminate called after throwing an instance of 'ROOT::RException'
  what():  mismatch between entry and model
At:
  void ROOT::RNTupleReader::LoadEntry(ROOT::NTupleSize_t, ROOT::REntry&) [/opt/root/include/ROOT/RNTupleReader.hxx:241]

Aborted (core dumped)

where the exception originates from a comparison of the model IDs, not the schema IDs.

Is this a bug in the reading implementation? I feel like I should (hypothetically) be able to have multiple copies of a model that all share the same schema, and since they share a schema, I should be able to LoadEntry into the same REntry from any of them. I may be misunderstanding what the “schema ID” represents, though.
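To make the distinction I have in mind concrete, here is a toy sketch of the bookkeeping as I understand it. Everything in the `toy_ids` namespace (`Model`, `Entry`, `LoadStrict`, `SchemaCompatible`) is a hypothetical stand-in that I made up for illustration, not ROOT code, and the assumption that a clone keeps its schema ID while getting a new model ID is my reading of the behavior rather than something I have confirmed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy stand-ins, NOT ROOT classes: they only mimic the ID bookkeeping.
namespace toy_ids {

inline std::uint64_t NextId() {
  static std::atomic<std::uint64_t> counter{0};
  return ++counter;
}

struct Model {
  std::uint64_t model_id{NextId()};   // unique per model object
  std::uint64_t schema_id{NextId()};  // identifies the field layout

  // Assumption: a clone describes the same fields, so it keeps the
  // schema ID, but it is a distinct object and gets a fresh model ID.
  Model Clone() const {
    Model copy;                  // fresh model_id (and a throwaway schema_id)
    copy.schema_id = schema_id;  // same layout as the original
    assert(copy.model_id != model_id);
    return copy;
  }
};

struct Entry {
  std::uint64_t model_id;
  std::uint64_t schema_id;
};

inline Entry CreateEntry(const Model& m) { return {m.model_id, m.schema_id}; }

// Strict check: the entry must come from this very model object.
inline void LoadStrict(const Model& reader_model, const Entry& e) {
  if (e.model_id != reader_model.model_id) {
    throw std::invalid_argument("mismatch between entry and model");
  }
}

// Relaxed check (hypothetical): any model with the same schema is accepted.
inline bool SchemaCompatible(const Model& reader_model, const Entry& e) {
  return e.schema_id == reader_model.schema_id;
}

}  // namespace toy_ids
```

In this toy, an entry created from the original model is rejected by `LoadStrict` on a clone but passes `SchemaCompatible`, which is the situation I think I am running into.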

I am aware of RNTupleProcessor::CreateChain, which I would like to avoid for two main reasons.

  1. It is still experimental.
  2. My current Framework enables running the same processors with or without an input file to read by using a transient RNTupleModel to host the fields that are being processed. Being able to BindValue<void> the reader’s model addresses to the transient model’s addresses is what allows reading input RNTuples to plug seamlessly into the structure.

I found a solution which isn’t super-nice but does appear to work for my use case. I am leaving this post marked as “unsolved” so I can get expert feedback to find out if this solution is the “best possible” right now (i.e. is this solution safe? is there a better one that requires less user code?)


Instead of trying to share the same REntry across multiple RNTupleReader instances, I create a new model and REntry for each RNTupleReader instance and copy the addresses from the previous instance to the new one when transitioning between files.
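Stripped of ROOT, the idea is roughly the following. This is only a toy sketch: `toy_bind::Entry` and `ShareAddresses` are hypothetical names I invented to show the pointer sharing, and they make no claim about how REntry stores its values internally.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in, NOT ROOT's REntry: type-erased values keyed by field name.
namespace toy_bind {

class Entry {
 public:
  // Allocate a fresh, default-constructed value for a field.
  template <typename T>
  void Add(const std::string& name) {
    values_[name] = std::make_shared<T>();
  }

  // Point a field at an existing object instead of the entry's own one.
  void Bind(const std::string& name, std::shared_ptr<void> ptr) {
    assert(ptr);
    values_[name] = std::move(ptr);
  }

  // Typed access; the caller is responsible for requesting the right type.
  template <typename T>
  std::shared_ptr<T> Get(const std::string& name) const {
    return std::static_pointer_cast<T>(values_.at(name));
  }

  const std::map<std::string, std::shared_ptr<void>>& Values() const {
    return values_;
  }

 private:
  std::map<std::string, std::shared_ptr<void>> values_;
};

// Make `next` use the same objects as `prev`, so that pointers handed
// out earlier keep observing whatever is loaded through `next`.
inline void ShareAddresses(const Entry& prev, Entry& next) {
  for (const auto& [name, ptr] : prev.Values()) {
    next.Bind(name, ptr);
  }
}

}  // namespace toy_bind
```

A processor that grabbed `Get<int>("value")` from the first entry keeps seeing updates after `ShareAddresses(first, second)`, because both entries now hold the same shared_ptr. That is the property I am relying on when switching files.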

// everything above here is the same as the previous code example
std::cout << "  read both files in sequence" << std::endl;
std::unique_ptr<ROOT::RNTupleReader> reader;
for (std::size_t ifile{0}; ifile < FILEPATHS.size(); ifile++) {
  if (ifile == 0) {
    reader = ROOT::RNTupleReader::Open(std::move(reading_model), TUPLENAME, FILEPATHS[ifile]);
  } else {
    std::cout << "creating new model @ " << std::flush;
    auto new_model = ROOT::RNTupleModel::CreateBare();
    const auto& desc = reader->GetDescriptor();
    const auto& reading_field_names{reader->GetModel().GetFieldNames()};
    for (const auto& field_desc : desc.GetTopLevelFields()) {
      if (reading_field_names.find(field_desc.GetFieldName()) != reading_field_names.end()) {
        new_model->AddField(field_desc.CreateField(desc));
      }
    }
    std::cout << new_model.get() << "\ncreating new entry @ " << std::flush;
    auto new_entry = new_model->CreateEntry();
    std::cout << new_entry.get() << "\nbinding values from old entry into new" << std::endl;
    for (auto& value : *entry) {
      new_entry->BindValue<void>(value.GetField().GetFieldName(), value.GetPtr<void>());
    }
    std::cout << "opening next file" << std::endl;
    reader = ROOT::RNTupleReader::Open(
        std::move(new_model),
        TUPLENAME,
        FILEPATHS[ifile]
    );
    std::cout << "updating entry pointer" << std::endl;
    entry = std::move(new_entry);
  }
  std::cout << "reader @ " << reader.get() << std::endl;
  std::cout << "entry @ " << entry.get() << std::endl;
  std::cout << "ifile_ptr: " << ifile_ptr.get() << " (orig) " << entry->GetPtr<int>("ifile").get() << " (new)" << std::endl;
  std::cout << "value: " << value.get() << " (orig) " << entry->GetPtr<int>("value").get() << " (new)" << std::endl;
  for (auto ientry : *reader) {
    reader->LoadEntry(ientry, *entry);
    std::cout << *ifile_ptr << " -> " << *value << std::endl;
  }
}

I can probably clean up this solution by getting rid of the self-managed REntry since I’m creating a new RNTupleModel each time anyway, but the “transferring of addresses” after the first file is still uncomfortable to me. It feels like I’m sacrificing the benefits of freezing the model, but it might be necessary in order to switch between files and check that the new file still has the fields I’m trying to access.