Share RNTupleReader field addresses across multiple files

tomeichlersmith · July 2, 2026, 4:15pm

I am interested in sharing the field addresses (std::shared_ptr<T>) used by RNTupleReader across multiple instances of RNTupleReader. This would enable the framework I’m developing to read multiple files in sequence without clunky callbacks for the processors looking at the addresses to update which address they are looking at.

Context: I am working on updating the LDMX-Software/ldmx-sw Framework to use RNTuple instead of TTree (mock-up framework on Codeberg) and one feature that I would like to carry forward is the ability to read multiple input files while processing. The current TTree Framework does this by maintaining control of all of the branch addresses itself. I would like to avoid implementing this type-erasure myself since it is already implemented in RNTupleModel/REntry.

ROOT Version: 6.38.00
Platform: linuxx8664gcc
Compiler: c++ (Ubuntu 15.2.0-4ubuntu4) 15.2.0 std202302

My naive implementation where I manage an REntry and RNTupleModel fails because when I Clone the model to give it to the RNTupleReader and LoadEntry into my REntry created by the original model, the fModelId does not match (although the fSchemaId does match).

  auto reading_model = ROOT::RNTupleModel::CreateBare();
  {
    std::cout << "spy on first file to get model" << std::endl;
    ROOT::RNTupleDescriptor::RCreateModelOptions create_model_options;
    create_model_options.SetCreateBare(true);
    auto reader = ROOT::RNTupleReader::Open(create_model_options, TUPLENAME, FILEPATHS[0]);
    const auto& desc = reader->GetDescriptor();
    for (const auto& field_desc : desc.GetTopLevelFields()) {
      reading_model->AddField(field_desc.CreateField(desc));
    }
  }

  std::cout << "freeze model and create full-run entry" << std::endl;
  reading_model->Freeze();
  auto entry = reading_model->CreateEntry();

  auto ifile_ptr = entry->GetPtr<int>("ifile");
  auto value = entry->GetPtr<int>("value");

  std::cout << "read both files in sequence" << std::endl;
  for (int ifile{0}; ifile < FILEPATHS.size(); ifile++) {
    auto reader = ROOT::RNTupleReader::Open(reading_model->Clone(), TUPLENAME, FILEPATHS[ifile]);
    std::cout << "reader->GetModel().GetModelId(): " << reader->GetModel().GetModelId() << std::endl;
    std::cout << "reader->GetModel().GetSchemaId(): " << reader->GetModel().GetSchemaId() << std::endl;
    std::cout << "entry->GetModelId(): " << entry->GetModelId() << std::endl;
    std::cout << "entry->GetSchemaId(): " << entry->GetSchemaId() << std::endl;

    for (auto ientry : *reader) {
      reader->LoadEntry(ientry, *entry);
      std::cout << *ifile_ptr << " -> " << *value << std::endl;
    }
  }

produces

writing
spy on first file to get model
self-managed REntry
freeze model and create full-run entry
read both files in sequence
reader->GetModel().GetModelId(): 9
reader->GetModel().GetSchemaId(): 7
entry->GetModelId(): 7
entry->GetSchemaId(): 7
failed: mismatch between entry and model
At:
  void ROOT::RNTupleReader::LoadEntry(ROOT::NTupleSize_t, ROOT::REntry&) [/opt/root/include/ROOT/RNTupleReader.hxx:241]

where the exception originates from comparing the model IDs and not the schema IDs.

From tree/ntuple/inc/ROOT/RNTupleReader.hxx in github.com/root-project/root (b7bf155d9):

          void LoadEntry(ROOT::NTupleSize_t index, ROOT::REntry &entry)
          {
             if (R__unlikely(entry.GetModelId() != fModel->GetModelId()))
                throw RException(R__FAIL("mismatch between entry and model"));
          
             entry.Read(index);
          }

Is this a bug in the reading implementation? I feel like I should (hypothetically) be able to have multiple copies of a model, all with the same schema, and since they share a schema they should be able to LoadEntry into the same REntry. But I may be misunderstanding what the “schema ID” represents.

I am aware of RNTupleProcessor::CreateChain which I would like to avoid for two main reasons.

1. It is still experimental.
2. My current Framework enables running the same processors with or without an input file to read by using a transient RNTupleModel to host the fields that are being processed. Being able to BindValue<void> the reader’s model addresses to the transient model’s addresses is what allows reading input RNTuples to plug seamlessly into the structure.

tomeichlersmith · July 2, 2026, 5:02pm

I found a solution which isn’t super-nice but does appear to work for my use case. I am leaving this post marked as “unsolved” so I can get expert feedback to find out if this solution is the “best possible” right now (i.e. is this solution safe? is there a better one that requires less user code?)

Instead of trying to share the same REntry across multiple RNTupleReader instances, I create a new model and REntry for each RNTupleReader instance and copy the addresses from the previous instance to the new one when transitioning between files.

// everything above here is the same as the previous code example
std::cout << "  read both files in sequence" << std::endl;
std::unique_ptr<ROOT::RNTupleReader> reader;
for (int ifile{0}; ifile < FILEPATHS.size(); ifile++) {
  if (ifile == 0) {
    reader = ROOT::RNTupleReader::Open(std::move(reading_model), TUPLENAME, FILEPATHS[ifile]);
  } else {
    std::cout << "creating new model @ " << std::flush;
    auto new_model = ROOT::RNTupleModel::CreateBare();
    const auto& desc = reader->GetDescriptor();
    const auto& reading_field_names{reader->GetModel().GetFieldNames()};
    for (const auto& field_desc : desc.GetTopLevelFields()) {
      if (reading_field_names.find(field_desc.GetFieldName()) != reading_field_names.end()) {
        new_model->AddField(field_desc.CreateField(desc));
      }
    }
    std::cout << new_model.get() << "\ncreating new entry @ " << std::flush;
    auto new_entry = new_model->CreateEntry();
    std::cout << new_entry.get() << "\nbinding values from old entry into new" << std::endl;
    for (auto& value : *entry) {
      new_entry->BindValue<void>(value.GetField().GetFieldName(), value.GetPtr<void>());
    }
    std::cout << "opening next file" << std::endl;
    reader = ROOT::RNTupleReader::Open(
        std::move(new_model),
        TUPLENAME,
        FILEPATHS[ifile]
    );
    std::cout << "updating entry pointer" << std::endl;
    entry = std::move(new_entry);
  }
  std::cout << "reader @ " << reader.get() << std::endl;
  std::cout << "entry @ " << entry.get() << std::endl;
  std::cout << "ifile_ptr: " << ifile_ptr.get() << " (orig) " << entry->GetPtr<int>("ifile").get() << " (new)" << std::endl;
  std::cout << "value: " << value.get() << " (orig) " << entry->GetPtr<int>("value").get() << " (new)" << std::endl;
  for (auto ientry : *reader) {
    reader->LoadEntry(ientry, *entry);
    std::cout << *ifile_ptr << " -> " << *value << std::endl;
  }
}

I can probably clean up this solution by getting rid of the self-managed REntry since I’m creating a new RNTupleModel each time anyway, but the “transferring of addresses” after the first file is still uncomfortable to me. It feels like I’m sacrificing the benefits of freezing the model, but it might be necessary in order to switch between files and check that the new file still has the fields I’m trying to access.

couet · July 3, 2026, 6:13am

I guess @jblomer can help.

jblomer · July 3, 2026, 8:32am

An entry cannot be shared between readers/models (also not cloned ones). The model owns fields, which in turn own columns that get connected to a particular page source. And the entry references fields of its originating model.

The schema ID is used to share field tokens between model clones. So in your use case, you’d need to rebind the pointers when moving from file to file. You can avoid the potentially costly repeated string lookup, however, and instead create field tokens from the frozen model once. You’d then store the pairs of field token and shared pointer in order to rebind them to an entry of another model clone.
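A minimal, ROOT-free sketch of the rebinding scheme described above may help. `Token`, `Entry`, `Model`, `Reader` and `ReadAllFiles` are hypothetical stand-ins invented for illustration, not ROOT classes; they only mimic the two rules from this thread (a clone gets a new model ID but keeps the schema ID, and loading checks the model ID). In ROOT the corresponding calls should be RNTupleModel::GetToken and the token overload of REntry::BindValue, but check the reference documentation for the exact signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT ROOT classes.
struct Token {
  std::size_t index;        // position of the field in the model
  std::uint64_t schema_id;  // schema the token was created from
};

struct Entry {
  std::uint64_t model_id;
  std::uint64_t schema_id;
  std::vector<std::shared_ptr<int>> values;  // every field is an int here

  // Rebinding by token is an index access, no string lookup.
  void Bind(const Token& token, std::shared_ptr<int> ptr) {
    if (token.schema_id != schema_id) throw std::runtime_error("invalid token");
    values.at(token.index) = std::move(ptr);
  }
};

class Model {
 public:
  explicit Model(std::vector<std::string> names)
      : names_(std::move(names)), model_id_(NextId()), schema_id_(model_id_) {}

  Model Clone() const {
    Model copy(*this);
    copy.model_id_ = NextId();  // new model ID, schema ID is kept
    return copy;
  }

  // The only string lookup; done once on the frozen model.
  Token GetToken(const std::string& name) const {
    for (std::size_t i = 0; i < names_.size(); ++i) {
      if (names_[i] == name) return Token{i, schema_id_};
    }
    throw std::runtime_error("no such field: " + name);
  }

  Entry CreateEntry() const {
    return Entry{model_id_, schema_id_,
                 std::vector<std::shared_ptr<int>>(names_.size())};
  }

  std::uint64_t ModelId() const { return model_id_; }
  std::uint64_t SchemaId() const { return schema_id_; }

 private:
  static std::uint64_t NextId() {
    static std::uint64_t counter = 0;
    return ++counter;
  }
  std::vector<std::string> names_;
  std::uint64_t model_id_;
  std::uint64_t schema_id_;
};

// Stand-in reader: one "file" is a list of rows, one int per field.
class Reader {
 public:
  Reader(Model model, std::vector<std::vector<int>> rows)
      : model_(std::move(model)), rows_(std::move(rows)) {}

  const Model& GetModel() const { return model_; }
  std::size_t Size() const { return rows_.size(); }

  void LoadEntry(std::size_t i, Entry& entry) const {
    if (entry.model_id != model_.ModelId())
      throw std::runtime_error("mismatch between entry and model");
    for (std::size_t f = 0; f < entry.values.size(); ++f) {
      if (entry.values[f]) *entry.values[f] = rows_.at(i).at(f);
    }
  }

 private:
  Model model_;
  std::vector<std::vector<int>> rows_;
};

// Reads all files through the same two addresses and returns the sum of
// everything observed through `value`.
int ReadAllFiles(const std::vector<std::vector<std::vector<int>>>& files) {
  Model frozen({"ifile", "value"});
  auto ifile = std::make_shared<int>(0);
  auto value = std::make_shared<int>(0);
  // Resolve names to tokens once and keep (token, pointer) pairs.
  std::vector<std::pair<Token, std::shared_ptr<int>>> bindings{
      {frozen.GetToken("ifile"), ifile}, {frozen.GetToken("value"), value}};

  int sum = 0;
  for (const auto& rows : files) {
    Reader reader(frozen.Clone(), rows);
    Entry entry = reader.GetModel().CreateEntry();  // belongs to this clone
    for (const auto& b : bindings) entry.Bind(b.first, b.second);
    for (std::size_t i = 0; i < reader.Size(); ++i) {
      reader.LoadEntry(i, entry);
      sum += *value;
    }
  }
  return sum;
}
```

The per-file work is one fresh entry plus one index-based rebind per field, while `ifile` and `value` stay valid for the whole run. An entry created from a different clone is still rejected by `LoadEntry`, which mirrors the exception in the original post.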

@florine Perhaps you can comment on the API question/issue regarding the RNTupleProcessor.

florine · July 3, 2026, 9:49am

Hi @tomeichlersmith, thanks for your post and sharing your use case.

I am aware of RNTupleProcessor::CreateChain which I would like to avoid for two main reasons.

It is still experimental.

I can understand this is a reason to decide not to use it. We are planning to move the RNTupleProcessor interfaces out of Experimental at the end of this year. Before we do so, however, we greatly benefit from community feedback on the usability of these interfaces for real use cases. I understand not wanting to commit to an interface that can still change, but I would encourage trying it out and sharing your experience with us, especially if you find points that could still be improved.

My current Framework enables running the same processors with or without an input file to read by using a transient RNTupleModel to host the fields that are being processed. Being able to BindValue<void> the reader’s model addresses to the transient model’s addresses is what allows reading input RNTuples to plug seamlessly into the structure.

The RNTupleProcessor::RequestField method allows passing an existing pointer address, which if I understand correctly from your description should meet your requirements.

Please let me know if you have any additional questions on the RNTupleProcessor or RNTuple in general, I’d be happy to answer them.

tomeichlersmith · July 3, 2026, 2:17pm

Thank you for your replies! I will look into providing an address to RNTupleProcessor::RequestField and pre-converting the field names into field tokens to ease the transition between files.

tomeichlersmith · July 3, 2026, 3:22pm

Alright, I am copying my solution here for anyone curious. I did not pre-convert into field tokens or use RNTupleProcessor due to the complexity of the framework I’m developing.

Why not? (for those curious)

Some further explanation follows; please see the mock-up framework for the code details.

I do not want the processors to have to know if they are reading from a file or not, so I centrally manage the fields that are available within a “transient model” that is not connected to a source or sink. Reading from an RNTuple just binds values into the transient model for later observation; however, I want to be able to ignore some fields (Ignore fields when reading an RNTuple) in order to improve performance in some use cases of the framework (e.g. just filling histograms of a few columns). I also want to be able to read all of the fields in other use cases of the framework (e.g. a larger reconstruction pass where the framework copies all of the columns from the input file into the output file so I can delete the original input file that doesn’t have the new columns added).

I don’t think this use case should cause any design changes to the RNTuple code - it is fairly complicated and multi-featured so it makes sense to me that I have to write more complicated code to support this feature set.

This solution is able to support the possibility of pre-defining a set of fields that need to be read (and are the only ones to be read, i.e. I can incorporate “ignoring” fields like from Ignore fields when reading an RNTuple) at the cost of code complexity and a decent number of string-comparisons when transitioning between files.

// spy on the first file to get a model for reading
// this is also where I implement the ignoring of certain fields if applicable
auto reading_model = ROOT::RNTupleModel::Create();
{
  ROOT::RNTupleDescriptor::RCreateModelOptions create_model_options;
  create_model_options.SetCreateBare(true);
  auto reader = ROOT::RNTupleReader::Open(create_model_options, TUPLENAME, FILEPATHS[0]);
  const auto& desc = reader->GetDescriptor();
  for (const auto& field_desc : desc.GetTopLevelFields()) {
    // filter fields to ignore here
    reading_model->AddField(field_desc.CreateField(desc));
  }
}
// we aren't going to connect the reading_model to a source or sink
// but we freeze it anyways to make sure later code does not alter
// the set of fields we will be reading
reading_model->Freeze();
// get pointers to fields from the reading_model that will stay valid
// through all of the files processed
auto ifile_ptr = reading_model->GetDefaultEntry().GetPtr<int>("ifile");
auto value = reading_model->GetDefaultEntry().GetPtr<int>("value");
for (int ifile{0}; ifile < FILEPATHS.size(); ifile++) {
  auto new_model = ROOT::RNTupleModel::Create();
  {
    // using the descriptor create a NEW model that WILL be connected
    // to a source but use the reading_model to determine which fields
    // to include and the address of the values for those fields
    ROOT::RNTupleDescriptor::RCreateModelOptions create_model_options;
    create_model_options.SetCreateBare(true);
    auto spy = ROOT::RNTupleReader::Open(create_model_options, TUPLENAME, FILEPATHS[ifile]);
    const auto& desc = spy->GetDescriptor();
    for (const auto& value : reading_model->GetDefaultEntry()) {
      std::string field_name{value.GetField().GetFieldName()};
      auto field_desc_id = desc.FindFieldId(field_name);
      if (field_desc_id == ROOT::kInvalidDescriptorId) {
        throw std::runtime_error("File does not have a field required to be read by model, file mismatch");
      }
      const auto& field_desc = desc.GetFieldDescriptor(field_desc_id);
      new_model->AddField(field_desc.CreateField(desc));
      new_model->GetDefaultEntry().BindValue<void>(field_name, value.GetPtr<void>());
    }
  }
  auto reader = ROOT::RNTupleReader::Open(std::move(new_model), TUPLENAME, FILEPATHS[ifile]);
  for (auto ientry : *reader) {
    reader->LoadEntry(ientry);
    std::cout << *ifile_ptr << " -> " << *value << std::endl;
  }
}

The full file includes writing some test files and a check that this code will throw the “File does not have a field required” exception. multi-read.cxx (3.4 KB)