Ignore fields when reading an RNTuple

tomeichlersmith · June 24, 2026, 4:19pm

I am trying to ignore specific fields of an RNTuple (i.e. not load their data into memory at all) when reading a file. I can do this if I already know the types of all the fields I want, by creating my own RNTupleModel and giving that model to the RNTupleReader, but I’m wondering if there is a way to ignore fields without needing to specify the types of all of the other fields.

Context: I am working on updating the LDMX-Software/ldmx-sw Framework to use RNTuple instead of TTree (mock-up framework on Codeberg), and one feature that I would like to carry forward is ignoring branches/fields to (hopefully) improve reading performance without needing to create a separate copy/skim of the data. To be honest, I’m not even certain that the current TTree Framework correctly implements this “ignoring”, but I’d like to mimic the idea with RNTuple if possible.

ROOT Version: 6.38.00
Built for linuxx8664gcc on Nov 27 2025, 09:05:19
From tags/v6-38-00@v6-38-00

I’ve uploaded the source code I’m using to mock up this “ignoring” feature separately from the rest of the framework. It is a copy of src/ignore.cxx in the mock-up framework linked above: ignore.cxx (3.3 KB)

In that code, I test several different reading methods and then print out the number of bytes read (szReadPayload + szReadOverhead) to check how much data the reader actually “looked at”. I test reading a single 10-entry RNTuple with two fields: "to_keep" and "to_ignore". In the discussion below, I omit all of the instrumentation code for brevity.

naive

The naive reading method is to call GetPtr only for the fields I care about. In this example, that is the field "to_keep". This is effectively what is done in the RNTuple skim example.

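auto reader = ROOT::RNTupleReader::Open(TUPLENAME, FILEPATH);
// Only request a pointer to the field we care about
auto keep = reader->GetModel().GetDefaultEntry().GetPtr<int>("to_keep");
for (auto i : *reader) {
  // LoadEntry() fills every field of the default entry, not only "to_keep"
  reader->LoadEntry(i);
}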

This reads 96B, which I guess is the amount for both “to_keep” and “to_ignore”; I deduce this from the results below.

own model

If I construct my own model and provide that model to the RNTupleReader, then I can halve the number of bytes read.

auto model = ROOT::RNTupleModel::Create();
auto to_keep = model->MakeField<int>("to_keep");
reader = ROOT::RNTupleReader::Open(std::move(model), TUPLENAME, FILEPATH);
for (auto i : *reader) {
  reader->LoadEntry(i);
}

This reports reading only 48B which makes me think it is effectively ignoring "to_ignore" while reading.

view

For what it’s worth, I can also make the reader read only 48B by using GetView.

auto view = reader->GetView<int>("to_keep");
for (auto i : reader->GetEntryRange()) {
  view(i);
}

I’m reluctant to use GetView in the framework since I foresee integrating it being difficult. To be honest, I would probably drop this “ignore” feature rather than try to integrate GetView.

skimmed model

Now the actual issue. I would like to be able to read "to_keep" and only "to_keep" without needing to specify the type of "to_keep". My first attempt was to create a new model after opening the file and then re-open the RNTupleReader with that model. This attempt fails with the error message "fixed column representative only valid when connecting to a page sink".

auto reader = ROOT::RNTupleReader::Open(TUPLENAME, FILEPATH);  
auto model = ROOT::RNTupleModel::Create();
for (auto& value : reader->GetModel().GetDefaultEntry()) {
  if (value.GetField().GetFieldName().find("ignore") != std::string::npos) {
    continue;
  }
  const auto& name = value.GetField().GetFieldName();
  model->AddField(value.GetField().Clone(name));
  model->GetDefaultEntry().BindValue<void>(name, value.GetPtr<void>());
}

reader = ROOT::RNTupleReader::Open(std::move(model), TUPLENAME, FILEPATH);
auto keep = reader->GetModel().GetDefaultEntry().GetPtr<int>("to_keep");
reader->EnableMetrics();
for (auto i: *reader) {
  reader->LoadEntry(i);
}

Is there a way to construct this “subset” RNTupleModel from the “full on-disk” model without knowing the types? Or alternatively, can I LoadEntry on only certain fields of the RNTuple like the view does?

tomeichlersmith · June 24, 2026, 4:23pm

I’ve also tried to write my own LoadEntry that only calls Read on the values that I’m keeping (i.e. "to_keep" and not "to_ignore").

  // replacement for LoadEntry in naive reading
  for (auto& value : reader->GetModel().GetDefaultEntry()) {
    if (value.GetField().GetFieldName().find("ignore") != std::string::npos) {
      continue;
    }
    value.Read(i);
  }

However, this does not compile since the reader’s model’s entry only provides const access to the values and I need non-const access in order to be able to Read.

jblomer · June 24, 2026, 8:57pm

There is unfortunately no direct API to ignore fields. As you write, you can go either via views or via a reduced model (the custom LoadEntry() is not supported, as indicated by the constness, and accessing only certain fields from a model has no impact on what is loaded, as you point out).

The error message when you try to open with the reduced model is not ideal. The reason for the failure is that the fields were cloned from a model already connected to a page source, which taints them so that they cannot be connected to another page source. You can instead do the following:

auto fullModel = reader->GetDescriptor().CreateModel();
auto reducedModel = ROOT::RNTupleModel::Create();
for (const auto& value : fullModel->GetDefaultEntry()) {
  // ...
}

or, slightly more efficient because you don’t need to create fields twice:

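const auto &desc = reader->GetDescriptor();
auto model = ROOT::RNTupleModel::Create();
for (const auto &fieldDesc : desc.GetTopLevelFields()) {
   // Create fresh, unconnected fields from the on-disk description;
   // no compile-time type is needed.
   if (fieldDesc.GetFieldName() == "to_keep")
      model->AddField(fieldDesc.CreateField(desc));
}
// Open a new reader that only loads the fields in the reduced model.
auto skimmedReader = ROOT::RNTupleReader::Open(std::move(model), TUPLENAME, FILEPATH);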
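To turn the hard-coded name comparison in the descriptor loop into a configurable "ignore list", the keep/drop decision can be factored out into a plain function. The following is a minimal sketch: ShouldKeepField and SelectFields are hypothetical helpers (not part of ROOT or ldmx-sw) that use the same substring matching as the find("ignore") check in the original post, and the vector of names stands in for what desc.GetTopLevelFields() would provide.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: a top-level field is dropped if its name contains any
// non-empty ignore pattern (same substring rule as find("ignore") above).
bool ShouldKeepField(std::string_view fieldName,
                     const std::vector<std::string> &ignorePatterns)
{
   for (const auto &pattern : ignorePatterns) {
      // Skip empty patterns: an empty string would match every name.
      if (!pattern.empty() && fieldName.find(pattern) != std::string_view::npos)
         return false;
   }
   return true;
}

// Hypothetical helper: filter a list of field names (in the real loop these
// would come from the descriptor) down to the ones that should be read.
std::vector<std::string> SelectFields(const std::vector<std::string> &allFields,
                                      const std::vector<std::string> &ignorePatterns)
{
   std::vector<std::string> kept;
   for (const auto &name : allFields) {
      if (ShouldKeepField(name, ignorePatterns))
         kept.push_back(name);
   }
   return kept;
}
```

Inside the descriptor loop, the condition would then become ShouldKeepField(fieldDesc.GetFieldName(), ignorePatterns), with ignorePatterns supplied by the framework's configuration. Keeping the predicate free of ROOT types makes it easy to unit-test on its own.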

Regarding views, can you tell what would make it difficult to integrate? Perhaps we can work on that.

tomeichlersmith · June 25, 2026, 2:39pm

Excellent, thank you! Using your code to create a “skimmed model” does work and reports only 48B read. The updated code I used is ignore.cxx (4.2 KB).

regarding views

One of the main goals of our framework is to have “processors” that do specific tasks with some input data but do not care about the source of that input data. Specifically, we would like these processors to be able to run after reading the data from a file (via RNTupleReader) or after some other “source” produced their required inputs (e.g. a simulation creating sim hits rather than reading those sim hits from a previously-generated file or a decoder unpacking raw data into structured data) without any change to the processor’s code.

GetView breaks this symmetry for two reasons:

1. It requires access to an RNTupleReader in order to call GetView in the first place. What would I do to support processing when there is no reader?
2. Even if we can get around that hurdle, the processor needs to know the entry index to pass to RNTupleView::operator() in order to trigger the read. What should the entry index be if there is no reader?

We avoid these intricacies by having processors get their inputs using model.GetDefaultEntry().GetPtr and then hold that address for the entire execution of the program. The data source (reader or otherwise) updates the data at that address as entries are processed, and the processor reads from that shared address.
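The shared-address pattern described here can be sketched in plain C++. This is a minimal illustration under stated assumptions: DataSource, FakeSimulation, SumProcessor and RunAll are hypothetical names, not the ldmx-sw API, and a single int stands in for a real data product. The key property is that a source only overwrites the value behind the pointer and never re-seats it, so a processor can keep the address for the whole run.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical base class for any data source (file reader, simulation, decoder).
class DataSource {
public:
   virtual ~DataSource() = default;
   // Advance to the next event; returns false when there are no more events.
   virtual bool Next() = 0;
   // Stable address that stays valid for the whole run.
   std::shared_ptr<int> GetHits() const { return fHits; }

protected:
   std::shared_ptr<int> fHits = std::make_shared<int>(0);
};

// Hypothetical stand-in for a simulation that produces its output in memory.
class FakeSimulation : public DataSource {
public:
   explicit FakeSimulation(int nEvents) : fNEvents(nEvents) {}
   bool Next() override
   {
      if (fCurrent >= fNEvents)
         return false;
      *fHits = 10 * fCurrent;  // overwrite in place; never re-seat the pointer
      ++fCurrent;
      return true;
   }

private:
   int fNEvents;
   int fCurrent = 0;
};

// A processor only sees the pointer; it does not know which source fills it.
class SumProcessor {
public:
   explicit SumProcessor(std::shared_ptr<int> hits) : fHits(std::move(hits)) {}
   void Process() { fSum += *fHits; }
   int GetSum() const { return fSum; }

private:
   std::shared_ptr<int> fHits;
   int fSum = 0;
};

// Drive the event loop; returns the number of processed events.
int RunAll(DataSource &source, SumProcessor &processor)
{
   int nProcessed = 0;
   while (source.Next()) {
      processor.Process();
      ++nProcessed;
   }
   return nProcessed;
}
```

A reader-backed source would presumably bind the same shared_ptr into an REntry (for example with REntry::BindValue, as in the original post) and call LoadEntry() inside Next(); the processor code would not change.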

I was previously unaware of the RNTupleDescriptor, which is exactly what I want in this use case (look at the file to see the available fields, then construct a model from them, dropping the ones we want to ignore). I don’t think views need to change, since they are well suited to an “analysis” workflow where a few specific columns are needed and the user can avoid all this descriptor/model/field juggling. I’m venturing to update a more intricate framework, where I’m willing to take on the extra juggling.

Edit: Feel free to poke around the mock framework (same link as in original post) if you’re curious. Feedback is welcome especially if I’m doing something dangerous/wrong/stupid with ROOT RNTuple{Reader, Writer, Model}.

jblomer · June 25, 2026, 9:06pm

We avoid these intricacies by having processor’s get their inputs using model.GetDefaultEntry().GetPtr and then holding that address for the entire execution of the program. The data source (reader or otherwise) updates the data at the address location as entries are processed and the processor can then read from that shared address.

I see. I’m not necessarily advocating for using views, but you can implement the same pattern with them: a vector of views that is updated by the data source. The stable pointer should be retrievable through view.GetValue().GetPtr<T>() (or you can bind your own pointer through the corresponding RNTupleReader::GetView() overload).
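The "vector of views" idea can be sketched without ROOT to show the control flow. Assumptions: ViewSource and Loader are hypothetical names, and each std::function loader stands in for an RNTupleView obtained through the GetView() overload that binds a caller-supplied pointer. Because the source owns the entry index, processors still only hold a stable pointer, which sidesteps the entry-index concern raised earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for "call view(entry)": fills shared storage for one entry.
using Loader = std::function<void(std::uint64_t entry)>;

// Hypothetical reader-backed source holding one loader per non-ignored field.
class ViewSource {
public:
   explicit ViewSource(std::uint64_t nEntries) : fNEntries(nEntries) {}

   // Register one field; the loader is responsible for filling its storage.
   void AddLoader(Loader load) { fLoaders.push_back(std::move(load)); }

   // The entry index lives only inside the source; processors never see it.
   bool Next()
   {
      if (fEntry >= fNEntries)
         return false;
      for (auto &load : fLoaders)
         load(fEntry);
      ++fEntry;
      return true;
   }

private:
   std::uint64_t fNEntries;
   std::uint64_t fEntry = 0;
   std::vector<Loader> fLoaders;
};
```

In this sketch, ignoring a field simply means never registering a loader for it, so its data is never requested; a simulation or decoder source would fill the same pointers through its own logic.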