Hi everyone,
I am working with the low-level internals of the new RNTuple C++ API and am looking for some advice on memory management.
- ROOT Version: 6.36.00
- Platform: Ubuntu 24.04 (Docker)
- Compiler: GCC / C++17
I have been building a C++ prototype to stream RNTuple columnar data natively as Apache Arrow tables over an Arrow Flight (gRPC) server. The idea is to stream data to Python/remote clients without requiring local file downloads.
The C++ streaming works reasonably well as a first version (I measured less than 1.85x overhead compared to a raw RNTuple loop), but I want to optimize the memory handoff.
My Question:
Currently, my overhead comes from doing a manual memcpy of the values from RNTuple into Arrow’s pre-allocated memory buffers.
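To make the copy step concrete, here is a minimal sketch of the pattern, written against the standard library only so that it compiles on its own. `DestBuffer` and `CopyValues` are hypothetical stand-ins (not ROOT or Arrow types, and not the code from the repository): `DestBuffer` plays the role of a pre-allocated Arrow data buffer, and `src` plays the role of the values read from RNTuple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a pre-allocated Arrow data buffer.
struct DestBuffer {
    std::vector<std::uint8_t> bytes;
    explicit DestBuffer(std::size_t nBytes) : bytes(nBytes) {}
};

// Copies `count` values of type T from `src` into `dst`,
// starting at element index `offset` of the destination.
template <typename T>
void CopyValues(const T* src, std::size_t count, DestBuffer& dst, std::size_t offset) {
    // The destination must already be large enough for the copied range.
    assert((offset + count) * sizeof(T) <= dst.bytes.size());
    std::memcpy(dst.bytes.data() + offset * sizeof(T), src, count * sizeof(T));
}
```

Every value crosses memory twice in this pattern (once when the page is decompressed, once in the `memcpy`), and the second pass is the one I would like to remove.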
Does the RNTupleReader API expose a safe way to get a raw pointer to the underlying uncompressed memory page for a specific column?
I would prefer to “alias” or “borrow” that memory directly into Apache Arrow to achieve true zero-copy, but I am not sure if that memory is strictly hidden behind the REntry / model layer.
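What I mean by "borrowing" is sketched below, again with the standard library only. The names `PageStorage`, `BorrowedBuffer`, and `Borrow` are hypothetical and do not correspond to any ROOT or Arrow class. The sketch assumes the page memory could be handed out behind a shared owner, which is exactly the part I do not know how to obtain from RNTuple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an uncompressed page owned by the reader.
struct PageStorage {
    std::vector<std::int32_t> values;
};

// Non-owning view of the bytes, plus a handle that keeps the
// backing storage alive for as long as the view exists.
struct BorrowedBuffer {
    const std::uint8_t* data;
    std::size_t size;
    std::shared_ptr<const void> keepAlive;
};

// Wraps the page memory without copying it.
BorrowedBuffer Borrow(std::shared_ptr<const PageStorage> page) {
    assert(page && "page must not be null");
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(page->values.data());
    const std::size_t nBytes = page->values.size() * sizeof(std::int32_t);
    return BorrowedBuffer{bytes, nBytes, std::move(page)};
}
```

The view points at the same address as the page, so no bytes are moved. The open question is lifetime: the borrowed memory has to stay valid and unmodified until the consumer of the buffer is finished with it.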
Any tips, or a pointer to the right class in the source code, would be hugely appreciated. Thank you!
Reference:
GitHub: KaranSinghDev/RNTuple-Arrow-Gateway