Hello,
It has been pointed out here (The optimal way to store variable tracks count in a TTree - #28 by LeWhoo) that RNTuple may be much more efficient than a TTree at reading a variable number of vectors.
Just to remind: each of my events has a different number of traces. Each trace is composed of 6 vectors (also of variable length). To store this variability in a TTree, I created a branch holding vector<my_trace_class>, and the TTree was able to split it into 6 vector<vector<float>> branches. However, this way the TTree reads a whole vector<vector<float>> per entry, that is, vector X of all the traces. The likely use case is to read vector X of only one trace, or just a single value of this vector, across all the events (entries). With roughly 180 traces per event, with TTree I am reading ~180 times too much.
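To make the ~180x figure concrete, here is a toy sketch in plain C++ (no ROOT involved) that counts how many floats get materialised versus how many are actually used when only one trace per entry is wanted. The names `Entry`, `ReadCost` and `readSingleTrace` are hypothetical, and the sizes in the usage note are assumed for illustration only; it models the access pattern, not the actual TTree I/O.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one split sub-branch:
// one entry holds the X samples of all traces of an event.
using Entry = std::vector<std::vector<float>>;

struct ReadCost {
    std::size_t loaded;  // floats that had to be read to get the entry
    std::size_t needed;  // floats the analysis actually wanted
};

// Counts the cost of accessing a single trace (traceIdx) in every entry,
// assuming the whole nested vector must be loaded per entry.
ReadCost readSingleTrace(const std::vector<Entry>& entries, std::size_t traceIdx) {
    ReadCost cost{0, 0};
    for (const auto& entry : entries) {
        // Loading the entry pulls in every trace of the event.
        for (const auto& trace : entry) {
            cost.loaded += trace.size();
        }
        // Only one trace is of interest.
        if (traceIdx < entry.size()) {
            cost.needed += entry[traceIdx].size();
        }
    }
    return cost;
}
```

For example, with 10 events of 180 traces and 100 samples per trace (assumed numbers), `readSingleTrace(entries, 0)` reports 180000 floats loaded against 1000 needed, i.e. a factor of 180.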
The question is how to do this properly with RNTuple. At the moment I have just followed what I created for the TTree:
std::shared_ptr<vector<vector<float>>> se_x = model->MakeField<vector<vector<float>>>("SimEfield_X");
std::shared_ptr<vector<vector<float>>> se_y = model->MakeField<vector<vector<float>>>("SimEfield_Y");
std::shared_ptr<vector<vector<float>>> se_z = model->MakeField<vector<vector<float>>>("SimEfield_Z");
std::shared_ptr<vector<vector<float>>> ss_x = model->MakeField<vector<vector<float>>>("SimSignal_X");
std::shared_ptr<vector<vector<float>>> ss_y = model->MakeField<vector<vector<float>>>("SimSignal_Y");
std::shared_ptr<vector<vector<float>>> ss_z = model->MakeField<vector<vector<float>>>("SimSignal_Z");
Then for reading I do:
auto se_x = model->MakeField<vector<vector<float>>>("SimEfield_X");
// auto se_y = model->MakeField<vector<vector<float>>>("SimEfield_Y");
// auto se_z = model->MakeField<vector<vector<float>>>("SimEfield_Z");
// auto ss_x = model->MakeField<vector<vector<float>>>("SimSignal_X");
// auto ss_y = model->MakeField<vector<vector<float>>>("SimSignal_Y");
// auto ss_z = model->MakeField<vector<vector<float>>>("SimSignal_Z");
// Open the ntuple named "F" with a model that contains only the fields to be read
auto ntuple = RNTupleReader::Open(std::move(model), "F", "test.root");
// Time a loop that loads every entry
auto stime = std::chrono::high_resolution_clock::now();
for (auto entryId : *ntuple) {
    ntuple->LoadEntry(entryId);
}
auto etime = std::chrono::high_resolution_clock::now();
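The snippet above only records the two time points. A minimal sketch of how the elapsed readout time could be derived from them is given here; the helper name `elapsedMs` and the choice of milliseconds are assumptions for illustration.

```cpp
#include <chrono>

// Hypothetical helper: elapsed time between two time points of the same clock,
// expressed as fractional milliseconds.
template <typename TimePoint>
double elapsedMs(TimePoint start, TimePoint end) {
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

With the variables from the loop above, this would be called as `elapsedMs(stime, etime)`.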
I can see differences in the measured readout time when I comment or uncomment more of the fields. However, I understand that in this case RNTuple also reads the whole vector<vector<float>>. How can I make it read just a single vector from inside the outer vector? Or should a completely different approach to storing the variable number of traces be used with RNTuple?
ROOT Version: 6.24.00