The optimal way to store variable tracks count in a TTree

My example is indeed academic. A more likely case is reading a single trace across many events, and I hoped that ROOT would help here. If all the traces need to be read for such an operation, there is no performance advantage over HDF5: almost everything stored on the HDD is read into memory sequentially, and the format matters much less.

Your example with the cut on all traces would of course also be a possible case, but I understand it also needs to read all the traces to work.

That’s why the topic of this post is “The optimal way to store variable tracks count in a TTree”. I just was wondering if ROOT can give any performance benefits in the described case.

I still need to understand “How” you identify that single trace in the first place?

Each trace comes from a separate detector (and stores those 6 values vs time, the time corresponding to the consecutive cells of, for example, traces[100][150].SimSignal_X). Only traces that trigger the detector’s electronics are stored. The idea is to have a separate array for each event that says that trace[0] in this particular event is (for example) detector 5, trace[1] is detector 11, etc. That’s why I actually asked the question about indexing with a separate branch array in On-fly dictionary generation for vector<myclass> and use as a branch - #11 by LeWhoo

So how would I identify a single trace? It may be that, after some time, we notice that a single detector gives similar values in all events, and then want to check whether that is always the case. The simplest approach is to plot only this detector’s traces over all the events. But that’s also hypothetical. I think there are plenty of situations in which one would like to access only a single specific trace.

Do you imagine that there would be a strong correlation between the index of the trace and the detector it comes from? Usually this needs to be done via:

tree->Draw("trace.somedatamember", "trace.fDetector == somevalue");

because “usually” those traces can be anywhere inside the collection of traces.
If it is more important to have this correlation, then one could do (at creation time):

std::vector<trace> traceFromDetector1;
std::vector<trace> traceFromDetector2;
...
tree->Branch("traceFromDetector1", &traceFromDetector1);
tree->Branch("traceFromDetector2", &traceFromDetector2);

And since the number of detectors is fixed, you can even use the Branch creation from a TCollection that I mentioned to automate it (or emulate it with a non-TCollection).
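In plain Python, the “one collection per detector” layout suggested above can be sketched like this (detector count and data are made up for illustration; this is not ROOT API, just the bookkeeping idea):

```python
# Emulating the "one branch per detector" layout in plain Python:
# every event appends one (possibly empty) trace list per detector,
# so a non-triggering detector costs only an empty entry.
N_DETECTORS = 4  # illustrative; the real experiment has far more

traces_by_detector = {d: [] for d in range(N_DETECTORS)}

def fill_event(triggered):
    """triggered: dict mapping detector id -> trace for this event."""
    for d in range(N_DETECTORS):
        traces_by_detector[d].append(triggered.get(d, []))

# Hypothetical events: detectors 1 and 3 trigger, then only detector 0.
fill_event({1: [0.5, 0.6], 3: [0.9, 1.0]})
fill_event({0: [0.1]})

# Detector 1 triggered only in the first event.
assert traces_by_detector[1] == [[0.5, 0.6], []]
assert traces_by_detector[2] == [[], []]
```

Reading “all events for detector 1” then touches only one container, which is the point of the per-detector split.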

No correlation is expected. In the HDF5 format we keep a separate array telling which trace belongs to which detector. I was thinking about the same in a TTree: having an int detector_indices[all_detectors_count] for each event, and then using tree->Draw("trace[detector_indices[detector_that_I_want_to_analyse]].somedata", ...); which would require reading an additional array for each event, but that should not be a huge overhead.

The problem is that while the number of detectors is fixed, most of them will not trigger in a single event and won’t provide any data. Thus only the triggered traces are stored. In principle, the other traces could be filled with zeros and stored, but wouldn’t this result in a big hit in performance and file size?
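The index-array bookkeeping described above can be sketched in plain Python (detector numbers and trace contents are hypothetical, chosen to match the earlier example of detectors 5 and 11):

```python
# Per-event bookkeeping: only triggered detectors store a trace.
# detector_indices[i] gives the detector id of traces[i] in this event.

def trace_for_detector(detector_indices, traces, detector_id):
    """Return the trace of `detector_id`, or None if it did not trigger."""
    for i, det in enumerate(detector_indices):
        if det == detector_id:
            return traces[i]
    return None

# Hypothetical event: detectors 5 and 11 triggered.
detector_indices = [5, 11]
traces = [[0.1, 0.2, 0.3], [1.0, 1.1, 1.2]]

assert trace_for_detector(detector_indices, traces, 11) == [1.0, 1.1, 1.2]
assert trace_for_detector(detector_indices, traces, 7) is None
```

The per-event cost is one small int array alongside the stored traces, which is the “should not be a huge overhead” part.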

I think you might have meant:

std::vector<int> detector_indices[all_detectors_count];

but I see what you mean, and it is similar (though of course with different trade-offs) to the fDetector data member I was talking about.

In principle, the other traces could be filled with zeros and stored, but wouldn’t this result in a big hit in performance and file size?

In the sketch I wrote, it would cost in storage a single int (set to 0) per detector per entry.

The choice between a single trace collection and a trace collection per detector will depend greatly on what the analyst does most often. If they most often use traces coming from all detectors, then the single trace collection will be faster and easier to use. If they most often use traces coming from a single (or a very few select) detectors, then a trace collection per detector will be faster (and possibly easier to use).
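As a back-of-envelope check of that per-entry cost: with the 200,000-detector figure mentioned later in this thread and an illustrative entry count, the uncompressed overhead of one int per detector per entry would be:

```python
# Back-of-envelope cost of one int (e.g. a trace count set to 0)
# per detector per entry, before compression. Entry count is illustrative;
# the 200,000-detector figure comes from this thread.
detectors = 200_000
entries = 1_000
bytes_per_int = 4

overhead_bytes = detectors * entries * bytes_per_int
overhead_mb = overhead_bytes / 1024 / 1024
# 200,000 * 1,000 * 4 bytes = 800,000,000 bytes, i.e. roughly 763 MB
```

A long run of zero ints also compresses extremely well, so the on-disk cost would typically be far smaller than this uncompressed figure.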

Still, this approach was for a variable number of traces, which requires reading all traces in the TTree, and is thus, I guess, out of the question unless there is some way to implement this ~2D variability in a more optimal way.

Well, I wanted to try the all-detectors-in-the-experiment approach from the start, and I’ve started coding it. While the size impact shouldn’t be big indeed, I wonder if having, for example, 200,000 traces per event, most of them empty, will have an impact on performance.

That would indeed be very bad (waste of space, waste of (de)compression time). You really need to avoid using fixed-size arrays :slight_smile: if that is what you meant. (In the example with vector<Trace>, where Trace contains a vector, you would not store any empty traces.)

thus I guess out of the question unless there is some way to implement this ~2D variability in a more optimal way.

I am (still) vaguely confused about which direction you are hoping to optimize. In general terms, the nested-collection case is improved in RNTuple, and given the examples you gave so far, it might support enough cases for you. I recommend that you give it a try.

Optimisation is a big question for the prototype phase of the experiment. There may be a lot to debug, so fast access to any “cross-section” through the data may prove important.

About RNtuple, I would gladly try it, but there are two issues:

  1. I understand it is in the experimental phase. We need something stable that will be easily reusable in the future. I doubt I would be able to convince anyone to use an experimental ROOT feature, especially given the infamous history of the ROOT 5 to 6 migration.
  2. We are implementing our software in Python. After quick googling I could not find how to use RNTuple in PyROOT, so I am not sure if it is already supported.

The multi-branch approach, where I generate a separate branch for each (possible) trace with a TObjArray, may be a no-go for a simple reason. I am trying with 200,000 branches. Adding them to the TObjArray was fast. However, I was not able to wait out the execution of TTree::Branch(array) on such a TObjArray… For 40,000 branches the execution took ~370 seconds, for 10,000 branches ~10 seconds, and for 20,000 branches ~70 s, so it seems non-linear in time. It is not specific to TObjArray; when I create all the branches separately, it takes the same amount of time.
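A quick check on the three timings above suggests the branch-creation time grows roughly quadratically with the branch count (the exponent between successive measurements is well above 1):

```python
# Measured TTree::Branch times from this thread:
# 10,000 branches -> ~10 s, 20,000 -> ~70 s, 40,000 -> ~370 s.
# Estimate the scaling exponent between successive measurements.
import math

measurements = [(10_000, 10.0), (20_000, 70.0), (40_000, 370.0)]

exponents = []
for (n1, t1), (n2, t2) in zip(measurements, measurements[1:]):
    exponents.append(math.log(t2 / t1) / math.log(n2 / n1))

# Both exponents come out around 2.4-2.8, i.e. clearly super-linear,
# consistent with "not able to wait" for 200,000 branches.
assert all(e > 1.5 for e in exponents)
```

Extrapolating with an exponent of ~2 would already put 200,000 branches in the range of hours, which matches the observation.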

So unless it is a bug or some mistake of mine, I think it can (sadly) be concluded that TTrees are not preferable for storing either a variable number of variable-length arrays (a kind of 2D variability) or a very large number of branches. In the former case they just do not offer any performance advantage over formats such as HDF5. In the latter case I did not try to compare with anything.

Still, big thanks for your help!

Well, you could design your own class with a manually written ROOT I/O “streamer”.
Something like a static 2D array with fixed maximal dimensions but the “streamer” would only write / read parts of it (of course, additional variables in the class would need to keep the actual variable dimensions on an event by event basis).

I have never done that or read about it. Before I dive into the docs, could you tell me what could be gained? Would it be easy to obtain 2D variability where, for example, not the whole 2D array needs to be read?

In addition, I can report that on a tree with 10,000 branches, the first action, in this case:

t.Draw("a5.v5[0]:Entry$", "a5.track_length>0", "goff")

takes ~100 seconds. A second, very similar one:

t.Draw("a50.v1[0]:Entry$", "a50.track_length>0", "goff")

takes just ~0.2 s, so the first time is just some kind of initialisation. The above results are from PyROOT, but when I do the draw from the ROOT shell it is similar.

I am not sure if this is the expected behaviour, but it seems I am going through uncharted waters :slight_smile:

All ROOT C++ interfaces (and even yours :slight_smile: ) are automatically supported in PyROOT thanks to cppyy and Cling. What might be missing/not-yet-implemented are pythonizations of the interfaces.

That depends on the timeframe at which the experiment is going to start acquiring “precious” (i.e. non-regenerable) data. If this is a couple of years away, then this *is* the perfect time to test RNTuple and help build it into a tool useful for your experiment, as there is still some time to influence the design and feature set. And as far as long-term backward compatibility goes, we are getting pretty close to the point where the file format will be set.

Well … indeed that is too large :slight_smile: … It also never makes sense to store individual traces in a branch. What could make sense is to store them by some “useful” category (whether it is their detector or some other inherent property).

I see. Sadly or luckily, such data will come soon; if not for covid, I would say this year. Thus I guess RNTuple is too “adventurous” for us at the moment. Maybe in the future it will be worth converting to it.

The point is that each trace comes from a different detector - a different antenna. 200,000 altogether may never be the case, but 10,000 antennas, each with its own triggering system, will have the possibility to contribute to a single event. There are no obvious categories to separate the antennas into, I am afraid… And 10,000 branches are already causing problems.

However, perhaps it would be possible to go with a constant trace length, so the problem would be reduced to 1D variability. Still, I am not sure this is well supported (in terms of performance) by TTree, as here the variability is at the highest level (a variable number of traces), while the usual case, I think, has the variability at the lowest level (in our case: the trace length). So to handle the variability I would still have to have a vector<Trace>; the only difference would be that Trace would have a constant length. Would TTree split such a vector in a way that only one trace at a time per event could be read?
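The “1D variability” idea — a variable number of fixed-length traces — can be sketched in plain Python: because each trace has a known constant length, a single trace can be located by offset alone, without parsing the rest of the event (names and sizes are illustrative, not ROOT API):

```python
# Sketch of "1D variability": a variable number of fixed-length traces
# per event, flattened so a single trace is addressable by offset.
TRACE_LEN = 1000  # illustrative constant trace length

def get_trace(flat_samples, trace_index, trace_len=TRACE_LEN):
    """Slice one fixed-length trace out of a flattened event buffer."""
    start = trace_index * trace_len
    return flat_samples[start:start + trace_len]

# Hypothetical event with 3 triggered detectors -> 3 * TRACE_LEN samples.
event = list(range(3 * TRACE_LEN))
assert get_trace(event, 1)[0] == TRACE_LEN   # first sample of trace 1
assert len(get_trace(event, 2)) == TRACE_LEN
```

This offset arithmetic is exactly what a variable trace length breaks: with variable lengths, finding trace N requires knowing the lengths of all traces before it.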

Just to confirm: is there a good way to store a variable number of constant-length traces in a TTree, where the most likely readout scenario is reading a single, whole trace? This is 1D variability, not the 2D variability discussed before. However, I understand the answer is still “no”, as the variability needs to be inside a single branch, and a single branch always needs to be read as a whole…

Hello again,

I am constructing the final benchmarking plots for the discussed issue, and one thing has started to puzzle me. I thought that ROOT has to read the whole branch, and thus the whole 2D array, for a specific entry. Thus reading a single value from the array across all entries should be roughly as costly as reading the whole array across all entries, if I/O is the dominant factor. I am no longer sure that is the case.

entries = t.Draw("traces[100].SimSignal_X", "", "goff")

takes roughly 0.93 seconds (there are 1001 entries, and the traces.SimSignal_X array, while varying, is roughly 180x1000 (180 traces, each 1000 samples long)) and returns 1005156 entries (~1000 entries * ~1000 samples in trace[100]).

entries = t.Draw("traces[100].SimSignal_X", "Entry$==499", "goff")

takes roughly 0.86 s and returns 1006 entries (the length of trace[100] for Entry$==499).

The times are roughly equal. If reading a single value from traces.SimSignal_X required reading all values of traces.SimSignal_X (all traces, whole length, roughly 180*1000), then the first case should take roughly ~1000 times longer than the second: in the second case we read traces.SimSignal_X for just a single entry, in the first case for all ~1000 entries. But this is not what happens.

The times are also not proportional to the number of entries: 1005156 in the first case, 1006 in the second.

Reading all traces instead of traces[100] for a single entry almost does not affect the time (so I guess all traces are always read, as expected). However, reading all traces through all entries:

entries = t.Draw("traces.SimSignal_X", "", "goff")

takes ~40 seconds, ~50 times longer than just traces[100].
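Putting the three measurements above side by side makes the puzzle concrete (timings copied from this thread; the comparison arithmetic is just a sanity check):

```python
# Observed Draw() timings from above, compared with the naive
# "the whole branch is always read" expectation.
t_all_entries_one_trace = 0.93   # traces[100].SimSignal_X, all entries
t_one_entry_one_trace = 0.86     # same expression, Entry$==499 only
t_all_entries_all_traces = 40.0  # traces.SimSignal_X, all entries

# If every entry required reading all of traces.SimSignal_X, the first
# measurement should be ~1000x the second; instead they are nearly equal.
assert t_all_entries_one_trace / t_one_entry_one_trace < 2

# Reading all ~180 traces instead of one costs ~43x, not ~180x.
assert 40 < t_all_entries_all_traces / t_all_entries_one_trace < 60
```

Neither ratio matches the all-or-nothing reading model, which is exactly the contradiction raised in the question below.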

There is no compression in this file, so I don’t know what is happening.

I would expect reading traces.SimSignal_X through all entries to take the same amount of time as reading traces[100].SimSignal_X, because all the traces are always read (or so I thought). Simultaneously, I would expect a single entry to be read ~1000 times faster than all the entries. Both expectations are wrong. Could you suggest what is happening?