The optimal way to store variable tracks count in a TTree

Ok. So indeed the 2nd case reads “182/2”(ish?) more data than the 1st case (but plots the same subset).

So is there a way to construct a TTree following this kind of 2D variability, that will not have to read whole arrays and thus be faster? I was thinking about TClonesArray, because according to the documentation it can split its members.

You are mis-reading the word “split” in this context. The data is already split and the data members are in their own branches: traces.SimEfield_X, traces.SimEfield_Y, etc., and a TClonesArray would do only as well.

What you are describing is to distribute the elements of the collections across many branches. This is supported for ROOT collections other than TClonesArray (for example TObjArray or TList) *but* this is unlikely to help you, since the number of branches (and thus elements) is *fixed* at branch creation time (i.e. "because the number of tracks varies from event to event").


So another essential question is how important it is to the physics use case to be able to quickly read a few hard-coded-by-index traces. The example you gave:

t.Draw("Traces_A100_SimSignal_X[150]*Traces_A100_SimSignal_Y[150]*Traces_A100_SimSignal_Z[150]*Traces_A10_SimSignal_X[150]*Traces_A50_SimSignal_Z[150]", "", "goff")

seems “academic”. What is special about traces number 100 and 50 (and sample 150), especially since in some entries those would not exist (unless the variation in the number of traces is extremely small)?
[I have seen that kind of expression for trigger bits, where the number of elements is fixed and their index semantically important]

Isn’t a more realistic use case:

t.Draw("traces[][].SimSignal_X*traces[][].SimSignal_Y*traces[][].SimSignal_Z*traces[][].SimSignal_X*traces[][].SimSignal_Z",  "0.1 < traces[][].SimSignal_X && traces[][].SimSignal_X < 0.2", "goff")

i.e. plot some information about all the traces that meet a given criterion (note that doing this with the “fast” solution is actually ‘hard’: you have to spell out all the trace branch names).

My example is academic indeed. A more likely case is to read a single trace over several events, and I hoped that ROOT would help here. If all the traces need to be read for such an operation, there is no performance advantage compared to HDF5: almost everything stored on the HDD is read into memory sequentially, and the file format matters much less then.

Your example with the cut on all traces would, of course, also be a possible case, but I understand it also needs to read all the traces to work.

That’s why the topic of this post is “The optimal way to store variable tracks count in a TTree”. I was just wondering if ROOT can give any performance benefits in the described case.

I still need to understand “How” you identify that single trace in the first place?

Each trace comes from a separate detector (and stores those 6 values vs time, time corresponding to the consecutive cells of, for example, traces[100][150].SimSignal_X). Only traces that trigger the detector’s electronics would be stored. The idea is to have a separate array for each Event, that says that trace[0] in this particular event is (for example) detector 5, trace[1] is detector 11, etc. That’s why I actually asked the question about indexing with a separate branch array in On-fly dictionary generation for vector<myclass> and use as a branch - #11 by LeWhoo

So how would I identify a single trace? It may be that after a time we notice that a single detector gives similar values in all the events, and then we want to check if that is always the case. The simplest approach is to plot only this detector’s traces over all the events. But that’s also hypothetical. I think there are plenty of situations in which one would like to access only a single specific trace.

Do you imagine that there would be a strong correlation between the index of the trace and the detector it comes from? Usually this needs to be done via:

tree->Draw("trace.somedatamember", "trace.fDetector == somevalue");

because “usually” those traces can be anywhere inside the collection of traces.
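In plain terms, the cut-based lookup can be sketched like this (illustrative Python only, not ROOT API; the Trace class and the detector/signal_x names are made up for the sketch):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Trace:
    detector: int            # which detector produced this trace
    signal_x: List[float]    # the stored samples

def draw_with_cut(events, wanted_detector):
    """Collect signal_x samples from every trace whose detector matches,
    scanning all traces of all events -- the analogue of
    tree->Draw("trace.somedatamember", "trace.fDetector == somevalue")."""
    out = []
    for traces in events:        # one list of traces per event
        for t in traces:         # a matching trace can sit anywhere
            if t.detector == wanted_detector:
                out.extend(t.signal_x)
    return out
```

The point of the sketch is that, without any index, every trace of every event has to be visited to find the matching ones.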
If it is more important to have this correlation, then one could do (at creation time):

std::vector<trace> traceFromDetector1;
std::vector<trace> traceFromDetector2;
...
tree->Branch("traceFromDetector1", &traceFromDetector1);
tree->Branch("traceFromDetector2", &traceFromDetector2);

And since the number of detectors is fixed, you can even use the Branch creation from a TCollection that I mentioned to automate it (or emulate it with a non-TCollection).

No correlation is expected. In the HDF5 format we keep a separate array telling which trace belongs to which detector. I was thinking about the same in a TTree: having an int detector_indices[all_detectors_count] for each event, and then using tree->Draw("trace[detector_indices[detector_that_I_want_to_analyse]].somedata", ...); which would require reading an additional array for each event, but that should not be a huge overhead.
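A minimal sketch of that indirection (plain Python, hypothetical names): per event we keep detector_indices mapping detector id to the position in the traces array, with -1 for detectors that did not trigger, so a single trace can be fetched without scanning all traces.

```python
def trace_for_detector(detector_indices, traces, det_id):
    """Fetch the trace of a given detector for one event via the index
    array, or None when that detector did not trigger in this event."""
    idx = detector_indices[det_id]   # one int lookup per event
    if idx < 0:
        return None                  # detector did not trigger
    return traces[idx]
```

The per-event cost is one small integer array read plus one direct element access, instead of a scan over all stored traces.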

The problem is that while the number of detectors is fixed, most of them will not trigger inside a single event and won’t provide any data. Thus only the traces that are triggered are stored. In principle, the other traces could be filled with zeros and stored, but wouldn’t this result in a big hit to performance and file size?

I think you might have meant:

std::vector<int> detector_indices[all_detectors_count];

but I see what you mean, and it is similar (but of course with different trade-offs) to the fDetector data member I was talking about.

In principle, the other traces could be filled with zeros and stored, but wouldn’t this result in a big hit to performance and file size?

In the sketch I wrote, it would cost in storage a single int (set to 0) per detector per entry.
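A back-of-the-envelope cost model illustrates that point (a sketch, not the actual ROOT on-disk layout, which also compresses the baskets): each detector's vector is written as a length prefix plus its samples, so an empty vector costs just the one length int set to 0.

```python
def serialized_size(per_detector_traces):
    """Rough pre-compression size: 4-byte length prefix per detector's
    vector, plus 4 bytes per stored sample; empty vectors cost 4 bytes."""
    total = 0
    for samples in per_detector_traces:
        total += 4 + 4 * len(samples)
    return total
```

So an entry where only a handful of detectors out of many triggered pays roughly 4 bytes per silent detector before compression (and runs of zero-length markers compress very well).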

The choice between a single trace collection and a trace collection per detector will depend greatly on what the analysts do most. If they most often use traces coming from all detectors, then the single trace collection will be faster and easier to use. If they most often use traces coming from a single (or a very few select) detectors, then a trace collection per detector will be faster (and possibly easier to use, maybe).

Still, this approach was for a variable number of traces, which requires reading all traces in the TTree, so I guess it is out of the question unless there is some way to implement this ~2D variability in a more optimal way.

Well, I wanted to try the all-detectors-in-the-experiment approach from the start and I’ve started coding it. While the size impact shouldn’t be big indeed, I wonder if having, for example, 200,000 traces per event, most of them empty, will have an impact on performance.

That would indeed be very bad (waste of space, waste of (de)compression time). You really need to avoid using fixed size arrays :slight_smile: if that is what you meant. (In the example where a vector holds Trace objects that themselves contain vectors, you would not store any empty traces.)

thus I guess out of the question unless there is some way to implement this ~2D variability in a more optimal way.

I am (still) vaguely confused about which direction you are hoping to optimize. In general terms, the nested collection case is improved in RNTuple and, given the examples you gave so far, it might support enough of your cases. I recommend that you give it a try.

The optimisation is a big question for the prototype phase of the experiment. There may be all sorts of things to debug, so fast access to any “cross-section” through the data may prove important.

About RNTuple, I would gladly try it, but there are two issues:

  1. I understand it is in the experimental phase. We need something stable, that will be easily reusable in the future. I doubt I would be able to convince anyone to use an experimental ROOT feature, especially given the infamous history of the ROOT 5 to 6 migration.
  2. We are implementing our software in Python. After quick googling I could not find how to use RNTuple in PyROOT; I am not sure if it is already supported.

The multi-branch approach, where I generate a separate branch for each (possible) trace with a TObjArray, may be a no-go for a simple reason. I am trying with 200,000 branches. Adding them to the TObjArray was fast. However, I was not able to wait out the execution of TTree::Branch(array) on such a TObjArray… For 10,000 branches the execution took ~10 seconds, for 20,000 branches ~70 s, and for 40,000 branches ~370 seconds, so it seems non-linear in time. It is not specific to TObjArray: when I create all the branches separately, it takes the same amount of time.

So unless it is a bug or some mistake of mine, I think it can (sadly) be concluded that TTrees are not preferable for storing either a variable number of variable-length arrays (a kind of 2D variability) or a very large number of branches. In the former case they just do not offer any performance advantage over formats such as HDF5. In the latter case I did not try to compare with anything.

Still, big thanks for your help!

Well, you could design your own class with a manually written ROOT I/O “streamer”.
Something like a static 2D array with fixed maximal dimensions, but the “streamer” would only write / read parts of it (of course, additional variables in the class would need to keep the actual variable dimensions on an event-by-event basis).
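The "write only the used part" idea behind such a streamer can be illustrated with a plain-Python stand-in (this is not ROOT I/O, just the serialization concept): the class conceptually owns a fixed MAX_TRACES x MAX_SAMPLES array, but only the filled rows, together with their actual lengths, are written out and read back.

```python
import struct

def write_event(rows):
    """Serialize one event: rows is a list of lists of floats (only the
    filled traces). Stores the actual counts, not the fixed maxima."""
    buf = struct.pack("<i", len(rows))           # actual number of traces
    for row in rows:
        buf += struct.pack("<i", len(row))       # actual samples in this trace
        buf += struct.pack(f"<{len(row)}f", *row)
    return buf

def read_event(buf):
    """Inverse of write_event: rebuild only the stored rows."""
    (n,), off = struct.unpack_from("<i", buf), 4
    rows = []
    for _ in range(n):
        (m,) = struct.unpack_from("<i", buf, off); off += 4
        rows.append(list(struct.unpack_from(f"<{m}f", buf, off))); off += 4 * m
    return rows
```

The stored size then scales with the data that actually exists per event, not with the fixed maximal dimensions; whether a ROOT custom streamer would also let TTree::Draw read a sub-range without deserializing the whole object is a separate question.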

I have never done that or read about it. Before I dive into the docs, could you tell me what could be gained? Would it be easy to obtain 2D variability where not the whole 2D array needs to be read, for example?

In addition, I can report that on a tree with 10,000 branches, the first action, in this case:

t.Draw("a5.v5[0]:Entry$", "a5.track_length>0", "goff")

takes ~100 seconds. A second, very similar one:
t.Draw("a50.v1[0]:Entry$", "a50.track_length>0", "goff")
takes just ~0.2 s, so the first time is mostly some kind of initialisation. The above results are from PyROOT, but when I do the draw from the ROOT shell it is similar.

I am not sure if this is the expected behaviour, but it seems I am going through uncharted waters :slight_smile:

All ROOT C++ interfaces (and even yours :slight_smile: ) are automatically supported in PyROOT thanks to cppyy and Cling. What might be missing/not-yet-implemented are pythonizations of the interfaces.

That depends on the timeframe in which the experiment is going to start acquiring “precious” (that cannot be regenerated) data. If this is a couple of years away, then this *is* the perfect time to test RNTuple and help build it into a tool useful for your experiment, as there is still some time to influence the design and feature set. And as far as long-term backward compatibility goes, we are getting pretty close to the point where the file format will be set.

Well … indeed that is too large :slight_smile: … It also never makes sense to store an individual trace in a branch. What could make sense is to store them through some “useful” category (whether it is their detector or some other inherent property).