The optimal way to store variable tracks count in a TTree

Hello,

As I have mentioned in this post: Array of strings as a leaf in a branch of TTree
I am trying to find out if we can replace our HDF5 structure with ROOT and gain significant speed.

The main issue is that in our events we have a variable number of tracks. Each track consists of x, y and z arrays (and corresponding 3 arrays for something else), whose length varies between tracks (although is the same inside a single track).

I am benchmarking on a TTree with 1000 events. Initially, I’ve created 175 tracks for each event, each track 999 long, and each track was a separate branch (which is possible only in the constant tracks count case). Now I moved to the real case with variabilities. I’ve created a class for the track:

class track
	{
	public:
	    vector<float> X;
	    vector<float> Y;
	    vector<float> Z;
	    vector<float> aX;
	    vector<float> aY;
	    vector<float> aZ;
	};

then vector<track>, for which I create a branch in my TTree. I vary the number of tracks and the track length slightly for each event. The results of Drawing some values from this tree is ~5 times slower than from the tree with a constant number of tracks and separate branches. Average track length, etc. may be slightly larger in the variable case, but not to explain the 5 times difference. So why is it slower?

  1. It can’t be avoided due to variability
  2. The dictionary for the class is not properly generated during the write-out (as I do it in Python and there is perhaps some bug, see: On-fly dictionary generation for vector<myclass> and use as a branch - #6 by pcanal) and it can have an impact on performance
  3. This vector of my classes containing vectors is not the most optimal implementation

In case 1 nothing can be done. It still seems to be ~5 times faster than HDF5, but the difference is not spectacular anymore.

In case 2, I would like some confirmation that this could be the real case. Then I would need to know how to properly generate a dictionary for a vector of my classes in PyROOT…

In case 3, I would be grateful for advice. The TTrees would be read out using python, actually preferably using uproot.

I am also considering a constant number of tracks with most of them 0, but with the target of 200,000 of them, I am afraid that there would be a big hit on size and performance. And, anyway, they would have to be in some sort of an array, for browsing through 200,000 branches is rather unconvenient.

I would appreciate any comments and advice.


ROOT Version: 6.22.06
Platform: Fedora 33
Compiler: Not Provided


I guess, the best performance you will get with something like this:

#define _MAX_TRACKS_ 1000 /* whatever you need */
class track {
public:
  track() : nTracks(0) {}; /* for the ROOT I/O */
  Int_t nTracks;
  Float_t X[(_MAX_TRACKS_)]; //[nTracks]
  Float_t Y[(_MAX_TRACKS_)]; //[nTracks]
  Float_t Z[(_MAX_TRACKS_)]; //[nTracks]
  Float_t aX[(_MAX_TRACKS_)]; //[nTracks]
  Float_t aY[(_MAX_TRACKS_)]; //[nTracks]
  Float_t aZ[(_MAX_TRACKS_)]; //[nTracks]
};

Update … Warning: The above source code does NOT really work as “desired”. See posts below.

And then I would store all the tracks for the event in a vector? I am afraid I do not fully understand the code above. I guess the length of X/Y/Z… is variable and, for each track, set to nTracks. How does track() : nTracks(0) {} interact with the ROOT I/O? Why do I need MAX_TRACKS, can’t it be just nTracks? And, does ROOT parse the comment and know that it should replace (MAX_TRACKS) with nTracks?

Thanks!

When you fill the tree, you need to set the “nTracks” for each entry.

Search for “variable length” in: ROOT User’s Guide
Search for “variable size” in: TTree

Sorry, it seems I overestimated ROOT’s cleverness.

It seems that ROOT will not recognize “//[nTracks]” in this case. Maybe @pcanal could give a hand.

It seems to me that if one wants to stay away from any “pointers” and “new operators”, one would need to manually create separate branches for each class data member, as in:
${ROOTSYS}/tutorials/tree/tree2.C
${ROOTSYS}/tutorials/tree/tree3.C

Otherwise, on the cost of performance, you could try to follow:
${ROOTSYS}/tutorials/tree/tree2a.C
${ROOTSYS}/tutorials/tree/tree4.C

In any case, (for performance reasons) stay away from any “std::vector<...>”, if you can.

In principle, I could use pointers. The only problem that I’ve seen was that uproot3 had trouble recognising a pointer to a vector, but I guess this is not a discussion for here.

However, ROOT examples deal with variability in 1D. In my case, I need sort-of 2D variability: variable number of tracks and a variable number of elements in each array of a track. Thus the approach with a separate member for each member of the track would anyway involve creating a class for each member and then the array/vector of these classes to get the “2D” variability. Am I correct?

In addition, in the examples, a simple class inherits from TObject. Is there any advantage of this inheritance in this case?

The ‘right’ syntax to use C style arrays is:

You can allocate the array to be large enough for your max size to minimize the number of reallocation.

What was the resulting TTree’s Print result (and what was it for the ‘fast’ case)? There is no good reason for the 2 cases you mentioned to be significantly different in performance unless the split level is different.

@pcanal Yes, I know that it works with “pointers” but then it also needs “new statements”. I originally hoped that “//[nTracks]” will be also automatically correctly treated in my example, but it seems that it will only work if one manually creates separate branches for each class data member.

Indeed. But you can get away with a single “new statement” per data member per process.

Well, I was trying to deal with a simple 1D case (which ROOT “natively” supports) … I guess the problem description is changed now … “I need sort-of 2D variability: variable number of tracks and a variable number of elements in each array of a track.”

The split level may be different indeed “by default” as I was not modifying it. In the faster version, all 6 arrays of each track (here called trace) had separate branches. So for 175 tracks I had 6*175 separate branches, all of them constant length. An extempt from the TTree::Print() for one track:


*............................................................................*
*Br 1025 :Traces_A95_SimEfield_X : Traces_A95_SimEfield_X[999]/F             *
*Entries :     1001 : Total  Size=    4008555 bytes  File Size  =    3084192 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.30     *
*............................................................................*
*Br 1026 :Traces_A95_SimEfield_Y : Traces_A95_SimEfield_Y[999]/F             *
*Entries :     1001 : Total  Size=    4008555 bytes  File Size  =    3043325 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.32     *
*............................................................................*
*Br 1027 :Traces_A95_SimEfield_Z : Traces_A95_SimEfield_Z[999]/F             *
*Entries :     1001 : Total  Size=    4008555 bytes  File Size  =    3075854 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.30     *
*............................................................................*
*Br 1028 :Traces_A95_SimSignal_X : Traces_A95_SimSignal_X[998]/F             *
*Entries :     1001 : Total  Size=    4004551 bytes  File Size  =    3807684 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.05     *
*............................................................................*
*Br 1029 :Traces_A95_SimSignal_Y : Traces_A95_SimSignal_Y[998]/F             *
*Entries :     1001 : Total  Size=    4004551 bytes  File Size  =    3796853 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.05     *
*............................................................................*
*Br 1030 :Traces_A95_SimSignal_Z : Traces_A95_SimSignal_Z[998]/F             *
*Entries :     1001 : Total  Size=    4004551 bytes  File Size  =    3782623 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.06     *
*............................................................................*

For the slower version, as you could see above, all 6 arrays were changed to variable-length vectors, aggregated into a single class, and a vector was created for this class. I’ve added also an extra array of ints. That’s how it looks after Print():

*............................................................................*
*Br   95 :traces    : Int_t traces_                                          *
*Entries :     1001 : Total  Size=      97909 bytes  File Size  =      20020 *
*Baskets :      143 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   96 :traces.SimEfield_X : vector<float> SimEfield_X[traces_]            *
*Entries :     1001 : Total  Size=  726643697 bytes  File Size  =  726631758 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *
*............................................................................*
*Br   97 :traces.SimEfield_Y : vector<float> SimEfield_Y[traces_]            *
*Entries :     1001 : Total  Size=  726643697 bytes  File Size  =  726631758 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *
*............................................................................*
*Br   98 :traces.SimEfield_Z : vector<float> SimEfield_Z[traces_]            *
*Entries :     1001 : Total  Size=  726643697 bytes  File Size  =  726631758 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *
*............................................................................*
*Br   99 :traces.SimSignal_X : vector<float> SimSignal_X[traces_]            *
*Entries :     1001 : Total  Size=  726642993 bytes  File Size  =  726631054 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *
*............................................................................*
*Br  100 :traces.SimSignal_Y : vector<float> SimSignal_Y[traces_]            *
*Entries :     1001 : Total  Size=  726642993 bytes  File Size  =  726631054 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *
*............................................................................*
*Br  101 :traces.SimSignal_Z : vector<float> SimSignal_Z[traces_]            *
*Entries :     1001 : Total  Size=  726642993 bytes  File Size  =  726631054 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *
*............................................................................*
*Br  102 :antenna_indices : antenna_indices[300]/I                           *
*Entries :     1001 : Total  Size=    1231480 bytes  File Size  =    1225425 *
*Baskets :      285 : Basket Size=       6656 bytes  Compression=   1.00     *
*............................................................................*

Of course, there are other branches in those trees, but they were the same in both implementations.

I see that the splittting is essentially the same BUT I also see:

*............................................................................*
*Br 1025 :Traces_A95_SimEfield_X : Traces_A95_SimEfield_X[999]/F             *
*Entries :     1001 : Total  Size=    4008555 bytes  File Size  =    3084192 *
*Baskets :       73 : Basket Size=      56320 bytes  Compression=   1.30     *

vs

*Br   96 :traces.SimEfield_X : vector<float> SimEfield_X[traces_]            *
*Entries :     1001 : Total  Size=  726643697 bytes  File Size  =  726631758 *
*Baskets :      575 : Basket Size=    1783296 bytes  Compression=   1.00     *

with 2 large “surprises”. One is that the compression ration is exactly 1.00 in the 2nd alternative and
the other is that the total size (in both case this is before compression) is 726,643,697 vs 4,008,555.
This means that the 2nd alternative stored 181 times more data!!! With those numbers it is actually surprising the difference is only a factor 5 :).

The most likely cause is that when using the vector they are being reused from entries to entries (which is a good thing) but never “cleared” (meaning that the last entry actually would be containing the data from all the previous entries too - alternatively if compression was not explicitly disabled then the 1.0 compression factor would indicates that the data is actually completely random).

The compression is my fault. The “fast” case were much earlier tests, where I was just following the default compression. For the “slower” file I decided to switch off the compression, mainly to be sure that I am not storing any “rubbish”. And it suggests I am not:

I’ve noticed the strange size differences while pasting. However, the “fast” file, which includes compression, is 3.4 GB, the “slow” one is 4.2 and roughly corresponds to 1001 events with ~175 tracks, each with 6 float arrays with ~1000 cells. So, I see what TTree::Print() says, but ls -l says something different :slight_smile:

Well, actually the size probably shows, that the split level is not the same (or is, depending, on definition). 726,643,697/4,008,555~=182. That seems to correspond to the number of tracks. In the “fast” case they are in separate branches, in the “slow” case they are in a single vector (for each of the 6 arrays) and probably all ~182 are read at once, and thus the slowdown. The question is, is there a way to separate them?

Fair enough. In the “fast” case, there seems to be many more branches (I see number in the 1000 in the fast case and 100 in the slow case) and I finally realized (that as you already said) in the second case you have a vector of those tracks.

To have a shape similar to the “fast” case and using vector you would indeed still need to manually distribute the collection of track over individual branches:

  Trace A0;  ///  Same Trace class as the slow case.
  Trace A1;
  tree->Branch("trace_A0", &A0); // &(vector_tracks[0]) might also works as long as the vector is not resized.
  tree->Branch("trace_A1", &A1);

This structure is useful if you most often retrieve/read a sub-set of traces.

If you read all the traces all the time, I don’t why the 2nd options would be any slower than the 1st. If anything it ought to be (slightly) faster.

So we are back at needing to compare how you read both case. Could you show the concrete code?

I can’t create branches for all the tracks, because the number of tracks varries from event to event. That’s why I am asking for kind of 2D variability.

This particular case which I was comparing is reading only a few values from a few tracks. So it is optimal not to read all the tracks. The “fast” case code:

t.Draw("Traces_A100_SimSignal_X[150]*Traces_A100_SimSignal_Y[150]*Traces_A100_SimSignal_Z[150]*Traces_A10_SimSignal_X[150]*Traces_A50_SimSignal_Z[150]", "", "goff")

The “slow” case code:
t.Draw("traces[100][150].SimSignal_X*traces[100][150].SimSignal_Y*traces[100][150].SimSignal_Z*traces[10][150].SimSignal_X*traces[50][150].SimSignal_Z", "", "goff")

ok. So indeed the 2nd case reads “182/2”(ish?) more data than the 1st case (but plot the same subset).

So is there a way to construct a TTree following this kind of 2D variability, that will not have to read whole arrays and thus be faster? I was thinking about TClonesArray, because according to the documentations it can split its members.

You are mis-reading the word “split” in this context. The data is already split and the data members are in their own branches: traces.SimEfield_X, traces.SimEfield_Y , etc. and a TClonesArray will be only as good.

What you are describing is to distribute the elements of the collections across many branches. This is supported for ROOT collections other than TClonesArray (for example TObjArray or TList) *but* this is unlikely to help you since the number branches (and thus elements) is *fixed* at branch creation time (I.e. "because the number of tracks varries from event to event. ").


So another essential question is how important is to the physics use case to be able to read fast a few hard-coded-by-index traces. The example you gave:

t.Draw("Traces_A100_SimSignal_X[150]*Traces_A100_SimSignal_Y[150]*Traces_A100_SimSignal_Z[150]*Traces_A10_SimSignal_X[150]*Traces_A50_SimSignal_Z[150]", "", "goff")

seems “academic”. What is special about trace number 100 and 50 (and signal 150), especially since in some entries those would not exist (unless the variation in number of trace is extremely small)?
[I have seen those kind of expression for trigger bits where the number is fixed of elements is fixed and their index semantically important]

Isn’t a more realistic use case:

t.Draw("traces[][].SimSignal_X*traces[][].SimSignal_Y*traces[][].SimSignal_Z*traces[][].SimSignal_X*traces[][].SimSignal_Z",  "0.1 < traces[][].SimSignal_X && traces[][].SimSignal_X < 0.2", "goff")

i.e. plot some information about all the traces that meet a given criteria (Note that doing this with the “fast” solution is actually ‘hard’, you have to spell out all the trace branch names).