Hello,
As I mentioned in the post "Array of strings as a leaf in a branch of TTree", I am trying to find out whether we can replace our HDF5 structure with ROOT and gain significant speed.
The main issue is that our events have a variable number of tracks. Each track consists of x, y and z arrays (plus 3 corresponding arrays for another quantity) whose length varies between tracks, although it is the same for all arrays within a single track.
I am benchmarking on a TTree with 1000 events. Initially, I created 175 tracks for each event, each track 999 entries long, and each track was a separate branch (which is possible only when the number of tracks is constant). Now I have moved to the real case, with a variable number of tracks and variable track lengths. I’ve created a class for the track:
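#include <vector>

// One track: six arrays that all have the same length within a track.
class track
{
public:
    std::vector<float> X;
    std::vector<float> Y;
    std::vector<float> Z;
    std::vector<float> aX;
    std::vector<float> aY;
    std::vector<float> aZ;
};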
and then a vector<track>, for which I create a branch in my TTree. I vary the number of tracks and the track length slightly for each event. Drawing some values from this tree is ~5 times slower than from the tree with a constant number of tracks and separate branches. The average track length, etc. may be slightly larger in the variable case, but not enough to explain a factor of 5. So why is it slower? I see three possibilities:
1. It can’t be avoided, because of the variability.
2. The dictionary for the class is not properly generated during the write-out (I do it in Python and there is perhaps some bug, see: On-fly dictionary generation for vector<myclass> and use as a branch - #6 by pcanal), and this could have an impact on performance.
3. A vector of my classes, each containing vectors, is not the optimal implementation.
In case 1 nothing can be done. It still seems to be ~5 times faster than HDF5, but the difference is not spectacular anymore.
In case 2, I would like some confirmation that this could really be the cause. Then I would need to know how to properly generate a dictionary for a vector of my classes in PyROOT…
In case 3, I would be grateful for advice. The TTrees would be read out in Python, preferably with uproot.
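To make case 3 more concrete, here is a minimal pure-Python sketch of one flatter layout that could be compared against the vector of classes: per event, one flat array per coordinate plus a list of track lengths. The function names (flatten_tracks, unflatten) and the sample numbers are hypothetical, only X, Y and Z are shown for brevity, and nothing here has been benchmarked or is claimed to be faster in ROOT or uproot.

```python
# Hypothetical illustration of a flattened per-event layout:
# one flat list per coordinate plus the length of each track.

def flatten_tracks(tracks):
    """tracks: list of dicts holding equal-length lists under 'X', 'Y', 'Z'."""
    lengths = [len(t["X"]) for t in tracks]
    flat = {
        key: [value for t in tracks for value in t[key]]
        for key in ("X", "Y", "Z")
    }
    return lengths, flat


def unflatten(lengths, values):
    """Rebuild the per-track lists from a flat list and the track lengths."""
    out, start = [], 0
    for n in lengths:
        out.append(values[start:start + n])
        start += n
    return out


# One made-up event with two tracks of different length.
event = [
    {"X": [0.0, 1.0, 2.0], "Y": [0.1, 1.1, 2.1], "Z": [0.2, 1.2, 2.2]},
    {"X": [5.0, 6.0], "Y": [5.1, 6.1], "Z": [5.2, 6.2]},
]

lengths, flat = flatten_tracks(event)
x_tracks = unflatten(lengths, flat["X"])
```

In a TTree, such a layout would presumably map to a handful of simple vector branches per event (one for the lengths and one per coordinate) instead of a single branch of a user-defined class, which would also remove the need for a custom dictionary. Whether that actually helps the read speed is exactly the open question.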
I am also considering a constant number of tracks, with most of them filled with zeros, but with a target of 200,000 tracks I am afraid there would be a big hit on size and performance. In any case, they would have to be stored in some sort of array, since browsing through 200,000 branches is rather inconvenient.
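A rough back-of-the-envelope estimate of the padded size, as a sketch only: it assumes 200,000 track slots per event, the benchmark track length of 999 entries (the real lengths may differ), 6 float arrays per track and 4 bytes per float, and it ignores compression, which would likely shrink the zero padding considerably on disk.

```python
# Assumed numbers: 200,000 track slots (the target mentioned above),
# 999 entries per track (the benchmark length, not the real one),
# 6 float arrays per track (X, Y, Z, aX, aY, aZ), 4 bytes per float.
N_TRACKS = 200_000
ENTRIES_PER_TRACK = 999
ARRAYS_PER_TRACK = 6
BYTES_PER_FLOAT = 4


def padded_event_bytes(n_tracks=N_TRACKS,
                       entries=ENTRIES_PER_TRACK,
                       arrays=ARRAYS_PER_TRACK,
                       bytes_per_value=BYTES_PER_FLOAT):
    """Uncompressed size of one fully padded event, in bytes."""
    return n_tracks * entries * arrays * bytes_per_value


size = padded_event_bytes()
print(f"{size / 1e9:.2f} GB per event before compression")
```

Under these assumptions a single padded event is close to 5 GB in memory, which is why the in-memory cost worries me even if the zeros compress well in the file.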
I would appreciate any comments and advice.
ROOT Version: 6.22.06
Platform: Fedora 33
Compiler: Not Provided