Input parameters to ROOT memory location

vpascuzz · February 13, 2017, 12:45pm

Hi,

Goal: Retrieve object(s) from a ROOT file as fast as possible.

Idea: Given a set of parameters, {p_i}, I will grab the corresponding object from a ROOT file using a memory-map from the {p_i} to the object. Since ROOT files are binary files, it is possible to know the memory location and size of each mapped object. I can find the memory location and size of the objects in the ROOT CLI.

Is this idea suitable? Is it the best/fastest solution? If not, can you recommend a better alternative?

Thanks in advance,
Vince.

pcanal · February 13, 2017, 4:18pm

Hi Vince,

“Since ROOT files are binary files,”

Yes, it is a platform-independent binary format that is also usually compressed and contains metadata to allow for schema evolution and self-description. This means that to use the bytes out of the file, one needs to (optionally) uncompress the bytes, byte-swap them (when running on a little-endian machine, e.g. Linux on x86) and copy them to the right memory offset.
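As a rough illustration of those steps, here is a minimal Python sketch. It does not reproduce ROOT's actual on-disk layout: it assumes a made-up payload of three 32-bit floats, written big-endian and compressed with zlib, and the `read_floats` helper is hypothetical.

```python
import struct
import zlib

# Hypothetical payload: three 32-bit floats, packed big-endian, then compressed.
# This is NOT ROOT's real record format; it only mimics the steps described above.
values = (1.5, -2.0, 3.25)
stored = zlib.compress(struct.pack(">3f", *values))


def read_floats(blob, count, compressed=True):
    """Decompress (if needed) and decode a big-endian float payload."""
    raw = zlib.decompress(blob) if compressed else blob
    # ">" declares the bytes as big-endian, so struct byte-swaps them
    # on little-endian hosts and leaves them alone on big-endian ones.
    return struct.unpack(">%df" % count, raw)


decoded = read_floats(stored, len(values))
```

The point is only that the raw bytes in the file cannot be used in place: each read pays for a decompression pass (unless compression is off) and a byte-order conversion.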

“find the memory location and size of the objects in the ROOT CLI.”

Yes, once uncompressed there are means to find the right place in the byte stream.

“Goal: Retrieve object(s) from a ROOT file as fast as possible.”

What is your context? What are your performance requirements?

When trying to get the best possible speed out of ROOT I/O, you should consider not compressing the data, using a TTree and keeping your data model as simple as possible (avoiding complex containers, pointers, inheritance and object nesting).

Cheers,
Philippe.

vpascuzz · February 13, 2017, 11:33pm

Hi Philippe,

Thanks very much for the prompt reply.

The idea is to copy the entire contents (~50 MB) of a ROOT file, all objects derived from TObject, into memory upon loading our simulation. The directory structure inside the ROOT file is somewhat tedious, e.g.

Root1
---Particle
------Energy
---------Position
------------dir_1
------------…
------------dir_n
Root2
---Particle
------Shape
---------Position
------------dir_1
------------…
------------dir_n

Would it be advisable to load the objects into arrays and map the {p_i} onto the corresponding elements? I think based on the structure of the ROOT file, it could be painful to navigate.
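To make the question concrete, here is a minimal Python sketch of what I have in mind. Everything in it is a placeholder: the strings stand in for objects already read from the file, and the particle name, the parameter tuples and the `lookup` helper are hypothetical.

```python
# Hypothetical stand-ins for objects already read from the file into memory.
objects = ["hist_a", "hist_b", "hist_c"]

# Map each parameter tuple (particle, quantity, position bin) to an array index.
index_of = {
    ("electron", "Energy", 1): 0,
    ("electron", "Energy", 2): 1,
    ("electron", "Shape", 1): 2,
}


def lookup(particle, quantity, position):
    """Return the preloaded object for a parameter set, or None if absent."""
    i = index_of.get((particle, quantity, position))
    return None if i is None else objects[i]


found = lookup("electron", "Shape", 1)
```

The directory tree would then be walked only once, at load time, to fill `objects` and `index_of`.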

Best,
Vince.

pcanal · February 14, 2017, 8:56pm

Hi,

My apologies, I do not understand your sketch (what is Root[12], why are the ‘directories’ at the bottom, etc.). An alternative might be to provide a sample file.

One genuine question: how is the data organized before you write it to the ROOT file?

Cheers,
Philippe.

vpascuzz · February 15, 2017, 12:40pm

Hi Philippe,

“Root[1,2]” are the “root” directories (top-level)…I can understand the confusion. Let me try again. my_file.root contains:

top-level-dir1
---ParticleDir
------EnergyDir
---------PositionDir
------------dir_1
------------…
------------dir_n
top-level-dir2
---ParticleDir
------ShapeDir
---------PositionDir
------------dir_1
------------…
------------dir_n

Is that clearer?

pcanal · February 15, 2017, 3:25pm

Hi Vince,

Yes, now I understand, and it seems (very) likely to be an inefficient way to store the data. Unless you routinely (i.e. mostly) read just ‘one’ of the dir_n, it is likely that storing the data in a TTree (rather than a set of subdirectories) will be much better.
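As an analogy in plain Python (this is not ROOT code), the sketch below flattens a miniature of the layout above into one row per leaf, with the directory names as columns. The nested dict, the placeholder payload strings and the `flatten` helper are all hypothetical; each resulting row plays roughly the role that one TTree entry would.

```python
def flatten(tree, path=()):
    """Yield (path, payload) rows for every leaf of a nested dict."""
    for name, child in tree.items():
        if isinstance(child, dict):
            # Descend into a "directory", extending the path with its name.
            yield from flatten(child, path + (name,))
        else:
            # A leaf: emit one flat row carrying the full path as its key.
            yield path + (name,), child


# Hypothetical miniature of the layout above; payloads are placeholder strings.
nested = {
    "top-level-dir1": {
        "ParticleDir": {"EnergyDir": {"PositionDir": {"dir_1": "obj_a", "dir_2": "obj_b"}}}
    },
    "top-level-dir2": {
        "ParticleDir": {"ShapeDir": {"PositionDir": {"dir_1": "obj_c"}}}
    },
}

rows = list(flatten(nested))
```

With the data in rows, a sequential scan replaces the per-object directory navigation.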

To understand better, I would need to know how the data is organized in memory before you write it to the ROOT file.

Cheers,
Philippe.