ROOT file vs HDF5

Francesc · October 30, 2015, 5:34pm

Hello:

After a lively discussion with some colleagues, I would like to ask a very simple question: what is the main difference between a ROOT file and an HDF5 file (compression, speed, performance, …)?

Best regards
Y.

Danilo · November 2, 2015, 9:25pm

Hi,

I am not sure that size on disk and runtime (reading or writing; the question does not say which, but the distinction is crucial when designing complex experiment or simulation setups) are enough to build an exhaustive metric. Other factors must be considered, for example the programming model or the freedom the user has in defining the data model.
To be factual, these are some of the elements I would suggest taking into account when considering ROOT as a data persistency framework (leaving aside all the other functionality the tool offers):
[ul]
[li] Intuitive *NIX file-system-like identification of written objects (“grouping” in HDF5 jargon).[/li]
[li] Possibility of row-wise storage of arbitrary data structures (or collections thereof), optionally compressed.[/li]
[li] Possibility of column-wise storage of arbitrary data structures (or collections thereof), optionally compressed across consecutive entries.[/li]
[li] Possibility of versioning written objects.[/li]
[li] Object (de)serialisation happens transparently and automatically (automated generation of “transformers” via streamlined and straightforward creation of dictionaries).[/li]
[li] Object (de)serialisation is decoupled from reading and writing.[/li]
[li] Wide range of optimisations present by default for reading remote files over the network and to reduce access to physical disks.[/li]
[li] Possibility to read consistent data from a file being written by a different process.[/li]
[li] Guarantee to recover broken, corrupted files starting from the header alone.[/li]
[li] Rich choice of different compression algorithms and settings thereof.[/li]
[li] Self-describing format: it is always possible to read back data even if the original data structures are not available.[/li]
[li] Morphing of any class instance into any other class instance, partially automated and easily described by concise directives.[/li]
[li] Optimised and highly performant binary files but also equivalent XML and JSON representations. In-memory only file representation also available for very specialised applications.[/li][/ul]
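
To make the row-wise versus column-wise point in the list a bit more concrete, here is a minimal sketch in plain Python (standard library only). It is not ROOT's or HDF5's actual on-disk format: the event layout (event id, detector id, energy) and all function names are hypothetical, chosen only for illustration. It packs the same entries once entry by entry and once field by field, compresses both with zlib, and then reads a single field back from the column-wise blob without touching the others.

```python
import struct
import zlib


def make_events(n):
    """Build n hypothetical events: (event id, detector id, energy)."""
    return [(i, i % 4, 100.0 + (i % 10) * 0.5) for i in range(n)]


def pack_row_wise(events):
    """Serialise entry after entry: all fields of one event are adjacent."""
    return b"".join(struct.pack("<iid", *ev) for ev in events)


def pack_column_wise(events):
    """Serialise field after field: all values of one field are adjacent."""
    ids, dets, energies = zip(*events)
    n = len(events)
    return (
        struct.pack(f"<{n}i", *ids)
        + struct.pack(f"<{n}i", *dets)
        + struct.pack(f"<{n}d", *energies)
    )


def read_energy_column(column_blob, n):
    """Read back only the energies, skipping the two 4-byte integer columns."""
    offset = 2 * n * 4
    return list(struct.unpack_from(f"<{n}d", column_blob, offset))


events = make_events(1000)
row_blob = pack_row_wise(events)
col_blob = pack_column_wise(events)

# Same raw payload, different ordering of the bytes.
row_compressed = zlib.compress(row_blob)
col_compressed = zlib.compress(col_blob)

print("raw bytes       :", len(row_blob), len(col_blob))
print("compressed bytes:", len(row_compressed), len(col_compressed))

# With the column-wise layout a single field can be read on its own.
energies = read_energy_column(col_blob, len(events))
print("first energies  :", energies[:3])
```

The compressed sizes printed depend entirely on the data, so no particular ratio should be read into this toy example. The point is only that the two layouts hold the same information, while the column-wise one groups similar values together and lets a reader fetch one field without decoding the rest.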

The aforementioned features are used in production by a large user community and by experiments of all sizes. Among the biggest are the LHC experiments, which alone produce tens of petabytes of data in ROOT format every year. Prestigious discoveries such as that of the Higgs boson start from ROOT files.

Cheers,
Danilo

Francesc · November 4, 2015, 3:19pm

Thank you very much for the detailed answer!

Cheers
Y.