Double32_t and Float16_t

Hi,

I would like to store my vectors of floats in a TTree in lower precision to reduce file size. I’ve looked at Float16_t and Double32_t. However, I am not certain how mature these types are, because their descriptions differ, at least in the TTree documentation, between ROOT 6.30 and 6.36. In the former, Float16_t is claimed to be 24 bits, in the latter 21 bits. In the former, Double32_t is said to be 24 bits, in the latter 32 bits. Thus, I am not certain whether a vector holding these values would be readable by both ROOT 6.30 and 6.36.

In addition, is there any documentation that would help me understand the difference between Float16_t and Double32_t?

Hi @LeWhoo,

Thank you for your question. Let me add our expert @pcanal in the loop.

Cheers,
Dev

The actual behavior of Float16_t and Double32_t has not changed in many years. Recent updates to the documentation (if I recall correctly) were ‘just’ trying to improve the quality of the description. The main difference between Float16_t and Double32_t is simply that the former is a float in memory while the latter is a double in memory. The intent is for the documentation to be attached to TBufferFile::WriteFloat16 and TBufferFile::WriteDouble32.

The default for Double32_t is indeed to store it in 32 bits (8 bits for the exponent and 24 bits for the mantissa), and the default for Float16_t is 8+13.
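To make the "double in memory, 32 bits on disk" statement concrete, here is a minimal plain C++ sketch. It does not use ROOT at all: it assumes that the default Double32_t path behaves like a cast to an IEEE-754 single-precision float, and the helper names toDiskDefault and roundTripRelativeError are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed model of the default Double32_t path (not ROOT code):
// the value is a double in memory and a 32-bit float on disk.
float toDiskDefault(double inMemory) {
    return static_cast<float>(inMemory);
}

// Relative error introduced by one write/read round trip.
double roundTripRelativeError(double inMemory) {
    assert(inMemory != 0.0);  // relative error is undefined for zero
    double readBack = static_cast<double>(toDiskDefault(inMemory));
    return std::fabs(readBack - inMemory) / std::fabs(inMemory);
}
```

Under this assumption, a value such as 0.1 comes back with a small but non-zero relative error, while a value that is exactly representable as a float (for example 0.5) comes back unchanged.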

Thanks!

There is one open issue with these if you use customized ranges, see


Hi @pcanal, thanks for the clarification.
However, I think the documentation is still very unclear:

If one looks at WriteDouble32, it says:

         //In this case we truncate the mantissa to nbits and we stream
         //the exponent as a UChar_t and the mantissa as a UShort_t.

So, if I understand correctly, a Double32_t is written as three bytes (24 bits) into file.
This does not match with:

Could you clarify?

I do not find this clear, either. In WriteFloat16, it says also:


      //In this case we truncate the mantissa to nbits and we stream
      //the exponent as a UChar_t and the mantissa as a UShort_t.

So, it looks to me that both Float16 and Double32 are stored to disk as 24-bits.

Or were you speaking the whole time about storing it “in memory”?

It would be great if we could clarify these things, see my PR: [skip-ci][core] Discourage in the docu the use of old-style data types from RtypesCore by ferdymercury · Pull Request #19283 · root-project/root · GitHub

You are correct for the “subset” of cases where the user explicitly requested storage in even fewer bits than usual: the value is then stored as 24 bits, with the extra dropped bits still being saved by compression (since they are zeros). The default for Double32_t differs from this case and is indeed stored in 32 bits.
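The non-default case quoted from the source comments (exponent streamed as a UChar_t, truncated mantissa as a UShort_t) can be sketched in plain C++ as follows. This is only an illustration of the idea, not ROOT's implementation: the struct Packed24, the functions packTruncated and unpackTruncated, the position of the sign flag, and the use of plain truncation without rounding are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical holder for the two streamed fields (not a ROOT type).
struct Packed24 {
    std::uint8_t  exponent;  // 8-bit biased IEEE-754 exponent
    std::uint16_t mantissa;  // top nbits of the mantissa; sign flag at bit nbits+1
};

// Keep only the top `nbits` mantissa bits of a float (truncation, no rounding).
Packed24 packTruncated(float value, int nbits) {
    assert(nbits >= 1 && nbits <= 14);  // the sign flag must still fit in 16 bits
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    Packed24 p;
    p.exponent = static_cast<std::uint8_t>((bits >> 23) & 0xFFu);
    p.mantissa = static_cast<std::uint16_t>((bits & 0x7FFFFFu) >> (23 - nbits));
    if (bits >> 31) p.mantissa |= static_cast<std::uint16_t>(1u << (nbits + 1));
    return p;
}

// Rebuild a float; the dropped low mantissa bits come back as zeros.
float unpackTruncated(Packed24 p, int nbits) {
    assert(nbits >= 1 && nbits <= 14);
    std::uint32_t mant = p.mantissa & ((1u << nbits) - 1u);
    std::uint32_t bits = (static_cast<std::uint32_t>(p.exponent) << 23)
                       | (mant << (23 - nbits));
    if (p.mantissa & (1u << (nbits + 1))) bits |= 0x80000000u;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

The two fields together occupy 8 + 16 = 24 bits regardless of nbits, which is where the "three bytes" reading of the comments comes from; any bits of the UShort_t that are not needed stay zero and so compress well.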

I do not find this clear, either. In WriteFloat16, it says also:

This is one of the non-default subcases …
