My experimental DAQ has 24 bits of precision and currently stores its values on the hard disk as signed 32-bit integers, in which the 8 least significant bits are used for error checking.
For calculations an 8-bit right shift, int a = (stored_value>>8), removes those error-checking bits.
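A minimal sketch of that unpacking, for reference. The helper names are made up for illustration; only the >>8 shift itself comes from the description above, and the layout (sample in the upper 24 bits, check bits in the lower 8) is assumed from it.

```cpp
#include <cstdint>

// Helper names are hypothetical; only the ">> 8" shift comes from the post.

// Upper 24 bits: the signed DAQ sample. Right-shifting a negative value is
// an arithmetic (sign-preserving) shift on mainstream compilers and is
// guaranteed to be one from C++20 on.
constexpr std::int32_t sample_from_stored(std::int32_t stored_value) {
    return stored_value >> 8;
}

// Lower 8 bits: the error-checking bits, masked out on the unsigned
// representation so that the sign bit does not interfere.
constexpr std::uint8_t check_bits_from_stored(std::int32_t stored_value) {
    return static_cast<std::uint8_t>(
        static_cast<std::uint32_t>(stored_value) & 0xFFu);
}
```

For example, a stored word of 0x00123456 would give a sample of 0x1234 and check bits of 0x56.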
For the first level of processed data it would take a quarter less space on disk (24 instead of 32 bits per value) if I could store 24-bit integers in my TTrees.
I do not think this is possible: the built-in integer types of C++ all have widths that are powers of 2, and I suppose this is true for ROOT as well, although I tend not to use the ROOT built-in types.
Why not split each value into an 8-bit integer holding the 8 least significant bits and a 16-bit integer holding the upper 16 bits? Granted, this still costs some overhead, since you need to store two branches instead of one, but it might still reduce the size of your trees on disk.
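A rough sketch of that split, assuming it is applied to the 24-bit sample after the error-checking bits have been shifted away. SplitSample, split24 and join24 are hypothetical names, not ROOT functions.

```cpp
#include <cstdint>

// Hypothetical container for the two pieces of one 24-bit sample.
struct SplitSample {
    std::int16_t high;  // upper 16 bits, carries the sign
    std::uint8_t low;   // lower 8 bits
};

// Split a sample in the range [-2^23, 2^23 - 1] into its two pieces.
constexpr SplitSample split24(std::int32_t sample) {
    return SplitSample{
        static_cast<std::int16_t>(sample >> 8),
        static_cast<std::uint8_t>(static_cast<std::uint32_t>(sample) & 0xFFu)};
}

// Rebuild the sample. Multiplying by 256 instead of shifting left avoids
// shifting a negative value, which is undefined before C++20.
constexpr std::int32_t join24(SplitSample s) {
    return static_cast<std::int32_t>(s.high) * 256 + s.low;
}
```

In a TTree the two members would map naturally onto a Short_t branch (leaf type "S") and a UChar_t branch (leaf type "b"), and join24(split24(x)) should give back x for any value in the 24-bit range.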
ROOT trees can be compressed (see TFile::SetCompressionSettings), so my advice would be not to worry about this at all. Dropping 8 of every 32 bits will not shrink a compressed tree by anything like the raw fraction, especially when those 8 bits are always set to zero.
Comparing the compression levels on a subset of my data:
level | file size (MB) | processing time
0 (no compression) | 417 | ~10 seconds
1 | 163 | ~20 seconds
2 | 163 | ~20 seconds
5 | 160 | noticeably but not significantly longer than level 2
9 (maximum compression) | 129 | 10-15 minutes
So it looks like the default compression level of 1 gets most of the benefit of going all the way to 9 (163 MB against 129 MB), is tolerably slower than level 0, and knocks about 60% off the file size.
The built-in compression is working well, and from what I have read, splitting things over more branches and leaves may decrease the achievable compression.
Of note, the start of the TTree->Print() output is:
******************************************************************************
*Tree :Asym-r38314: Asym-r38314 *
*Entries : 12498 : Total = 436924908 bytes File Size = 170386008 *
"Total = 436924908" bytes is the uncompressed data size.
"File Size = 170386008" is, as it says, the size on disk.
"Tree compression factor = 1.00" is a mystery to me, and it is what led me to think the TTree was not being compressed.
This was done in ROOT 5.34, so the compression factor may be more meaningful in later versions.
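As a quick cross-check, the two byte counts in the Print() header can be turned into a ratio by hand. This is only arithmetic on the quoted numbers, assuming "Total" and "File Size" mean what is described above; it is not a claim about how ROOT computes its own "Tree compression factor". The function and constant names are hypothetical.

```cpp
#include <cstdint>

// Uncompressed size divided by on-disk size.
constexpr double compression_ratio(std::uint64_t total_bytes,
                                   std::uint64_t file_bytes) {
    return static_cast<double>(total_bytes) / static_cast<double>(file_bytes);
}

// Fraction of the uncompressed size that compression removed.
constexpr double fraction_saved(std::uint64_t total_bytes,
                                std::uint64_t file_bytes) {
    return 1.0 - static_cast<double>(file_bytes) / static_cast<double>(total_bytes);
}

// Byte counts copied from the Print() header quoted above.
constexpr std::uint64_t kTotalBytes = 436924908ULL;
constexpr std::uint64_t kFileBytes = 170386008ULL;
```

With these numbers the ratio comes out at roughly 2.56, i.e. about 61% saved, which agrees with the file sizes in the table rather than with the reported factor of 1.00.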
Which compression algorithm did you test? Most of the time I am using level 3 with kLZMA, i.e. setting 203. For my datasets kLZMA compresses much better than kZLIB, so you might want to try it as well.
I have not had time to experiment with the compression setting, and have to move on due to time constraints.
From the ECompressionAlgorithm enum in Compression.h there look to be 4 options:
enum ECompressionAlgorithm { kUseGlobalSetting,
                             kZLIB,
                             kLZMA,
                             kOldCompressionAlgo,
                             // if adding new algorithm types,
                             // keep this enum value last
                             kUndefinedCompressionAlgorithm
};
With the encoding file->SetCompressionSettings(100*algo+level), the call file->SetCompressionSettings(203) selects option number 2, kLZMA, from the enum list and sets the compression level to 3, if I am reading it all correctly.
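That reading can be checked with a small standalone sketch. The enum is a local copy of the one quoted above, placed in a made-up namespace so that it compiles without ROOT, and the three helper functions are hypothetical, not part of ROOT.

```cpp
// Local mirror of the enum from Compression.h, so no ROOT headers are needed.
namespace sketch {

enum ECompressionAlgorithm {
    kUseGlobalSetting,   // 0
    kZLIB,               // 1
    kLZMA,               // 2
    kOldCompressionAlgo, // 3
    kUndefinedCompressionAlgorithm
};

// Build the combined value 100*algo + level.
constexpr int make_compression_setting(ECompressionAlgorithm algo, int level) {
    return 100 * static_cast<int>(algo) + level;
}

// Recover the two parts from a combined setting.
constexpr int algorithm_of(int setting) { return setting / 100; }
constexpr int level_of(int setting) { return setting % 100; }

}  // namespace sketch
```

Here make_compression_setting(kLZMA, 3) gives 203, and decoding 203 gives algorithm 2 and level 3, which matches the value passed to file->SetCompressionSettings(203).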