Best way to write replicated member objects in a TFile

I have an event class which contains a pointer to a housekeeping member object describing the detector conditions for that event. The housekeeping information varies slowly, so large bunches of events point to the same housekeeping object. I want to put the event objects in a tree in a file, but I don’t want to waste disk space writing the same housekeeping object many times. Is there something in ROOT like a smart reference/pointer class tailored for this kind of need? Something that I can embed in my event class to point at the housekeeping object, and which avoids writing a copy of the pointed-to object each time I write an event.
Thanks

Hi Nicola,

the disk space should not be an issue as the compression automatically takes care of entities which are identical for many events.

Cheers,
Danilo

Thanks Danilo, however I experience the opposite. The housekeeping objects are very large, and writing one of them for each event produces a 108 MB file. If I instead make the event class member pointing to the housekeeping object transient, write the housekeeping objects to a separate tree, and add to the event class an integer member holding the index of the corresponding entry in the housekeeping tree, the file is only 3.4 MB. It is a bit involved: the pointers to the housekeeping objects must be set manually during readout, and it causes problems when chaining trees. But it saves a lot of disk space.
Maybe I’m doing something wrong which invalidates the automatic compression of identical objects. I will check with a simpler test code but up to now these are my findings.
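For readers following along, the index-based layout described above can be sketched in plain C++, without ROOT. This is only an illustration under stated assumptions: `Housekeeping`, `Event`, `relink` and `makeToyData` are hypothetical names, not taken from the attached script or from ROOT. The two vectors stand in for the two trees, and the raw pointer stands in for the transient member (which in a ROOT class would be marked with `//!`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the large, slowly varying housekeeping object.
struct Housekeeping {
    std::vector<double> conditions;
};

// Hypothetical event: only hkIndex would be written out; hk is rebuilt on read.
struct Event {
    double energy = 0.0;
    std::size_t hkIndex = 0;           // persistent: position in the housekeeping store
    const Housekeeping* hk = nullptr;  // transient: relinked after reading
};

// Relink every event to its housekeeping object once both stores are loaded.
// Returns false if any index is out of range; that event's pointer is left null.
inline bool relink(std::vector<Event>& events,
                   const std::vector<Housekeeping>& store) {
    bool ok = true;
    for (Event& ev : events) {
        if (ev.hkIndex < store.size()) {
            ev.hk = &store[ev.hkIndex];
        } else {
            ev.hk = nullptr;
            ok = false;
        }
    }
    return ok;
}

// Build a toy data set: nEvents events, with a new housekeeping object
// starting every runLength events (time-ordered data).
inline void makeToyData(std::size_t nEvents, std::size_t runLength,
                        std::vector<Event>& events,
                        std::vector<Housekeeping>& store) {
    assert(runLength > 0);
    events.clear();
    store.clear();
    for (std::size_t i = 0; i < nEvents; ++i) {
        if (i % runLength == 0) {
            // Fill the toy conditions with the object's own index, for easy checking.
            store.push_back(Housekeeping{
                std::vector<double>(100, static_cast<double>(store.size()))});
        }
        Event ev;
        ev.energy = static_cast<double>(i);
        ev.hkIndex = store.size() - 1;
        events.push_back(ev);
    }
}
```

Calling `makeToyData(10, 4, events, store)` followed by `relink(events, store)` leaves ten events sharing three housekeeping objects. The relinking step is the manual pointer setting during readout mentioned above, and it is also the part that needs care when several files are chained, since the indices are only meaningful relative to their own housekeeping store.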

Hi Nicola,

a simple reproducer would be welcome.
Are you altering the buffer sizes or split level?
What is the layout of the condition class? Could you share the header?

Cheers,
Danilo

I don’t change any of the default options when creating the TTree and its branches, so I think that the split level is 99. My classes are rather big and not very well written, so as a first step I’ll try to reproduce the issue with a simpler test code. I’ll report back as soon as I get some results.

I’ve written and attached a simple script reproducing the observed behavior. I hope it is clear enough to be self-explanatory. The size of the file obtained using the automatic compression is 4.1 MB, while with “manual compression” using a transient pointer it is 12 KB.
TestIdenticalCompr.cpp (1.73 KB)

Hi Nicola,

thanks for the code.
The best solution to this issue might really depend on your data.
As discussed previously, one option is to store the non-event data together with the event data. With the default I/O settings this was verified not to pay off. What one can still do is test whether a bigger basket size (TBranch::SetBasketSize, default 32k) reduces the size on disk. This will have an effect only if the events are time ordered, i.e. if the interval of validity does not change randomly event by event but stays the same for many subsequent events. Increasing the basket size lumps more identical calibration objects together in the same buffer, and the compression algorithm does the rest.
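The basket-size argument can be pictured with a toy count, sketched below in plain C++. It is an idealized model and not a description of ROOT's actual compression: it assumes each basket is compressed independently, that identical objects inside one basket cost nothing beyond the first copy, and that nothing is shared across baskets. Real compressors have finite windows, so actual savings will be smaller. The function name `copiesStored` and all parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Idealized number of housekeeping copies that must be stored at least once.
// Assumptions (not ROOT behaviour): events are time ordered, the housekeeping
// object changes every runLength events, each basket holds eventsPerBasket
// events, and duplicates are removed perfectly within a basket but never
// across baskets.
inline std::size_t copiesStored(std::size_t nEvents, std::size_t runLength,
                                std::size_t eventsPerBasket) {
    assert(runLength > 0 && eventsPerBasket > 0);
    std::size_t copies = 0;
    for (std::size_t first = 0; first < nEvents; first += eventsPerBasket) {
        // Index of the last event that falls into this basket.
        const std::size_t last = std::min(first + eventsPerBasket, nEvents) - 1;
        // Distinct housekeeping objects referenced by events [first, last].
        copies += last / runLength - first / runLength + 1;
    }
    return copies;
}
```

In this model, 1000 events with a housekeeping change every 100 events need 100 stored copies when a basket holds 10 events, but only 10 when a single basket holds all 1000. A separate housekeeping store with one entry per interval of validity reaches that lower bound regardless of basket size.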
If none of the above helps, I think that separating the event data from the calibration data is indeed a good way to proceed. The implementation you described in your previous post should already be quite efficient.

Cheers,
Danilo

Increasing the basket size for the branch containing the Event objects leads to some improvement. I’ve been able to shrink the file down to 1.5 MB with a basket size of about 10 MB (branch->SetBasketSize(10000000);), but that is still far from the 12 KB of the manually optimized file. I’ll stick with the manual optimization. Thanks, Danilo, for the support.