Many branches vs. many trees

I need to let several independent data analysis objects write their event description objects to the same file. I see two ways of doing this:

  1. Create a single tree in the output file. Each data analysis object creates its own branch in this tree. In the event loop, each data analysis object updates its write buffers, and then TTree::Fill is called once per event by a centralized tree management service.

  2. Each data analysis object creates its own tree in the output file and manages it exclusively (it creates its branches and calls TTree::Fill once per event).

I am concerned about the performance of reading a file built in one of the above ways. According to this twiki page solution 2 is going to produce files with slow readout. But it’s still appealing to me because it allows for a looser coupling of data analysis objects. I think that a final answer like “you’ll increase read time by x%” does not exist and that the difference between the two solutions depends strongly on the specific content of the file. Nevertheless, I’d like to know whether there are any general considerations that would let me roughly estimate when the performance loss of solution 2 will be large.
Thanks.

Hi,

It depends on your main read pattern. If you are mostly going to read from only one of the two TTrees at a time, then the two-tree solution is more efficient. If you are mostly going to read both TTrees in sync, then the one-tree solution is more efficient. The difference is largely a second-order effect: how efficiently the data can be read from disk (whether the data is close together in the file or not) and the amount of extra metadata (two TTree objects instead of one).

Cheers,
Philippe.

Hi Philippe, thanks for the reply. I would use the two trees to separate data belonging to the same event, so I would almost always have to read from both trees when reading an event. So I’ll go with the single-tree, multiple-branch pattern.
Thanks again.