I have some initial raw Events stored in tree T1. I run reconstruction, which gives results that I would like to store in tree T2. However, only some events from T1 manage to get reconstructed. I would like to loop over the T2 (reconstructed) events while reading the corresponding branches from T1. I guess this is quite a standard analysis scenario, but I would like your advice on the best approach. Currently two options come to mind:
1. For each entry in T1, create an entry in T2, but fill it with 0 (or something similar) if reconstruction failed. Later, use T1 as a friend of T2.
2. Create a TTreeIndex for T1 on (run, event). Add T1 as a friend to T2; then T2.GetEntry(i) should read the matching entry in T1, so Entry$=x in T1 does not have to correspond to Entry$=x in T2.
I think method 2 looks more tempting, if I understood it correctly. Did I? Or maybe there is a better approach?
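To make option 2 concrete, here is a minimal sketch in plain C++ (no ROOT needed) of what an index keyed on (run, event) does conceptually: sort one row per T1 entry by its key, then binary-search that table for each T2 event. The names EventId, IndexRow, BuildIndex and FindEntry are hypothetical and only serve this illustration; they are not ROOT's API. In ROOT itself the corresponding steps would be TTree::BuildIndex("run", "event") on T1 and TTree::AddFriend on T2. That TTreeIndex works along similar lines internally is an assumption here, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the identifying branches of one T1 entry.
struct EventId {
    std::int64_t run;
    std::int64_t event;
};

// One row of the lookup table: the (run, event) key and the T1 entry number.
struct IndexRow {
    std::int64_t run;
    std::int64_t event;
    std::int64_t entry;
};

// Build the index once: one row per T1 entry, sorted by (run, event).
// Cost: O(N log N) time and one row of memory per T1 entry.
std::vector<IndexRow> BuildIndex(const std::vector<EventId>& t1) {
    std::vector<IndexRow> index;
    index.reserve(t1.size());
    for (std::size_t i = 0; i < t1.size(); ++i) {
        index.push_back({t1[i].run, t1[i].event, static_cast<std::int64_t>(i)});
    }
    std::sort(index.begin(), index.end(), [](const IndexRow& a, const IndexRow& b) {
        return std::make_pair(a.run, a.event) < std::make_pair(b.run, b.event);
    });
    return index;
}

// Look up the T1 entry number for a given (run, event) by binary search.
// Returns -1 if no such event exists. Cost: O(log N) per lookup.
std::int64_t FindEntry(const std::vector<IndexRow>& index,
                       std::int64_t run, std::int64_t event) {
    const std::pair<std::int64_t, std::int64_t> key(run, event);
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexRow& row, const std::pair<std::int64_t, std::int64_t>& k) {
            return std::make_pair(row.run, row.event) < k;
        });
    if (it == index.end() || it->run != run || it->event != event) {
        return -1;
    }
    return it->entry;
}
```

In this picture, looping over T2 means one FindEntry call per reconstructed event, so the extra cost is one sort up front plus a logarithmic lookup per event, and the T1 entries are then read in whatever order the T2 events dictate.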
I agree on the options! Which one to pick depends on the fraction of reconstructible events: if it’s close to 100%, then the overhead of dealing with the TTreeIndex (both for you and internally for TTree) might not be worth it. The cost of the first option also depends on your TTree format. Maybe you could share the output of tree->Print() for your friend tree, so we can see how easy it is to store “this is empty”, or how best to modify the tree layout?
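For comparison, option 1 can be pictured as follows, again in plain C++ rather than ROOT. This is only a sketch under assumptions: RawEvent, RecoEvent, the isReconstructed flag and both helper functions are hypothetical names, and the explicit flag is one possible way to store "this is empty" instead of a magic 0, which could be confused with a legitimate reconstructed value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical T1 payload.
struct RawEvent {
    double adcSum;
};

// Hypothetical T2 payload, padded so that T2 has exactly one row per T1 row.
struct RecoEvent {
    bool isReconstructed;  // false marks a placeholder row
    double energy;         // meaningful only if isReconstructed is true
};

// Entry numbers that carry a real reconstruction result. Because T2 is padded
// to the length of T1, each returned number addresses the same event in both.
std::vector<std::size_t> ReconstructedEntries(const std::vector<RecoEvent>& t2) {
    std::vector<std::size_t> entries;
    for (std::size_t i = 0; i < t2.size(); ++i) {
        if (t2[i].isReconstructed) {
            entries.push_back(i);
        }
    }
    return entries;
}

// Fraction of T2 rows that are placeholders, i.e. the padding overhead.
double PlaceholderFraction(const std::vector<RecoEvent>& t2) {
    if (t2.empty()) {
        return 0.0;
    }
    const std::size_t filled = ReconstructedEntries(t2).size();
    return static_cast<double>(t2.size() - filled) / static_cast<double>(t2.size());
}
```

With a small reconstructible fraction, most of the padded tree consists of placeholder rows that still have to be written and skipped; with a fraction near 100% the padding is negligible and no lookup is needed at all.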
I can’t share the TTree Print() output yet, because the TTrees are under construction. Everything is related to this post:
and we’ve (finally?) decided to try ROOT instead of HDF5. Good support on this forum is one of the reasons.
The initial idea with HDF5 was to add variables to Events as they passed the reconstruction stages. This is not optimal (close to impossible) with ROOT TTrees, but the “friends” approach is more appealing to me. Most likely the fraction of reconstructible events during the prototype phase of the experiment will be very small. However, it could change significantly in later phases, several years from now. Is there any estimate of the overhead of using TTreeIndex?
One part I never fully understood was whether you needed to access all the tracks at the same time. I assume so, but I know of experiments where that’s not needed.
IIUC the use of TTreeIndex will be part of some framework, which helps, as it means users don’t have to deal with it themselves (and cannot forget to, etc.).
@pcanal is the best bet for giving an estimate on the performance cost of TTreeIndex.
I wouldn’t worry too much about finding the optimal solution now, if you have a chance to rewrite your data once in a while, i.e. update the schema. That would allow you to switch to another option at a later stage, with the goal of reducing storage and CPU cost.
In the reconstruction phase, probably all tracks will be read at the same time. However, in the debugging phase it is likely that we will need just one track (one instrument) across events. Still, as discussed in the linked topic, this is impossible with a variable number of tracks in a TTree, which will always read all the tracks. I have made my peace with that.
At some point, yes. Initially it would just help with reading the same Event from the TTrees of all processing stages with GetEntry(), Scan() or Draw().
A chance to rewrite the data always exists, but it is not something we would like to do. It requires a lot of manpower and can cause trouble. However, we’ll see how important the I/O performance really is for us. It just hadn’t clicked for me that the TTreeIndex has some performance impact.