Dear ROOT experts,
I am opening this topic to start a discussion and get your feedback since, as far as I can tell, what follows is not yet supported by ROOT,
or could perhaps be improved.
First, let me explain the motivation.
Within ATLAS we are exploring the possibility to read what we call “augmented files”.
Basically, instead of saving into a single tree all information for an event, we save the basic information into the main tree, and then extra b-tagging information into a second tree for only those events that satisfy b-tagging selections.
In this example, the augmented file would contain:
- the events passing the standard selection, in a main tree
- b-tagging related information, in a side tree, only if the event passes the b-tagging requirements
This implies that some events pass both the standard selection and the b-tagging selection,
while others pass only one of those selections.
The idea is that such files could be shared between several analyses:
some analyses are only interested in reading events passing the standard selection,
while others want to read only events passing both the standard and b-tagging requirements (together with the related b-tagging variables).
With augmented files, all of those analyses could use the same files.
This would also reduce the storage space needed as we would avoid file/information duplication.
Now, coming to the technical part: analyses that want to read only the events passing the standard+b-tagging requirements
need to find the events that are common to both trees. This requires us to:
- Find the events that passed both selections.
  → To do that, we build an index for the trees. Each event is assigned a unique index. Hence:
  - if a given index can be found in both the main tree and the side tree, the event passed both selections;
  - if a given index is missing from one of the trees, the event failed one of the selections, so we do not want to process it.
- Run only over those common events by retrieving the corresponding information,
  since, for instance, the 2nd common event could correspond to entry=5 in the main tree and entry=20 in the side tree.
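The matching step described above can be sketched with plain C++ containers. This is only an illustration, not ROOT code and not our ATLAS implementation: the two vectors stand in for the unique event index read from each tree, `EntryPair` and `matchCommonEvents` are hypothetical names, and entry numbers are zero-based.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Entry numbers (zero-based) locating one common event in each tree.
struct EntryPair {
    std::size_t mainEntry;
    std::size_t sideEntry;
};

// mainIds[i] is the unique event index stored at entry i of the main tree;
// sideIds[j] is the same quantity for the side tree. Both vectors are
// stand-ins for values that would be read from the real trees.
std::vector<EntryPair> matchCommonEvents(const std::vector<long long>& mainIds,
                                         const std::vector<long long>& sideIds) {
    // Lookup table: event index -> entry number in the side tree.
    std::unordered_map<long long, std::size_t> sideLookup;
    sideLookup.reserve(sideIds.size());
    for (std::size_t j = 0; j < sideIds.size(); ++j) {
        // Assumption: each event index appears at most once per tree.
        const bool inserted = sideLookup.emplace(sideIds[j], j).second;
        assert(inserted && "duplicate event index in side tree");
        (void)inserted;  // avoid an unused-variable warning when NDEBUG is set
    }

    // Walk the main tree in entry order and keep only events that are
    // also present in the side tree.
    std::vector<EntryPair> common;
    for (std::size_t i = 0; i < mainIds.size(); ++i) {
        const auto it = sideLookup.find(mainIds[i]);
        if (it != sideLookup.end()) {
            common.push_back({i, it->second});
        }
    }
    return common;
}
```

The size of the returned vector is the number of common events, and element k gives the entry to request from each tree for the k-th common event.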
For that we wanted to exploit the friend tree relationship by setting the side tree as a friend of the main tree.
However, we have some technical constraints, and the existing ROOT functionality (as far as I can tell) does not seem optimal.
The “issue”, from a ROOT user perspective, is that if you want to read those common events:
- you first need to find the number of common events, as TTree::GetEntries() returns the total number of events in the tree;
- more problematically, you also need to store (at the very least) the entries of the main tree that correspond to the common events.
E.g. if the 2nd common event is entry=5 in the main tree and entry=20 in the side tree,
you would need either to call
mainTree->GetEntry(5)
to read that 2nd event (see ROOT: TTreeIndex Class Reference).
Or, if you are reading information branch by branch (which is our case within the central code of ATLAS, for speed reasons), you would need to call
- mainTreeBranch->GetEntry(5) for branches that belong to the main tree
- sideTreeBranch->GetEntry(20) for branches that belong to the side tree
This does not really rely on the friend tree relationship, and it requires you to obtain the entries for each tree and do the matching yourself.
We managed to implement the approach above, but ideally ROOT would provide such functionality.
I.e. in order to read the common events, one would just need to call
mainTree->onlyRunOverCommonEvents();
after which
mainTree->GetEntries() would return only the number of common events,
and to retrieve the information of the 2nd common event one would just need to call
- mainTreeBranch->GetEntry(2)
- sideTreeBranch->GetEntry(2)
which would greatly simplify things.
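To make the request more concrete, here is a minimal sketch of the behaviour I have in mind, written with standard C++ only. `CommonEventView` is a hypothetical helper and not an existing ROOT class. The two input vectors stand in for the unique event index stored in each tree, and the common-event number k is zero-based here.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of ROOT): exposes only the events present
// in both trees and translates a common-event number into per-tree entries.
class CommonEventView {
public:
    CommonEventView(const std::vector<long long>& mainIds,
                    const std::vector<long long>& sideIds) {
        // Event index -> entry number in the side tree. If an index were
        // duplicated, emplace keeps the first occurrence.
        std::unordered_map<long long, std::size_t> sideLookup;
        for (std::size_t j = 0; j < sideIds.size(); ++j) {
            sideLookup.emplace(sideIds[j], j);
        }
        // Keep main-tree order; record both entry numbers per common event.
        for (std::size_t i = 0; i < mainIds.size(); ++i) {
            const auto it = sideLookup.find(mainIds[i]);
            if (it != sideLookup.end()) {
                mainEntries_.push_back(i);
                sideEntries_.push_back(it->second);
            }
        }
        assert(mainEntries_.size() == sideEntries_.size());
    }

    // Number of common events, rather than the total number of entries.
    std::size_t GetEntries() const { return mainEntries_.size(); }

    // Entry to request from each tree for the k-th common event.
    // at() throws std::out_of_range if k is not a valid common-event number.
    std::size_t MainEntry(std::size_t k) const { return mainEntries_.at(k); }
    std::size_t SideEntry(std::size_t k) const { return sideEntries_.at(k); }

private:
    std::vector<std::size_t> mainEntries_;
    std::vector<std::size_t> sideEntries_;
};
```

With such a view, branch-by-branch reading would become mainTreeBranch->GetEntry(view.MainEntry(k)) and sideTreeBranch->GetEntry(view.SideEntry(k)). The proposal is essentially for ROOT to hide this translation behind the friend tree relationship.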
Do you think such functionality could be developed?
Any feedback is also very welcome.
Many thanks in advance,
Best regards,
Romain Bouquet