Let’s say I have two TTrees with the same name and the same branches. They are produced from different sources (e.g. two data streams), but events can overlap, meaning that the same event can appear in both. The total number of entries per tree is of course different.
There is a branch called eventID that uniquely identifies the events in both trees. The user can guarantee that the branches of the two trees are identical for the same eventID, but cannot guarantee that the entries are sorted by eventID.
Is there any existing method for merging the two TTrees using the eventID as a key? Or even just a sort of AddFriend mechanism with this handle? Preserving the info on the “origin” of the event would be a bonus, but I suspect the task is already challenging as it is.
Thanks for the post. This is an interesting and non-trivial problem, for which a standard and performant solution might not exist (the reason is deeply rooted in the nature of the operation performed, not really in ROOT’s columnar I/O).
Is eventID an integer value? In that case one could approach the problem by building an index on that column and looping over the event IDs:
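// Set up the two trees and the way to read them (e.g. SetBranchAddress)
// Build an index on eventID for each tree
tree1->BuildIndex("eventID");
tree2->BuildIndex("eventID");
// eventIDCollection: the set of event IDs to process, built beforehand
for (auto id : eventIDCollection) {
    // GetEntryWithIndex returns -1 if the tree has no entry with this ID
    const bool inTree1 = tree1->GetEntryWithIndex(id) > 0;
    const bool inTree2 = tree2->GetEntryWithIndex(id) > 0;
    // Custom operations for merging and writing the third TTree;
    // inTree1 and inTree2 record the origin of the event
}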
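The loop above assumes that eventIDCollection already exists. A minimal sketch of how it could be built, also keeping track of the origin of each event, is given here. It uses only the C++ standard library: the two vectors stand in for the eventID values read from each tree, and the names Origin and collectEventIDs are hypothetical, not part of ROOT.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Bit flags recording which input(s) contained a given eventID (hypothetical names).
enum Origin : unsigned { kFirst = 1u, kSecond = 2u, kBoth = kFirst | kSecond };

// Build the union of the event IDs of the two inputs, remembering the origin of each.
// ids1 and ids2 stand in for the eventID column of tree1 and tree2 (assumption).
std::map<std::int64_t, unsigned> collectEventIDs(const std::vector<std::int64_t>& ids1,
                                                 const std::vector<std::int64_t>& ids2)
{
    std::map<std::int64_t, unsigned> origin;
    for (auto id : ids1) {
        unsigned& flags = origin[id];   // inserted with value 0 if not yet present
        assert((flags & kFirst) == 0);  // eventID is assumed unique within one input
        flags |= kFirst;
    }
    for (auto id : ids2) {
        unsigned& flags = origin[id];
        assert((flags & kSecond) == 0);
        flags |= kSecond;
    }
    return origin;
}
```

Iterating over the returned map visits each event ID once, in increasing order, and the flag tells whether the event came from the first input, the second, or both.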
Alternatively, if everything fits in memory, you can resort to RDataFrame (see the post “Sort a RDataFrame”).
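To illustrate the in-memory idea independently of RDataFrame, here is a rough sketch of a sort-based matching in plain C++. It is not RDataFrame code: the vectors again stand in for the eventID columns, and sortedEntries and matchEntries are hypothetical helper names. It orders the entry numbers of each input by eventID and then walks the two orders in lockstep to pair up the shared events.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Return the entry numbers ordered by increasing eventID.
std::vector<std::size_t> sortedEntries(const std::vector<std::int64_t>& ids)
{
    std::vector<std::size_t> order(ids.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&ids](std::size_t a, std::size_t b) { return ids[a] < ids[b]; });
    // Debug check of the assumption that eventID is unique within one input.
    assert(std::adjacent_find(order.begin(), order.end(),
                              [&ids](std::size_t a, std::size_t b) { return ids[a] == ids[b]; })
           == order.end());
    return order;
}

// Pair up the entries of the two inputs that share the same eventID.
// Each pair is (entry number in input 1, entry number in input 2).
std::vector<std::pair<std::size_t, std::size_t>>
matchEntries(const std::vector<std::int64_t>& ids1, const std::vector<std::int64_t>& ids2)
{
    const auto order1 = sortedEntries(ids1);
    const auto order2 = sortedEntries(ids2);
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    std::size_t i = 0, j = 0;
    while (i < order1.size() && j < order2.size()) {
        const auto id1 = ids1[order1[i]];
        const auto id2 = ids2[order2[j]];
        if (id1 < id2) {
            ++i;                      // event only in input 1
        } else if (id2 < id1) {
            ++j;                      // event only in input 2
        } else {
            matches.emplace_back(order1[i], order2[j]);
            ++i;
            ++j;
        }
    }
    return matches;
}
```

The two branches that skip unmatched IDs are the places where events present in only one input could be handled, if they should also end up in the merged output.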