Improvement of friend tree functionalities

Dear ROOT experts,

I am opening this topic to start a discussion and get your feedback since, as far as I can tell, what follows is not yet available in ROOT,
or could perhaps be improved.

First, let me explain the motivation behind this request.
Within ATLAS we are exploring the possibility of reading what we call “augmented files”.
Basically, instead of saving all the information for an event into a single tree, we save the basic information into the main tree, and then extra b-tagging information into a second tree, only for those events that satisfy the b-tagging selection.

So in this example, the augmented file would contain:

  • the events passing the standard selection, in the main tree
  • the b-tagging related information, in a side tree, only if the event passes the b-tagging requirements

This implies that you may have events passing both the standard selection and the b-tagging selection
but you can also have events only passing one of those selections.

The idea is that such files could be shared between several analyses:
some analyses are only interested in reading events passing the standard selection,
while others only want to read events passing both the standard and b-tagging requirements (together with the related b-tagging variables).
By producing augmented files, all of those analyses could use the same files.
This would also reduce the storage space needed, since we would avoid duplicating files and information.

Now for the technical part. Analyses that want to read only the events passing the standard+b-tagging requirements
need to find the events that are common to both trees. This requires us to:

  • Find the events that passed both selections
    → To do that, we build an index for the trees. Each event is assigned a unique index.
    Hence:

    • If a given index can be found in both the main tree and the side tree,
      the event passed both selections.
    • If a given index is missing from one of those trees, the event failed one of the selections,
      so we do not want to process it.
  • Run only over those common events, retrieving the corresponding information from each tree.
    For instance, the 2nd common event could correspond to entry=5 in the main tree and entry=20 in the side tree.
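
The matching step described above can be sketched in plain C++, independently of ROOT. This is only an illustration under assumptions: `EntryPair`, `matchCommonEvents`, and the use of a single 64-bit identifier per event are hypothetical choices made for this example, not the ATLAS implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper type: the entry numbers of one common event in each tree.
struct EntryPair {
    std::size_t mainEntry;
    std::size_t sideEntry;
};

// mainIds[i] and sideIds[j] hold the unique event identifier stored at
// entry i of the main tree and entry j of the side tree (assumed layout).
// Returns the common events, ordered by their entry in the main tree.
std::vector<EntryPair> matchCommonEvents(const std::vector<std::uint64_t>& mainIds,
                                         const std::vector<std::uint64_t>& sideIds) {
    // Map each identifier found in the side tree to its entry number.
    std::unordered_map<std::uint64_t, std::size_t> sideEntryOf;
    sideEntryOf.reserve(sideIds.size());
    for (std::size_t j = 0; j < sideIds.size(); ++j) {
        sideEntryOf.emplace(sideIds[j], j);
    }

    // Keep only the main-tree entries whose identifier also exists in the side tree.
    std::vector<EntryPair> common;
    for (std::size_t i = 0; i < mainIds.size(); ++i) {
        const auto it = sideEntryOf.find(mainIds[i]);
        if (it != sideEntryOf.end()) {
            common.push_back({i, it->second});
        }
    }
    return common;
}
```

The size of the returned vector is the number of common events, and element k gives the two entry numbers needed to read the k-th common event from each tree.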

To do so, we wanted to exploit the friend tree relationship by setting the side tree as a friend of the main tree.
But we have some technical constraints, and the ROOT functionality (as far as I can tell) does not seem optimal.

The “issue”, from a ROOT user's perspective, is that if you want to read those common events:

  • You first need to find the number of common events,
    as TTree::GetEntries() returns the total number of events in the tree.
  • More problematically, you also need to store (at the very least) the main-tree entries that correspond to the common events.

E.g. suppose the 2nd common event is entry=5 in the main tree and entry=20 in the side tree.
To read that 2nd event, you would need either to call
mainTree->GetEntry(5)
(see ROOT: TTreeIndex Class Reference)

Or, if you are reading information branch by branch (which is our case within the central code of ATLAS, for speed reasons), you would need to call

  • mainTreeBranch->GetEntry(5) for branches that belong to the main tree
  • sideTreeBranch->GetEntry(20) for branches that belong to the side tree

This does not really rely on the friend tree relationship, and it requires you to retrieve the entries for each tree and do the matching yourself.

We managed to implement the above, but ideally ROOT would provide this functionality itself.

I.e. in order to read the common events, one would just need to call something like
mainTree->onlyRunOverCommonEvents();
After that,
mainTree->GetEntries() would return only the number of common events,
and to retrieve the information of the 2nd common event one would just need to call

  • mainTreeBranch->GetEntry(2)
  • sideTreeBranch->GetEntry(2)

which would greatly simplify things.

Do you think such functionality could be developed?
Any feedback is very welcome.

Many thanks in advance,
Best regards,
Romain Bouquet

I guess @pcanal can give his input.

In the long term, a GitHub issue might be more appropriate for this kind
of request. But I guess, right now, the forum is good enough. Let’s see how it will evolve.

This can be written as:

// The next line will:
//    a. Load the proper file if mainTree or sideTree is a TChain.
//    b. Set the cursor ( `ReadEntry` ) for both the main tree and the side tree
//         (using the information from the TTreeIndex).
//    c. NOT bring into memory the data from any of the branches ( `TBranch::GetEntry` needs to be called next).
Long64_t localentry = mainTree->LoadTree(5);
// Then you can load the data with:
mainTreeBranch->GetEntry(localentry);
sideTreeBranch->GetEntry(sideTreeBranch->GetTree()->GetReadEntry());

The mechanism that we have for this is TEntryList. See ROOT: TEntryList Class Reference

Once you have created the TEntryList (or read it from the TFile), you can use it with:

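        chain->SetEntryList(myelist);

        for (Long64_t entry = start; entry < end; entry++) {
           // Position in the entry list -> entry number in the chain.
           Long64_t entryNumber = chain->GetEntryNumber(entry);
           if (entryNumber < 0) break;
           // Entry number in the chain -> entry number local to the current tree.
           Long64_t localEntry = chain->LoadTree(entryNumber);
           if (localEntry < 0) break;
           // etc ... (see previous comment)
        }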
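
To make the two lookups in the loop above more concrete, here is a small ROOT-free model of what they compute. It is only a sketch under simplifying assumptions: `ChainModel`, `LocalPosition`, `locate`, and `visitSelected` are hypothetical names for this illustration, not ROOT classes or methods, and a plain vector stands in for the entry list. The real `TChain::LoadTree` also loads the proper file and updates the friend trees, which this model ignores.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: which tree of the chain holds the entry, and where.
struct LocalPosition {
    long long treeNumber;  // -1 if the global entry is out of range
    long long localEntry;  // entry number inside that tree, -1 if out of range
};

// Hypothetical stand-in for a chain: entriesPerTree[t] is the number of entries in tree t.
struct ChainModel {
    std::vector<long long> entriesPerTree;

    // Translate a global entry number into (tree, local entry),
    // which is the role LoadTree plays in the loop above.
    LocalPosition locate(long long globalEntry) const {
        if (globalEntry < 0) return {-1, -1};
        long long first = 0;  // global number of the first entry of the current tree
        for (std::size_t t = 0; t < entriesPerTree.size(); ++t) {
            if (globalEntry < first + entriesPerTree[t]) {
                return {static_cast<long long>(t), globalEntry - first};
            }
            first += entriesPerTree[t];
        }
        return {-1, -1};
    }
};

// Visit only the selected entries: list position -> global entry -> local entry.
std::vector<LocalPosition> visitSelected(const ChainModel& chain,
                                         const std::vector<long long>& selectedGlobalEntries) {
    std::vector<LocalPosition> visited;
    for (long long globalEntry : selectedGlobalEntries) {
        const LocalPosition pos = chain.locate(globalEntry);
        if (pos.localEntry < 0) break;  // mirrors the break on a negative value
        visited.push_back(pos);
    }
    return visited;
}
```

In this model, the vector of selected global entries would hold the main-tree entries of the common events, so the loop touches only those events and skips everything else.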
