Accessing entry information using RDataFrame

Dear ROOT Experts,

I’m using RDataFrame to populate a TEntryList but I’m not sure how to access the Long64_t entry in addition to the branches present in the TTree.

I’ve defined my lambda as:

   auto fillEntryList = [&](Long64_t entry, UInt_t run, ULong64_t event){
        // "run" and "event" are branches within a TChain
        // executes TEntryList::Enter
   };

I then call Foreach like this:

   d.Foreach(fillEntryList,{"run","event"});

This understandably throws the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  3 column names are required but 2 were provided: "run", "event".

How should I modify my lambda (or the Foreach call) so that it can access the entry currently being processed (as in TTreeReader::GetCurrentEntry())?

Thank you!

   d.Foreach(fillEntryList,{"rdfentry_","run","event"});

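This is according to https://root.cern.ch/doc/master/df001__introduction_8C_source.html. Note that rdfentry_ has type ULong64_t, so the first parameter of the lambda should be declared as ULong64_t rather than Long64_t.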

So you found the solution?

Yep, that’s it. Note, however, that as per the docs, in multi-thread runs over multiple trees in a TChain, rdfentry_ will not always correspond to the global entry number in the chain. Filling the TEntryList this way is therefore only safe in single-thread runs (without EnableImplicitMT). I hope to remove this limitation in the future.
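
To make the term "global entry number" concrete, here is a minimal sketch in plain C++ (no ROOT involved). It only illustrates how a global entry number in a chain relates to the local entry number inside one of its trees; it is not how RDataFrame computes rdfentry_. The helper names treeOffsets and globalEntry are hypothetical, as are the per-tree entry counts used in the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a ROOT API): given the number of entries in each
// tree of a chain, in chain order, return the global entry number at which
// each tree starts.
std::vector<std::int64_t> treeOffsets(const std::vector<std::int64_t>& entriesPerTree) {
    std::vector<std::int64_t> offsets;
    offsets.reserve(entriesPerTree.size());
    std::int64_t running = 0;
    for (std::int64_t n : entriesPerTree) {
        offsets.push_back(running);  // this tree starts where the previous ones end
        running += n;
    }
    return offsets;
}

// Global entry number = offset of the tree + local entry number inside that tree.
// Throws std::out_of_range if treeIndex is not a valid index into offsets.
std::int64_t globalEntry(const std::vector<std::int64_t>& offsets,
                         std::size_t treeIndex, std::int64_t localEntry) {
    return offsets.at(treeIndex) + localEntry;
}
```

For example, with three trees of 100, 50 and 200 entries, the trees start at global entries 0, 100 and 150, so local entry 10 of the second tree is global entry 110. A number that is merely unique within the run, but not built this way, cannot be used to address entries in the chain.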

If I still want to use multi-threading, is there any way around it?

Hi @amvargash ,

The auto-generated rdfentry_ column has the limitation mentioned above (at least for now), but nothing stops you from having an actual column (maybe in a friend tree) that contains the desired event number. As a workaround, you can run, just once, a single-thread program that produces a column with the global chain event number and stores it in a tree:

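import ROOT

# original_chain is the existing TChain with the input dataset; run this single-threaded
ROOT.RDataFrame(original_chain.GetEntries())\
  .Alias("GlobalEventNumber", "rdfentry_")\
  .Snapshot("event_numbers", "event_numbers.root", ["GlobalEventNumber"])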

and for any further processing (including multi-thread processing) you can now add event_numbers as a friend of the main chain and you will have the column GlobalEventNumber with the right value for every event.

Cheers,
Enrico

@eguiraud You assume that exactly the same set of ROOT files will always be used, and they will be loaded / processed in exactly the same order. If the chain changes in any way, the stored GlobalEventNumber will be meaningless.

No, I am saying that as a workaround for this limitation (that we want to lift in the future) you can add this extra step to the analysis pipeline, which as you correctly point out will have to be re-executed whenever what is in the input TChain changes (“once” above is “once per input dataset”).