Is there any way to use TEntryList with RDataFrame?

I recently discovered TEntryList, and it looks like a useful way to record selections of events. Is there any way to use a TEntryList with RDataFrame? Perhaps as a Filter?




Hi,
yes, pass the TEntryList to a TChain constructor, and pass that TChain to the RDataFrame constructor.

Things are not that smooth in the other direction: unfortunately, at present there is no built-in way for an RDataFrame to build a TEntryList from the results of a Filter (but you can code the logic yourself and put it in a Foreach, of course).
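A minimal sketch of that "code the logic yourself" route from Python, offered as one possible approach rather than a built-in feature. It assumes a ROOT release recent enough to provide the `rdfentry_` column and `AsNumpy` (both newer than the version discussed in this thread), and a single-thread event loop, where `rdfentry_` is expected to match the entry number of the input dataset. The helper names `selected_entries` and `fill_entry_list` are hypothetical.

```python
def selected_entries(df, cut):
    """Return the entry numbers of the rows of `df` that pass `cut`.

    Assumption: `df` is an RDataFrame run single-threaded, so that the
    special column `rdfentry_` matches the input entry number.
    """
    columns = df.Filter(cut).AsNumpy(["rdfentry_"])
    return [int(entry) for entry in columns["rdfentry_"]]


def fill_entry_list(elist, tree, entries):
    """Enter each entry number into `elist` (a TEntryList) for `tree`.

    For a TChain, TEntryList::Enter treats the number as a global entry.
    """
    for entry in entries:
        elist.Enter(entry, tree)
    return elist


# Hypothetical usage with real ROOT objects (the cut is only an example):
#   import ROOT
#   df = ROOT.RDataFrame(ch)
#   el = ROOT.TEntryList("el", "selected entries")
#   fill_entry_list(el, ch, selected_entries(df, "X_M_ConAll > 0"))
```

With implicit multi-threading enabled, `rdfentry_` no longer corresponds to the chain's entry numbers, so this sketch should not be used in that mode.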

Hope this helps,
Enrico

Could you elaborate what you mean by this? When I did, e.g.,

ch.SetEntryList(el)
df = ROOT.RDataFrame(ch)
h = df.Histo1D('X_M_ConAll')

h.GetEntries() == ch.GetEntries() == df.Count().GetValue(), which is not the behavior one usually expects with SetEntryList. (ch.GetEntries() == 349, el.GetEntries() == 144.)

I should also point out that an initial call to df.Count() produces:

Warning in <TTreeReader::SetEntryBase()>: The TTree / TChain has an associated TEntryList. TTreeReader ignores TEntryLists unless you construct the TTreeReader passing a TEntryList.

Hi,
sorry, it looks like I was wrong. That’s exactly what I meant, and that warning is saying that the TEntryList is not propagated as I expected it to be.

So the feature is not there (yet).
I guess it’s not a bug since we don’t advertise usage of TEntryList with RDataFrame, but let me get back to you on this soon.

Ah okay. Thanks for the insight.

By the by, it’d be great to see the utility of TEntryList further expanded, as it seems a very useful feature from a memory management p.o.v. It’d be lovely, for instance, to be able to do:

ch.SetEntryList(elist)
for evt in ch:
    ...

where evt is only entries corresponding to elist, instead of the current, rather clunky:

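from array import array

ch.SetEntryList(elist)
treenum = array('i', [0])  # filled by GetEntryAndTree with the tree index
for ievt in range(elist.GetN()):
    # local entry number within tree number treenum[0]
    good = elist.GetEntryAndTree(ievt, treenum)
    # convert it to a global entry number in the chain
    good += ch.GetTreeOffset()[treenum[0]]
    ch.GetEntry(good)
    ...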

Probably not everyone would love this behavior; perhaps there could be a master switch, e.g., UseEntryList(), a la EnableImplicitMT().

Uhm, @pcanal correct me if I’m wrong, but should the first snippet work?

ch.SetEntryList(elist)
for evt in ch:
   ...

…or at least its C++ counterpart?

Sorta. In direct TTree usage you indeed need to use elist.GetEntryAndTree or TTree::GetEntryNumber.

With TTreeReader, when it is passed the TEntryList, then yes, the iteration is only over the entries in the list.

The Python interface/looper could indeed be enhanced to respect the entry list.
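Until the Python looper does that, the direct-TTree route can be wrapped in a small generator. The sketch below is only an illustration: the name `iter_entry_list` is hypothetical, and it relies on `GetEntryNumber` translating the i-th position in the attached entry list into an entry number of the chain, returning a negative value when out of range.

```python
def iter_entry_list(chain, elist):
    """Yield `chain` once per entry selected by `elist`.

    `chain` is expected to be a TChain/TTree and `elist` a TEntryList;
    after each yield the branches hold the selected entry.
    """
    chain.SetEntryList(elist)
    for i in range(elist.GetN()):
        # i-th position in the entry list -> entry number in the chain
        entry = chain.GetEntryNumber(i)
        if entry < 0:
            break
        chain.GetEntry(entry)
        yield chain


# Hypothetical usage:
#   for evt in iter_entry_list(ch, elist):
#       print(evt.X_M_ConAll)
```

This keeps the bookkeeping in one place, so the loop body reads like the plain `for evt in ch:` form.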

So it does! Thank you. Now if only RDataFrame took a TTreeReader in its constructor.

Hi,
I’m afraid it cannot: it needs the data as “raw” as possible in order to manipulate it, e.g. to do multi-thread reading.
Or rather, it could of course take a TTreeReader, but internally it would need to unpack the contents anyway.

I’m working on a patch for RDF+TChain with TEntryList.

Hi,
sooo this PR fixes RDF+TEntryList in the single-thread case.
As a new feature, it will be available in the next development release (v6.17). Support for multi-thread event loops and TEntryLists requires a bit more work, but it should also make it for v6.17 which is still quite a bit away.

In the meantime, if you have access to nightly builds of ROOT on lxplus or otherwise, you can try out the feature as soon as that PR is merged.

Cheers,
Enrico

Thank you very much for your swift action! So the master branch should now allow for

ttr = TTreeReader(...)
df = RDataFrame(ttr)

?

Hi,
I’m afraid not. As per my message above, the patch allows TChains/TTrees with TEntryLists to be treated correctly (only for single-thread runs, for now).

Looks like it works! Thanks again.
