Issue with rdfentry_

Adam_Kobert · September 23, 2021, 5:56pm

To look at only 10% of my data, I applied the filter

Rdf = Rdf.Filter("rdfentry_ % 10 == 0")

When I compare the number of events before and after this filter, exactly 10% of the events are kept. However, when I make plots from this Rdf (with no further filters applied), different plots show different numbers of events, and none of them matches the event count after the rdfentry_ filter. I was under the impression that rdfentry_ is a column of the Rdf that holds the entry number. Am I misinterpreting this?

If rdfentry_ cannot be used this way, what is the best way to keep only 10% of the events in the chain?
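As a plain-Python illustration (not ROOT code), the sketch below contrasts two ways of keeping 10% of the entries, assuming entries are numbered 0, 1, ..., N-1 without gaps. `strided_subsample` mimics what a `rdfentry_ % 10 == 0` cut selects, and `first_block` mimics taking only the first tenth of the entries. Both function names are hypothetical.

```python
def strided_subsample(n_entries, stride=10):
    """Entry numbers passing an `entry % stride == 0` cut (every stride-th entry).

    Plain-Python stand-in for Filter("rdfentry_ % 10 == 0"), assuming
    contiguous entry numbers 0, 1, ..., n_entries - 1.
    """
    return [entry for entry in range(n_entries) if entry % stride == 0]


def first_block(n_entries, stride=10):
    """The first n_entries // stride entries, as one contiguous block."""
    return list(range(n_entries // stride))


n = 100
print(len(strided_subsample(n)), len(first_block(n)))  # 10 10
print(strided_subsample(n)[:3], first_block(n)[:3])    # [0, 10, 20] [0, 1, 2]
```

Both keep 10 of 100 entries, but the strided cut spreads the kept entries over the whole chain, while the contiguous block comes only from its beginning (for example, from the first file). Note also that the strided cut keeps ceil(N/10) entries, so it is exactly 10% only when N is a multiple of 10.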

ROOT Version: 6.18/04
Python 2.7.15

RENATO_QUAGLIANI · September 23, 2021, 5:58pm

I think you want to use df.Range(nEntriesConsider). I'm not sure why rdfentry_ doesn't work in this case, but Range should. Maybe the histograms show different counts because of under/overflow?
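The under/overflow idea can be illustrated with a toy fixed-binning histogram in plain Python. The class `ToyHist` is hypothetical and is not ROOT's TH1. Every fill increments the total entry count, but only values inside [low, high) land in a visible bin, so the sum of the visible bins depends on the axis range chosen for each plot. In ROOT, `TH1::GetEntries()` counts all fills, while `TH1::Integral()` by default sums only the in-range bins.

```python
class ToyHist:
    """Minimal 1D histogram with under/overflow counters (hypothetical, not ROOT's TH1)."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        self.bins = [0] * nbins  # visible (in-range) bins
        self.underflow = 0
        self.overflow = 0
        self.entries = 0  # incremented on every fill, in range or not

    def fill(self, x):
        self.entries += 1
        if x < self.low:
            self.underflow += 1
        elif x >= self.high:
            self.overflow += 1
        else:
            width = (self.high - self.low) / self.nbins
            # Clamp to guard against floating-point rounding just below `high`.
            index = min(int((x - self.low) / width), self.nbins - 1)
            self.bins[index] += 1

    def integral(self):
        """Sum of the visible bins only (under/overflow excluded)."""
        return sum(self.bins)


# The same six "events" plotted with two different axis ranges.
values = [-1.0, 0.5, 2.5, 4.0, 9.5, 12.0]
narrow = ToyHist(10, 0.0, 5.0)
wide = ToyHist(10, -5.0, 15.0)
for v in values:
    narrow.fill(v)
    wide.fill(v)

print(narrow.entries, narrow.integral())  # 6 3
print(wide.entries, wide.integral())      # 6 6
```

Both histograms received the same six fills, yet the visible bins sum to 3 in one plot and 6 in the other, which is one way plots from the same filtered dataset can appear to contain different numbers of events.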

etejedor · September 24, 2021, 8:22am

You might also want to consider upgrading your ROOT version, since RDataFrame has improved substantially since 6.18. For example, this works for me with the current master:

>>> import ROOT
>>> rdf = ROOT.RDataFrame(100)
>>> rdf.Filter("rdfentry_ % 10 == 0").Count().GetValue()
10

eguiraud · September 27, 2021, 3:47pm

No, you are not misinterpreting it; that's correct.
Something else is happening here, but we'd need a reproducer to figure out exactly what, unless a more recent ROOT version just works. You can try one in a Docker container, on lxplus or in a conda environment, for example (see "Installing ROOT" on the ROOT website).

Cheers,
Enrico
