Hi,
is there an easy way to use the Filter functionality to select events with, for example, the same event number? I would like to use RDF to filter out duplicate events, but I can’t figure out how to do this in an easy way.
best wishes
Louie
Welcome to the ROOT forum!
I’m sure @eguiraud can give you some hints
Thanks! Yes, that would be great. I presumed that the table-like format of RDF would be able to handle a “sort unique” type of operation, but I did not see anything like that in the documentation, so I thought I’d check with the experts before trying something complicated. Hints appreciated!
Hi @LouieC ,
and welcome to the ROOT forum!
RDataFrame does not provide such an operation because the trivial implementation of a full sort+unique requires all data to be in memory, and we typically deal with larger-than-memory datasets.
Depending on your actual use case there are a number of ways you can go about this. For example, you can have a stateful (thread-safe) Filter function that returns true if it has never seen an event number and false otherwise.
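To make the idea concrete, here is a minimal sketch of such a stateful filter in plain C++. The class name FirstOccurrenceFilter and the column name "eventNumber" are hypothetical and not part of ROOT; the sketch also assumes that the set of distinct event numbers (not the full events) fits in memory.

```cpp
#include <memory>
#include <mutex>
#include <unordered_set>

// Hypothetical helper (not part of ROOT): returns true the first time a
// given key is seen and false on every later occurrence.
// T should match the type of the event-number column.
template <typename T>
class FirstOccurrenceFilter {
public:
    FirstOccurrenceFilter() : state_(std::make_shared<State>()) {}

    bool operator()(T eventNumber) const {
        // Serialize access so concurrent calls cannot corrupt the set.
        std::lock_guard<std::mutex> lock(state_->mutex);
        // insert() returns {iterator, bool}; the bool is true only if the
        // key was not already present.
        return state_->seen.insert(eventNumber).second;
    }

private:
    struct State {
        std::mutex mutex;
        std::unordered_set<T> seen;
    };
    // Held through a shared_ptr so the functor stays copyable and all
    // copies update the same set of seen keys.
    std::shared_ptr<State> state_;
};

// Assumed usage with an RDataFrame `df` that has a column "eventNumber"
// (both names are placeholders):
//   auto unique = df.Filter(FirstOccurrenceFilter<ULong64_t>{}, {"eventNumber"});
```

With implicit multi-threading enabled, which copy of a duplicated event passes the filter depends on processing order, and the single mutex can become a bottleneck if many threads hit it at once.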
Cheers,
Enrico
An example of a stateful thread-safe filter is now available on GitHub: “A thread-safe stateful Filter for RDataFrame”.