Arrays of RDataFrame filters

Hi -

I have a large dataframe that I would like to split up into small pieces (several thousand, based on an index in each event) by applying a Filter and then making a Snapshot to write out several useful branches to individual files.

I believe it’s most efficient to set up several thousand filters first and then loop over them and invoke “Snapshot” on each of them, but it wasn’t obvious how to automate this. I tried to set up a vector of the type returned by Filter(), std::vector< ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter, void> > v, but I wasn’t able to add the successive return values to it. Is there an obvious trick for this?

Peter

Hello @steinber ,

See this section of the user's guide and below.

It would be even better to run a single loop over data that performs all of the Snapshots you need.
For that, you should mark the Snapshots as lazy (see the RSnapshotOptions argument in the Snapshot docs).
Maybe the simplest thing to do at that point is to keep a vector of the results of the Snapshot calls, a std::vector<ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager>>>.

Cheers,
Enrico
