I’m posting to request additional actions for filtering and for filling histograms/graphs per sample. In an analysis we often have many different samples with small event counts. It can be more efficient to run over all of them in a single RDataFrame instead of using one RDataFrame per sample, even if those are run in parallel with RDF::RunGraphs.
Currently, to have a filter that depends on the sample (e.g. for selecting events based on some MC truth process information), we need to define an auxiliary column that contains the sample name or id, and then pass that column as one of the filter’s arguments. A FilterPerSample would therefore help. An even better option might be to expose the RSampleInfo object as a default RDataFrame column (in the same way as rdfentry_ and rdfslot_), maybe “rdfsample_”?
If we process all samples in the same event loop and RDataFrame, we will probably also need to create per-sample histograms and/or graphs. There are currently two ways to do this, both inelegant and somewhat inefficient:
- Only for Histo1D/2D/nD/Fill: we define custom weight columns with DefinePerSample that are set to 0 for all except one sample, and then define the histograms with these custom weights.
- We manually implement a per-sample filter (following the procedure described above), and then we declare Histo1D/2D/nD/Fill() or Graph() on the filtered RDF.
A FilterPerSample doesn’t seem hard to implement, and it would go a long way toward simplifying the filling of per-sample histograms and graphs.