RDataFrame feature request: per sample Histo/Graph/Fill and Filter

Jean_Beaucamp · April 20, 2023, 4:01am

Hi all,

I’m posting to request additional actions for filtering and for filling histograms/graphs on a per-sample basis. In an analysis we often have many different samples with small event counts. It can be more efficient to run over all of them in a single RDataFrame instead of using one per sample, even if those are run in parallel with RDF::RunGraphs.

Currently, to have a filter that depends on the sample (e.g. to select events based on some MC truth process information), we need to define an auxiliary column that contains the sample name or id, and then pass that column as one of the filter’s arguments. A FilterPerSample action would therefore help. An even better option might be to expose the RSampleInfo object as a default RDataFrame column (in the same way as rdfentry_ and rdfslot_), maybe “rdfsample_”?

If we process all samples in the same event loop and RDataFrame, we will probably also need to create per-sample histograms and/or graphs. There are currently two inelegant and somewhat inefficient ways to do this:

1. Only for Histo1D/2D/nD/Fill: we define custom weight columns with DefinePerSample that are set to 0 for all except one sample, and then define the histograms with these custom weights.
2. We manually implement a per-sample filter (following the procedure described above), and then we declare Histo1D/2D/nD/Fill() or Graph() on the filtered RDF.

FilterPerSample doesn’t seem hard to implement, and it would go a long way toward simplifying the filling of per-sample histograms and graphs.

Thanks!

jalopezg · April 20, 2023, 12:33pm

Hi @Jean_Beaucamp,

Welcome back to the ROOT forum and thanks for your suggestion! Let’s add @vpadulan and @eguiraud to this topic so that they are aware.

Cheers,
J.

eguiraud · August 8, 2023, 6:32pm

Hi @Jean_Beaucamp,

Thank you for posting this and huge apologies for the lack of reply.
The feature request makes sense to me (although I would like to know what someone with more of a physics background like @mczurylo or @Axel thinks).

I do not know if or when this could be implemented: as you say, there are ways to get this done (albeit clunky ones), so other tasks might take priority. It might be good to convert this into a GitHub issue so it does not get lost.

Cheers and apologies again,
Enrico

mczurylo · August 9, 2023, 8:19am

Hi @Jean_Beaucamp and @eguiraud,

Again, sorry for the very late replies from our side. I agree that FilterPerSample would be a great simplification and would make things go more smoothly from the physics analysis perspective. I’ve added it to the list on GitHub: [DF] Add FilterPerSample feature · Issue #13422 · root-project/root · GitHub.

Cheers,
Marta
