Filling multiple histograms while iterating over an RDataFrame

Hi,

I have a Python script that loads an RDataFrame with data from a TTree. It has several variables that I need to plot in histograms, grouped according to one special candidate categorization variable (a string, actually).

I can of course iterate over the categories, use “Filter” to keep only the matching candidates and then Histo1D to return my histogram, but this hardly seems the most efficient way to do it.

I could create one histogram per category and use “Foreach” to iterate once over the ntuple and fill the correct histogram depending on the category, but as far as I understand this does not work in PyROOT.
Have I missed something? Is there another way to do this?

Many thanks !
Ben

If you want to make 1D plots and you have many histograms to fill, it might be a good idea to first convert your categories into an index (0, 1, 2, 3, …) and then fill a single TH2D whose y axis is the category index. Once the histogram is filled, you can just project out each category. This should also scale to 2D histograms cast to a 3D one, I guess.

That’s a nice trick I did in the past and was very efficient.

However, this only works if your preselection is common to all categories and the n categories are mutually exclusive.
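As a rough sketch of the index trick described above, something like the following could work in PyROOT. The helper names, the column names ("category", "mass") and the file and tree names are all hypothetical, and it assumes the category column is read as a std::string so that it can be compared with string literals in a jitted expression.

```python
def category_index_expr(column, categories):
    """Build a C++ expression mapping each category string to 0, 1, 2, ...

    Values not listed in `categories` map to -1, which ends up in the
    underflow bin of the category axis.
    """
    expr = "-1"
    # Build the nested ternary from the last category backwards.
    for index, name in reversed(list(enumerate(categories))):
        expr = f'{column} == "{name}" ? {index} : ({expr})'
    return expr


def book_per_category(df, variable, column, categories, nbins, xmin, xmax):
    """Book one lazy TH2D: x axis = variable, y axis = category index."""
    ncat = len(categories)
    # "cat_idx" is a hypothetical name for the new integer column.
    df_idx = df.Define("cat_idx", category_index_expr(column, categories))
    # One y bin per category, centred on the integer index.
    model = (f"h2_{variable}", f"{variable} per category",
             nbins, xmin, xmax, ncat, -0.5, ncat - 0.5)
    return df_idx.Histo2D(model, variable, "cat_idx")


def project_categories(h2, categories):
    """Slice a filled TH2D into one TH1D per category."""
    # Y bin i + 1 holds category index i (bin 0 is the underflow bin).
    return {
        name: h2.ProjectionX(f"{h2.GetName()}_{name}", i + 1, i + 1)
        for i, name in enumerate(categories)
    }


# Possible usage (requires ROOT; all names below are placeholders):
#   import ROOT
#   df = ROOT.RDataFrame("tree", "data.root")
#   cats = ["signal", "background"]
#   h2 = book_per_category(df, "mass", "category", cats, 50, 0.0, 10.0)
#   hists = project_categories(h2.GetValue(), cats)  # GetValue() runs the loop
```

Booking one such TH2D per variable before touching any result should keep everything in a single pass over the data.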

In any case, I don’t see a risk of inefficiency in creating n category Filter nodes and plotting variables out of each of them; the important thing is to ensure that you run the event loop only once.


I ended up converting the data and using a TH2D indeed.
I was not clear on how to make sure the event loop is run only once though.

Hi,
filling a single TH2D should run faster than using N Filters + N TH1Ds: in the first case, for every event, there is no selection evaluated, just a fill; in the second case, N filters are evaluated + 1 fill. Whether the performance difference matters more than the ergonomics of the program depends on your use case.

To verify that the event loop ran only once, you can check the output of df.GetNRuns().
In general, the rule is that an event loop is started whenever one of the results is accessed for the first time, and that loop produces/fills all the results requested up to that point.
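To illustrate the rule above with the N Filters + N TH1Ds variant, here is a minimal sketch. The helper name and the column, tree and file names are hypothetical, and it again assumes the category column can be compared with a string literal in a jitted Filter expression. The point is only the ordering: book every histogram first, access the results afterwards.

```python
def book_filtered_histograms(df, variable, column, categories, nbins, xmin, xmax):
    """Book one Filter + Histo1D per category without starting the event loop."""
    booked = {}
    for name in categories:
        # Each Filter is a separate node hanging off the same dataframe.
        selected = df.Filter(f'{column} == "{name}"', f"category {name}")
        model = (f"h_{variable}_{name}", f"{variable} ({name})", nbins, xmin, xmax)
        # Histo1D only books the result; nothing is filled yet.
        booked[name] = selected.Histo1D(model, variable)
    return booked


# Possible usage (requires ROOT; all names below are placeholders):
#   import ROOT
#   df = ROOT.RDataFrame("tree", "data.root")
#   booked = book_filtered_histograms(df, "mass", "category",
#                                     ["signal", "background"], 50, 0.0, 10.0)
#   # First access starts the loop, which fills everything booked so far.
#   hists = {name: result.GetValue() for name, result in booked.items()}
#   print(df.GetNRuns())  # should report a single run if nothing was accessed earlier
```

If a histogram were accessed inside the booking loop instead (for example by drawing it right away), each access would be expected to start its own event loop, and GetNRuns() would grow accordingly.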

Let us know if you need further clarifications.
Cheers,
Enrico
