RDataFrame histogram name based on branch value

riley-x · May 17, 2022, 12:13am

Using RDataFrame, what is the best way to create/fill different histograms depending on the value of a branch?

For example, if I have a branch Sample with string values like “W” and “Z”, I would like to fill separate histograms “{Sample}_pt”, i.e. “W_pt” or “Z_pt”.

Right now the only idea I can come up with is to figure out all the unique values of the Sample branch and then call Filter once for each unique value. But there doesn’t seem to be an RDataFrame::Unique function or similar, and this approach would require an extra event loop.

couet · May 17, 2022, 8:15am

I think you may find the answer to your question in the dataframes section of the manual. It points to many tutorials, which likely cover this topic. If you still have problems finding the answer, @eguiraud can surely help you.

eguiraud · May 17, 2022, 8:41am

Hi @riley-x,
RDF does not have direct support for “categorical histogram filling” such as that (although we are thinking about adding something that makes these use cases simpler).

One other possible solution is filling a 2D histogram where the second dimension is a numerical value that depends on the value of the Sample branch. You can use a Define to go from the Sample string to such a numerical value.
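A minimal sketch of the string-to-number step, to make the idea concrete. The helper name `SampleIndex`, the column names `pt` and `sample_idx`, and the binning are assumptions for illustration; the labels “W” and “Z” are the ones from the question. The helper itself is plain C++; the RDF calls are shown only as comments and assume the Sample branch is read as a `std::string`.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: maps a Sample label to an index for the second axis.
// Only the labels from the question are listed; extend the table as needed.
inline int SampleIndex(const std::string &sample)
{
   static const std::unordered_map<std::string, int> indices = {{"W", 0}, {"Z", 1}};
   const auto it = indices.find(sample);
   // Unknown labels get -1, which falls into the underflow bin of an axis starting at 0.
   return it == indices.end() ? -1 : it->second;
}

// Assumed usage with RDataFrame (not compiled here; "pt" is an assumed column name):
//   auto h2 = df.Define("sample_idx", SampleIndex, {"Sample"})
//               .Histo2D({"pt_vs_sample", "", 50, 0., 200., 2, 0., 2.}, "pt", "sample_idx");
//   auto w_pt = h2->ProjectionX("W_pt", 1, 1); // y-bin 1 holds index 0 ("W")
//   auto z_pt = h2->ProjectionX("Z_pt", 2, 2); // y-bin 2 holds index 1 ("Z")
```

The projections at the end are one way to get back to per-sample 1D histograms after the event loop; the list of labels still has to be known up front.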

Cheers,
Enrico

riley-x · May 18, 2022, 4:44am

Hi,

@eguiraud's idea of using an extra dimension with a numerical value is an interesting solution, though a post-processing step would still have to “unfold” the 2D histogram into the 1D histograms.

I ended up solving this by using Book in the vein of this tutorial but setting Result_t to a map of histograms.

However, in retrospect (for anyone else who may stumble on this), I think an easier way would be to use Fill with a wrapper class around a map of TH1s whose Fill method takes the “Sample” branch as an additional argument.
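A rough sketch of what such a wrapper could look like, not the code I actually ran. The names `HistPerSample` and `ToyHist` are made up for illustration; `ToyHist` is only a stand-in that mimics the TH1D constructor order and its Fill/Add calls so that the sketch compiles without ROOT. The `Merge` method is there because, as far as I understand, recent ROOT versions expect the object passed to Fill to be copyable and mergeable when running multi-threaded; treat that as an assumption and check the Fill documentation for your version.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for TH1D (illustration only) so the sketch builds without ROOT.
struct ToyHist {
   std::string name;
   double lo, hi;
   std::vector<double> counts;

   ToyHist(const char *n, const char * /*title*/, int nbins, double xlo, double xhi)
      : name(n), lo(xlo), hi(xhi), counts(nbins, 0.) {}

   void Fill(double x)
   {
      if (x < lo || x >= hi)
         return; // the toy simply drops out-of-range values
      auto bin = static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
      if (bin >= counts.size())
         bin = counts.size() - 1; // guard against floating-point rounding at the upper edge
      counts[bin] += 1.;
   }

   void Add(const ToyHist *other) // pointer argument, like TH1::Add
   {
      for (std::size_t i = 0; i < counts.size(); ++i)
         counts[i] += other->counts[i];
   }
};

// Hypothetical wrapper: one histogram per Sample value, created on first use.
template <typename Hist>
class HistPerSample {
   std::map<std::string, Hist> fHists;
   int fNBins;
   double fLo, fHi;

public:
   HistPerSample(int nbins, double lo, double hi) : fNBins(nbins), fLo(lo), fHi(hi) {}

   // The value comes first, then the Sample label, matching the column order given to Fill.
   void Fill(double pt, const std::string &sample)
   {
      auto it = fHists.find(sample);
      if (it == fHists.end()) {
         const std::string name = sample + "_pt";
         it = fHists.emplace(sample, Hist(name.c_str(), "", fNBins, fLo, fHi)).first;
      }
      it->second.Fill(pt);
   }

   // Adds the contents of other copies (e.g. one per thread) into this object.
   void Merge(const std::vector<HistPerSample *> &others)
   {
      for (const auto *other : others) {
         for (const auto &kv : other->fHists) {
            auto it = fHists.find(kv.first);
            if (it == fHists.end())
               fHists.emplace(kv.first, kv.second); // sample not seen here yet: copy it
            else
               it->second.Add(&kv.second);
         }
      }
   }

   const std::map<std::string, Hist> &Hists() const { return fHists; }
};
```

With ROOT this would presumably be used as something like `df.Fill<double, std::string>(HistPerSample<TH1D>(50, 0., 200.), {"pt", "Sample"})`, where the column name `pt`, its type, and the binning are assumptions. With real TH1Ds you would likely also want to detach each histogram from the current directory (`SetDirectory(nullptr)`) to avoid name clashes between the per-thread copies.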
