I’m trying to perform some analysis cuts using RDataFrame on ROOT files generated by Delphes, but I seem to encounter a problem with reading particle data from arrays. As I understand it, in a Delphes file an event may contain more than one particle of the same type, such as electrons or muons. If I do
I get a histogram describing the distribution of the number of muons per event. For a particular event that has, say, 2 muons, the Muon.PT branch contains an array that stores the pT of the two muons. When I try to read the pT distribution of the muons with:
hist = df.Histo1D("Muon.PT")
hist.Draw()
I get a distribution for all muons in the file. But if I choose to look only at the leading or subleading muons in pT, as follows:
The reason is that Histo1D accepts a column name and not an expression, so you first need to Define a new column (which can use arbitrary expressions, including accessing other columns) and then you can do operations on the newly defined column.
Think about Define as “assigning a variable” that is then available for all further operations.
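To make the Define-then-Histo1D pattern concrete, here is a minimal sketch. It assumes the per-event muon count is stored in a branch called Muon_size and that the Muon.PT array is ordered by decreasing pT; the helper name book_nth_muon_pt and the column name muon_pt_N are made up for illustration. The Filter step is there so that events with too few muons are skipped instead of indexing past the end of the array.

```python
def book_nth_muon_pt(df, index=0, branch="Muon.PT", size_branch="Muon_size"):
    """Book a histogram of the pT of the index-th muon of each event.

    df is an existing ROOT.RDataFrame. The branch names are assumptions
    about the Delphes tree layout; adjust them to match your file.
    """
    # Hypothetical name for the new column (not part of the Delphes file).
    column = f"muon_pt_{index}"

    # Keep only events that have enough muons, so branch[index] is valid.
    selected = df.Filter(f"{size_branch} > {index}",
                         f"at least {index + 1} muon(s)")

    # Define accepts an expression: pick one element of the per-event array.
    defined = selected.Define(column, f"{branch}[{index}]")

    # Histo1D accepts a column name, so pass the newly defined column.
    return defined.Histo1D(column)


# Possible usage (tree name and file name are placeholders):
#   import ROOT
#   df = ROOT.RDataFrame("Delphes", "events.root")
#   lead = book_nth_muon_pt(df, 0)      # leading muon
#   sublead = book_nth_muon_pt(df, 1)   # subleading muon
#   lead.Draw()
```

The helper only chains Filter, Define and Histo1D on whatever dataframe it receives, so the same call works for the leading (index 0) and subleading (index 1) muon.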
Thanks for reaching out! I absolutely agree with @silverweed, that is the right way to go. Just to give further context, allowing Histo1D("Muon.PT[0]") would be equivalent to allowing Histo1D("run_my_very_complicated_function_that_may_return_a_value_not_acceptable_for_a_histogram"), so RDataFrame really prefers making this distinction between a column (either on-disk or defined) and an expression, the latter being usable only in the parts of the API that transform data (e.g. Define or Filter) but not in the parts of the API that declare a result (e.g. Histo1D, Mean, Sum etc.).
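As a rough illustration of that split, the sketch below uses expressions only in Filter and Define, and passes plain column names to the result-declaring calls (Count and Mean here). The Muon_size branch name, the lead_pt column name, the function name and the 20.0 threshold are all assumptions chosen for the example.

```python
def leading_muon_pt_summary(df, pt_cut=20.0):
    """Return (count, mean) result handles for leading muons above pt_cut.

    df is an existing ROOT.RDataFrame; pt_cut is a hypothetical threshold
    in the same units as Muon.PT.
    """
    # Transformations: these take expressions.
    with_lead = (df.Filter("Muon_size > 0")            # assumed size branch
                   .Define("lead_pt", "Muon.PT[0]"))   # hypothetical column
    hard = with_lead.Filter(f"lead_pt > {pt_cut}")

    # Results: these take column names (or nothing, for Count).
    return hard.Count(), hard.Mean("lead_pt")


# Possible usage (tree name and file name are placeholders):
#   import ROOT
#   df = ROOT.RDataFrame("Delphes", "events.root")
#   n, mean = leading_muon_pt_summary(df)
#   print(n.GetValue(), mean.GetValue())
```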