I am trying to access the photon with maximum momentum from a ROOT RDataFrame. Can someone explain why this command does not accomplish this? I am receiving errors that the filter cannot be interpreted as a boolean.
df = df.Alias("Photon0", "Photon#0.index")
df = df.Define("photons_all", "FCCAnalyses::ReconstructedParticle::get(Photon0, ReconstructedParticles)")
df = df.Define("photons_p", "FCCAnalyses::ReconstructedParticle::get_p(photons_all)")
max_p = (df.Max("photons_p").GetValue())
# Filter the DataFrame to include only rows with the maximum photon momentum
df_max_photons = df.Filter("photons_p == max_p")
Welcome to the ROOT Forum!
Maybe I'm wrong, @vpadulan can correct me, but maybe you have to add:
df = df.Define("max_p", str(max_p))
Dear @somebody.nobody ,
First, a caveat. By calling GetValue and then performing new operations on the same dataframe, you trigger a computation graph run (i.e. an event loop), and at some point you will trigger another one, so you end up with more than one event loop over the same dataset. I believe in your case this is unavoidable, since you need to compute a quantity (the maximum momentum) over all the events in the dataset first.
Back to your question, @bellenot is right in the sense that your Filter expression "photons_p == max_p" supposes that the quantity max_p is reachable by RDataFrame somehow, but you have given RDataFrame no way to know about it. The variable max_p in your snippet is just a normal Python variable holding a value (in this case a float representing the maximum momentum over your events). So this quantity needs to be declared to RDataFrame somehow. @bellenot is showing you one way, which is defining a new column in the dataset with this quantity. Another way, which avoids repeating the same float for each event, is as follows:
max_p = (df.Max("photons_p").GetValue())
df_max_photons = df.Filter(f"photons_p == {max_p}")
Here you are inserting the constant float value held by max_p into the string expression, embedding it in the expression that will be JIT-compiled by cling.
Cheers,
Vincenzo
I think the program may be running into an issue on the line max_p = (df.Max("photons_p").GetValue()).
I get an extended error message, but the key problem seems to be here: /tmp/root/spack-stage/spack-stage-root-6.28.06-dgx5r6vwya5ynyeoef36d2aw6vk6v2jc/spack-build-dgx5r6v/include/ROOT/RDF/InterfaceUtils.hxx:313:4: error: static_assert failed due to requirement 'std::is_convertible<ROOT::VecOps::RVec, bool>::value' "filter expression returns a type that is not convertible to bool"
The error talks about a filter expression, so it's unlikely that the problem is in the Max call. Do you have any other Filter calls in your application? Double check that the expressions you write in a Filter return a boolean value; that is a requirement.
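The static_assert quoted above mentions ROOT::VecOps::RVec, which suggests that the offending Filter expression compares a per-event vector column with a scalar. Such a comparison gives an element-wise mask, not a single boolean. Below is a minimal sketch of one way to reduce the mask to a bool with ROOT::VecOps::Any, assuming photons_p is an RVec column. The helper names any_equal_expr and select_events_with_max are hypothetical and only serve the illustration.

```python
def any_equal_expr(column, value):
    """Build a C++ expression that is true if any element of `column` equals `value`."""
    # `column == value` on an RVec yields an element-wise mask;
    # ROOT::VecOps::Any reduces that mask to a single bool.
    return f"ROOT::VecOps::Any({column} == {value!r})"


def select_events_with_max(df, column="photons_p"):
    """Keep only events that contain the overall maximum of `column`.

    `df` is assumed to be an RDataFrame node where `column` is an RVec column.
    """
    # GetValue triggers a first event loop over the whole dataset.
    max_value = df.Max(column).GetValue()
    # The Filter is lazy; it is evaluated in a later event loop.
    return df.Filter(any_equal_expr(column, max_value))


print(any_equal_expr("photons_p", 45.5))
```

Note that this relies on exact floating-point equality between the embedded constant and the column values, which is worth checking for your data.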
Cheers,
Vincenzo
I fixed the filter problem, but I'm running into another difficulty with the selection. Is there a way to get the particle with max. momentum for each event, rather than out of every particle in the data frame?
Dear @somebody.nobody ,
I see from your previous snippet that you already have a column named photons_p which holds the momentum values for all the particles in the event. Supposing that the elements of this column are of a type akin to std::vector, you could do something like
df.Define(
"max_p_of_event",
"*(std::max_element(std::begin(photons_p), std::end(photons_p)))"
)
Note the extra dereference *, which is needed since max_element returns an iterator to the maximum element of the vector.
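One thing to watch: std::max_element returns the end iterator for an empty range, so dereferencing it is undefined for events without photons. Here is a sketch that skips such events first and also picks out the photon itself, assuming photons_p and photons_all are RVec columns of equal length. ROOT::VecOps::Max and ROOT::VecOps::ArgMax are ROOT helpers; the function name and the new column names are hypothetical.

```python
# C++ expressions, JIT-compiled by RDataFrame once per computation graph.
NON_EMPTY = "photons_p.size() > 0"
MAX_P = "ROOT::VecOps::Max(photons_p)"
LEADING = "photons_all[ROOT::VecOps::ArgMax(photons_p)]"


def define_leading_photon(df):
    """Add per-event columns for the highest-momentum photon.

    `df` is assumed to be an RDataFrame node that already has the
    `photons_all` and `photons_p` columns from the snippets above.
    """
    return (
        # Drop events with no photons so Max/ArgMax never see an empty vector.
        df.Filter(NON_EMPTY, "at least one photon")
        # Largest momentum in the event.
        .Define("max_p_of_event", MAX_P)
        # The photon that carries that momentum.
        .Define("leading_photon", LEADING)
    )
```

If events without photons must be kept, a ternary on photons_p.empty() with a sentinel value would be an alternative to the Filter.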
Cheers,
Vincenzo