Getting photon with maximum momentum?

I am trying to access the photon with maximum momentum from a ROOT RDataFrame. Can someone explain why this command does not accomplish this? I am receiving errors that the filter cannot be interpreted as a boolean.

    df = df.Alias("Photon0", "Photon#0.index")
    df = df.Define("photons_all", "FCCAnalyses::ReconstructedParticle::get(Photon0, ReconstructedParticles)")
    df = df.Define("photons_p", "FCCAnalyses::ReconstructedParticle::get_p(photons_all)")
    max_p = (df.Max("photons_p").GetValue())

    # Filter the DataFrame to include only rows with the maximum photon momentum
    df_max_photons = df.Filter("photons_p == max_p")

Welcome to the ROOT Forum!
Maybe I'm wrong, @vpadulan can correct me, but maybe you have to add:

    # Define expects a string expression, so pass the value as a string
    df = df.Define("max_p", str(max_p))

Dear @somebody.nobody ,

First, a caveat. By calling GetValue and then performing new operations on the same data frame, you trigger a run of the computation graph (i.e. an event loop), and at some later point you will trigger another, so you end up with more than one event loop over the same dataset. I believe this is unavoidable in your case, since you first need to compute a quantity (the maximum momentum) over all the events in the dataset.

Back to your question, @bellenot is right in the sense that your Filter expression "photons_p == max_p" assumes that the quantity max_p is reachable by RDataFrame, but you have given RDataFrame no way to know about it. The variable max_p in your snippet is just a normal Python variable holding a value (in this case a float representing the maximum momentum over your events), so this quantity needs to be declared to RDataFrame somehow. @bellenot is showing you one way, which is defining a new column in the dataset with this quantity. Another way, which avoids repeating the same float for each event, is as follows:

    max_p = (df.Max("photons_p").GetValue())

    df_max_photons = df.Filter(f"photons_p == {max_p}")

Here you are inserting the constant float value held by max_p into the string expression, so it becomes part of the expression that will be JIT-compiled by cling.
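As a minimal sketch of what that interpolation produces (the value 0.1 + 0.2 below is only a made-up stand-in for whatever df.Max("photons_p").GetValue() returns): default f-string formatting of a Python float yields a literal that parses back to exactly the same double, whereas a fixed-precision format such as :.3f does not, so an == comparison against the truncated literal could silently never match.

```python
# Stand-in for max_p = df.Max("photons_p").GetValue(); the value is made up.
max_p = 0.1 + 0.2

# Default formatting keeps enough digits to round-trip the double exactly.
expr_exact = f"photons_p == {max_p}"

# A fixed-precision format truncates the literal; shown only as a pitfall.
expr_rounded = f"photons_p == {max_p:.3f}"

literal_exact = expr_exact.split("== ")[1]
literal_rounded = expr_rounded.split("== ")[1]

print(expr_exact)
print(float(literal_exact) == max_p)    # the literal round-trips
print(float(literal_rounded) == max_p)  # the truncated literal does not
```

Special values such as inf or nan would not format as valid C++ literals, so this approach assumes max_p is a finite number.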

Cheers,
Vincenzo

I think the program may be running into an issue on the line max_p = (df.Max("photons_p").GetValue()).
I get an extended error message, but the key problem seems to be here: /tmp/root/spack-stage/spack-stage-root-6.28.06-dgx5r6vwya5ynyeoef36d2aw6vk6v2jc/spack-build-dgx5r6v/include/ROOT/RDF/InterfaceUtils.hxx:313:4: error: static_assert failed due to requirement 'std::is_convertible<ROOT::VecOps::RVec, bool>::value' "filter expression returns a type that is not convertible to bool"

The error talks about a filter expression, so it's unlikely that the problem is in the Max call. Do you have any other Filter calls in your application? Double-check that every expression you write in a Filter returns a boolean value; that is a requirement.
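If photons_p is a per-event collection (the RVec in the error message suggests it is), then comparing it with a number yields one result per photon rather than a single boolean. A hedged sketch of one way to reduce that to a bool is ROOT::VecOps::Any, which is true when at least one element passes. The helper name any_equal_expr is hypothetical and only builds the expression string:

```python
def any_equal_expr(column: str, value: float) -> str:
    """Build a Filter expression that is true if any element of `column` equals `value`.

    Assumes `column` is an RVec column; `any_equal_expr` is an illustrative
    helper, not part of ROOT or FCCAnalyses.
    """
    # `column == value` on an RVec gives an element-wise mask;
    # ROOT::VecOps::Any collapses that mask to a single bool.
    return f"ROOT::VecOps::Any({column} == {value})"


# Intended use (needs ROOT, so it is left as a comment):
#   df_max_photons = df.Filter(any_equal_expr("photons_p", max_p))
print(any_equal_expr("photons_p", 45.25))
```

This keeps whole events that contain the photon with the global maximum momentum; it does not pick out the photon itself.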

Cheers,
Vincenzo

I fixed the filter problem, but I'm running into another difficulty with the selection. Is there a way to get the particle with max. momentum for each event, rather than out of every particle in the data frame?

Dear @somebody.nobody ,

I see from your previous snippet that you already have a column named photons_p which holds the momentum values for all the particles in the event. Supposing that the elements of this column are of a type akin to std::vector, you could do something like

    df.Define(
        "max_p_of_event",
        "*(std::max_element(std::begin(photons_p), std::end(photons_p)))"
    )

Note the extra dereference *, which is needed because std::max_element returns an iterator to the maximum element of the vector.
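One thing to watch for: if an event contains no photons, dereferencing the iterator returned by std::max_element is undefined behaviour. The sketch below guards against that case. The function name and the sentinel value -1.0 are assumptions made for illustration, and df is taken to be any RDataFrame node that already has photons_p as an RVec column:

```python
def define_max_p_per_event(df, sentinel: float = -1.0):
    """Add a `max_p_of_event` column with the largest photon momentum per event.

    `df` is assumed to be an RDataFrame node with an RVec column `photons_p`.
    Events without photons get `sentinel` (an arbitrary choice for this sketch).
    """
    # Check for an empty collection first so the maximum is never taken
    # over zero elements.
    expr = f"photons_p.empty() ? {sentinel} : ROOT::VecOps::Max(photons_p)"
    return df.Define("max_p_of_event", expr)
```

Events carrying the sentinel can then be dropped with a Filter such as "max_p_of_event >= 0", which is a plain boolean expression.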

Cheers,
Vincenzo