Manipulating vector variables with RDataFrame

Hi,
I am using ROOT 6.16.00 and Python 2.7.15.
I am trying to migrate a simple Python analysis to RDataFrame, but I don’t grasp how to manipulate the vector variables found in the original tree. Say the original ROOT tree has variables muon_phi and jet_phi, which are both vectors of doubles. To calculate the smallest delta phi between muons and jets, I would write two nested loops over the elements of both vectors. In RDataFrame, however, it is not clear how to do that.

I can imagine using Define to make a new custom column, with a function to compute it, but I still don’t see how to access the variables I would need in that function.

Thanks for any help,
kj

Hi @kjohns,
this tutorial and this one might help.

Using a Define to create a new custom column is the right way to go. In Python, inside the string you pass to Define, “muon_phi” and “jet_phi” are instances of RVec that you can manipulate as you like, including looping over them:

smallest_phi = "auto n_muon = muon_phi.size(); auto n_jet = jet_phi.size(); for (...) { ... } return min_phi;"
df = df.Define("smallest_deltaphi", smallest_phi)

The one quirk is that, currently, the expression string must be valid C++, not Python.
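
To make the elided loop concrete, here is one way the expression string could be filled in. This is only a sketch under assumptions: the sentinel 999, the local names min_phi and dphi, and the use of std::remainder with TMath::TwoPi() to wrap the difference are illustrative choices, not something taken from the reply above, and it has not been checked against ROOT 6.16 specifically. The comments are kept on the Python side so that the C++ string contains only code.

```python
# Body of the jitted C++ expression; muon_phi and jet_phi are the column names
# from the question and are seen by the C++ code as RVec objects.
# std::remainder wraps each difference into [-pi, pi] before the absolute value.
# min_phi starts at the sentinel 999 and stays there if either vector is empty.
smallest_phi = """
double min_phi = 999.;
for (std::size_t i = 0; i < muon_phi.size(); ++i) {
    for (std::size_t j = 0; j < jet_phi.size(); ++j) {
        const double dphi = std::abs(std::remainder(muon_phi[i] - jet_phi[j], TMath::TwoPi()));
        if (dphi < min_phi) min_phi = dphi;
    }
}
return min_phi;
"""

# With ROOT available and df an existing RDataFrame, the string would be used as:
# df = df.Define("smallest_deltaphi", smallest_phi)
```

Because the string contains a return statement, it is meant to be used as a whole function body rather than as a single expression.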

Hope this helps,
Enrico

Hi!

For completeness: we have integrated most of the functionality that makes this nicer, but unfortunately it is only available from 6.18, not in 6.16. See the physics helpers here: https://root.cern.ch/doc/master/vo007__PhysicsHelpers_8C.html

You could do with these functions something similar to the workflow below (running code!):

import ROOT

# Open a ROOT file via xrootd with muon and electron collections
df = ROOT.RDataFrame(
        "Events",
        "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/SMHiggsToZZTo4L.root")

# Find combination of electron and muon with minimum delta phi
df = df.Filter("nMuon > 0 && nElectron > 0", "Select only valid events")\
       .Define("idx", "Combinations(Muon_phi, Electron_phi)")\
       .Define("dphi", "DeltaPhi(Take(Muon_phi, idx[0]), Take(Electron_phi, idx[1]))")\
       .Define("minIdx", "ArgMin(abs(dphi))")\
       .Define("dphiMin", "dphi[minIdx]")

# Have a look at the values
print(df.AsNumpy(["dphiMin"]))
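
To see what the chain of helpers does for a single event, the same steps can be mimicked in plain Python (3.7 or later, for math.remainder) on two short lists. This is a sketch, not the ROOT implementation: the sample values are made up, delta_phi is a hypothetical stand-in for DeltaPhi, and the pair ordering (first collection in the outer loop) is my understanding of what Combinations returns.

```python
import math
from itertools import product


def delta_phi(a, b):
    """Difference a - b wrapped into [-pi, pi]."""
    return math.remainder(a - b, 2.0 * math.pi)


# Made-up phi values for one event
muon_phi = [3.0, 0.5]
electron_phi = [1.0, -3.1, 2.0]

# Like Combinations: two index lists that together enumerate every (muon, electron) pair
pairs = list(product(range(len(muon_phi)), range(len(electron_phi))))
idx = ([i for i, _ in pairs], [j for _, j in pairs])

# Like Take + DeltaPhi: pick the paired elements and take the wrapped difference
dphi = [delta_phi(muon_phi[i], electron_phi[j]) for i, j in zip(idx[0], idx[1])]

# Like ArgMin(abs(dphi)): position of the smallest absolute difference
min_idx = min(range(len(dphi)), key=lambda k: abs(dphi[k]))
dphi_min = dphi[min_idx]

print(idx[0][min_idx], idx[1][min_idx], dphi_min)
```

Note that dphi_min keeps its sign, just as dphi[minIdx] does in the RDataFrame version; only the selection of the index uses the absolute value.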

Best
Stefan

Aaaaand as follow-up: This way you can easily put in your own “kernels” (C++ free functions), see below:

import ROOT

# Open a ROOT file via xrootd with muon and electron collections
df = ROOT.RDataFrame(
        "Events",
        "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/SMHiggsToZZTo4L.root")

# Make your custom thingy
ROOT.gInterpreter.Declare("""
double YourImplementation(const ROOT::RVec<double>& m, const ROOT::RVec<double>& e) {
    double min = 999;
    for(std::size_t i=0; i<m.size(); i++) {
        for(std::size_t j=0; j<e.size(); j++) {
            // Note that this is not a "correct" delta phi because of the boundary conditions!
            const auto something = std::abs(m[i] - e[j]);
            if (min > something) min = something;
        }
    }
    return min;
}
""")

# Run the function in the event loop of RDataFrame
df = df.Filter("nMuon > 0 && nElectron > 0", "Select only valid events")\
       .Define("dphiMin", "YourImplementation(Muon_phi, Electron_phi)")

# Have a look at the values
print(df.AsNumpy(["dphiMin"]))
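
The comment about boundary conditions in the kernel is worth a quick illustration. The pure-Python sketch below (Python 3.7 or later; the function names and the two sample angles are made up) compares the plain absolute difference with one that accounts for phi being periodic in 2π.

```python
import math


def naive_dphi(a, b):
    """Plain absolute difference, as in the kernel above."""
    return abs(a - b)


def wrapped_dphi(a, b):
    """Absolute difference after wrapping a - b into [-pi, pi]."""
    return abs(math.remainder(a - b, 2.0 * math.pi))


# Two directions that sit on either side of the +pi / -pi boundary
muon, electron = 3.1, -3.1

naive = naive_dphi(muon, electron)      # close to 6.2
wrapped = wrapped_dphi(muon, electron)  # close to 0.083
print(naive, wrapped)
```

The two directions are nearly identical in the detector, yet the plain difference reports them as almost a full turn apart. In a C++ kernel the same wrapping could be done with std::remainder.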

Following the suggestion from @swunsch, another way of injecting C++ code in the Python script would be to create a header with your custom C++ functions and then declare it to the ROOT C++ interpreter.

myheader.h

#ifndef myheader
#define myheader

#include <cmath>
#include <cstddef>
#include "ROOT/RVec.hxx"

// Smallest |m[i] - e[j]| over all pairs; returns 999 if either vector is empty.
inline double YourImplementation(const ROOT::RVec<double>& m, const ROOT::RVec<double>& e) {
    double min = 999;
    for(std::size_t i=0; i<m.size(); i++) {
        for(std::size_t j=0; j<e.size(); j++) {
            // Note that this is not a "correct" delta phi because of the boundary conditions!
            const auto something = std::abs(m[i] - e[j]);
            if (min > something) min = something;
        }
    }
    return min;
}

#endif

python_script.py

import ROOT

ROOT.gInterpreter.Declare('#include "myheader.h"')

# Follow Stefan's example from now on

# Open a ROOT file via xrootd with muon and electron collections
df = ROOT.RDataFrame(
        "Events",
        "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/SMHiggsToZZTo4L.root")

# Run the function in the event loop of RDataFrame
df = df.Filter("nMuon > 0 && nElectron > 0", "Select only valid events")\
       .Define("dphiMin", "YourImplementation(Muon_phi, Electron_phi)")

# Have a look at the values
print(df.AsNumpy(["dphiMin"]))

Nice! That reminds me of this tutorial:

There all of these things are nicely explained in notebooks.

Thank you very much for the prompt and helpful examples. I will give them a try.