Hi,
I am using ROOT 6.16.00 and Python 2.7.15.
I am trying to migrate a simple Python analysis to RDataFrame. However, I don’t grasp how I can manipulate the vector variables stored in the original tree. Say the original ROOT tree has the variables muon_phi and jet_phi, which are both vectors of doubles. If I wanted to calculate the smallest delta phi between muons and jets, I would write two nested loops over the elements of both vectors, but in RDataFrame it’s not clear how to do that.
I can imagine using Define to make a new custom column, with a function that computes it, but I still don’t see how to access the variables I would need inside that function.
Using Define to create a new custom column is the right way to go. The string you pass to Define is C++ code that ROOT compiles just in time, and inside it “muon_phi” and “jet_phi” are instances of RVec, which you can manipulate as you want; you can also loop over them.
With the RVec helper functions (Combinations, Take, DeltaPhi, ArgMin, ...) you could do something like the workflow below (running code!):
import ROOT
# Open a ROOT file via xrootd with muon and electron collections
df = ROOT.RDataFrame(
"Events",
"root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/SMHiggsToZZTo4L.root")
# Find combination of electron and muon with minimum delta phi
df = df.Filter("nMuon > 0 && nElectron > 0", "Select only valid events") \
       .Define("idx", "Combinations(Muon_phi, Electron_phi)") \
       .Define("dphi", "DeltaPhi(Take(Muon_phi, idx[0]), Take(Electron_phi, idx[1]))") \
       .Define("minIdx", "ArgMin(abs(dphi))") \
       .Define("dphiMin", "dphi[minIdx]")
# Have a look at the values
print(df.AsNumpy(["dphiMin"]))
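To make the index bookkeeping in the chain above explicit, here is a pure-Python sketch of what it computes for a single made-up event, using plain lists instead of RVecs. The helpers combinations, take, delta_phi and argmin are hypothetical stand-ins that only mirror the idea of the VecOps functions; they are not ROOT's implementation, and the pair ordering and the sign convention of delta_phi (second minus first) are assumptions.

```python
import math


def combinations(n1, n2):
    """All (i, j) index pairs as two parallel lists (like idx[0] and idx[1])."""
    first, second = [], []
    for i in range(n1):
        for j in range(n2):
            first.append(i)
            second.append(j)
    return first, second


def take(values, indices):
    """Pick values[k] for every k in indices."""
    return [values[k] for k in indices]


def delta_phi(phi1, phi2):
    """phi2 - phi1 wrapped into [-pi, pi] (sign convention assumed)."""
    r = math.fmod(phi2 - phi1, 2.0 * math.pi)
    if r < -math.pi:
        r += 2.0 * math.pi
    elif r > math.pi:
        r -= 2.0 * math.pi
    return r


def argmin(values):
    """Index of the smallest element."""
    return min(range(len(values)), key=lambda k: values[k])


# One hypothetical event: two muons and two electrons
muon_phi = [0.1, 3.0]
electron_phi = [-3.0, 1.5]

idx = combinations(len(muon_phi), len(electron_phi))
dphi = [delta_phi(m, e)
        for m, e in zip(take(muon_phi, idx[0]), take(electron_phi, idx[1]))]
min_idx = argmin([abs(d) for d in dphi])
dphi_min = dphi[min_idx]
print(dphi_min)  # roughly 0.2832 (= 2*pi - 6.0), from the (3.0, -3.0) pair
```

The wrapping matters here: a plain abs(m - e) would report 6.0 for the (3.0, -3.0) pair and therefore pick the (0.1, 1.5) pair with 1.4 instead.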
And as a follow-up: this way you can just as easily plug in your own “kernels” (C++ free functions), see below:
import ROOT
# Open a ROOT file via xrootd with muon and electron collections
df = ROOT.RDataFrame(
"Events",
"root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/SMHiggsToZZTo4L.root")
# Declare your custom C++ function (the "kernel") to the interpreter
ROOT.gInterpreter.Declare("""
double YourImplementation(const ROOT::RVec<double>& m, const ROOT::RVec<double>& e) {
    double min = 999;
    for (std::size_t i = 0; i < m.size(); i++) {
        for (std::size_t j = 0; j < e.size(); j++) {
            // Note that this is not a "correct" delta phi because of the boundary conditions!
            const auto something = std::abs(m[i] - e[j]);
            if (min > something) min = something;
        }
    }
    return min;
}
""")
# Run the function in the event loop of RDataFrame
df = df.Filter("nMuon > 0 && nElectron > 0", "Select only valid events") \
       .Define("dphiMin", "YourImplementation(Muon_phi, Electron_phi)")
# Have a look at the values
print(df.AsNumpy(["dphiMin"]))
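As the comment in the kernel says, abs(m[i] - e[j]) ignores the 2π periodicity of phi. A pure-Python reference of the same double loop with the difference wrapped into [0, π] could look like the sketch below, for example to cross-check a few events against the AsNumpy output. The function names are hypothetical and this is not ROOT code; the same wrapping would have to be applied to `something` in the C++ kernel.

```python
import math


def min_abs_delta_phi(muon_phi, electron_phi):
    """Smallest |delta phi| over all muon-electron pairs, wrapped into [0, pi].

    Hypothetical helper. Returns None if either collection is empty; the
    RDataFrame example filters such events out with nMuon > 0 && nElectron > 0.
    """
    best = None
    for m in muon_phi:
        for e in electron_phi:
            d = math.fmod(abs(m - e), 2.0 * math.pi)  # now in [0, 2*pi)
            if d > math.pi:
                d = 2.0 * math.pi - d  # take the short way around the circle
            if best is None or d < best:
                best = d
    return best


def min_abs_diff_naive(muon_phi, electron_phi):
    """Hypothetical helper: same double loop without wrapping, like the C++ kernel."""
    return min(abs(m - e) for m in muon_phi for e in electron_phi)


muon_phi = [0.1, 3.0]
electron_phi = [-3.0, 1.5]
print(min_abs_delta_phi(muon_phi, electron_phi))   # roughly 0.2832 (2*pi - 6.0)
print(min_abs_diff_naive(muon_phi, electron_phi))  # roughly 1.4
```

The two results differ because the (3.0, -3.0) pair is 6.0 apart numerically but only about 0.28 apart on the circle.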
Following the suggestion from @swunsch, another way of injecting C++ code into the Python script is to put your custom C++ functions in a header file and then declare it to the ROOT C++ interpreter by including it.
myheader.h
#ifndef MYHEADER_H
#define MYHEADER_H

#include <cmath>
#include <cstddef>
#include "ROOT/RVec.hxx"

inline double YourImplementation(const ROOT::RVec<double>& m, const ROOT::RVec<double>& e) {
    double min = 999;
    for (std::size_t i = 0; i < m.size(); i++) {
        for (std::size_t j = 0; j < e.size(); j++) {
            // Note that this is not a "correct" delta phi because of the boundary conditions!
            const auto something = std::abs(m[i] - e[j]);
            if (min > something) min = something;
        }
    }
    return min;
}

#endif
python_script.py
import ROOT
ROOT.gInterpreter.Declare('#include "myheader.h"')
# Follow Stefan's example from now on
# Open a ROOT file via xrootd with muon and electron collections
df = ROOT.RDataFrame(
"Events",
"root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/SMHiggsToZZTo4L.root")
# Run the function in the event loop of RDataFrame
df = df.Filter("nMuon > 0 && nElectron > 0", "Select only valid events") \
       .Define("dphiMin", "YourImplementation(Muon_phi, Electron_phi)")
# Have a look at the values
print(df.AsNumpy(["dphiMin"]))