Dear experts,
I’m reviving a thread [1] from about a year ago about accessing RDataFrame column values inside a function without passing the branch as an argument. My goals are roughly the same, but I think I’ve learned enough since then to state my question more clearly.
To summarize the objective of the original post: I work with the NanoAOD format on CMS, and I’m building a more generic tool for handling NanoAOD with RDataFrame that tracks the actions taken on the RDF without being a full proxy/wrapper around RDataFrame.
I also plan on providing common C++ functions that handle standard algorithms for analyzers (scale factor look-ups, generator particle matching, etc.). The idea is to have a standard library of scripts so analyzers aren’t all reproducing the same coding work. This means that many branch names are always predictable, and many functions will (almost) always take the same input NanoAOD branches.
Taking the example of matching generator particles to a jet, this requires a function with more than 10 arguments if I have to pass in all of the needed branch names. This puts the onus on the end user to write out the full C++ function call with all 10+ arguments (in order) for their Define or Filter call in Python. Since the point of Python (and the tool I’m building) is to make life simpler, this is undesirable.
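To make the burden concrete, here is a minimal sketch of the kind of workaround I would rather not depend on: a small Python helper that assembles the Define expression from a list of default branch names. The function name GenMatchJet, the helper build_call, the column FatJet_pt_nom, and the particular branch list are all hypothetical and only meant as an illustration.

```python
# Hypothetical default inputs for a generator-matching function; the exact
# branch list is an assumption for illustration only.
DEFAULT_GENMATCH_BRANCHES = [
    "FatJet_pt", "FatJet_eta", "FatJet_phi", "FatJet_mass",
    "nGenPart", "GenPart_pt", "GenPart_eta", "GenPart_phi",
    "GenPart_mass", "GenPart_pdgId", "GenPart_genPartIdxMother",
]


def build_call(func_name, branches, overrides=None):
    """Return a C++ call expression such as 'f(a, b, c)' as a string.

    `overrides` maps a default branch name to a replacement column name,
    so a user can swap one input without retyping the rest.
    """
    overrides = overrides or {}
    args = [overrides.get(name, name) for name in branches]
    return "%s(%s)" % (func_name, ", ".join(args))


# The resulting string could then be passed to Define, e.g.
# rdf.Define("matched", expr), assuming a (hypothetical) GenMatchJet has
# been declared to cling with a matching argument order.
expr = build_call("GenMatchJet", DEFAULT_GENMATCH_BRANCHES,
                  overrides={"FatJet_pt": "FatJet_pt_nom"})
print(expr)
```

This keeps the user-facing call short, but every column is still passed explicitly and in a fixed order, which is why I would prefer a way for the C++ function to read the columns itself.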
What I would like to do is something like the following (simple example):
rdf = RDataFrame(...)
ROOT.gInterpreter.Declare('#include "custom.cc"')
rdf.Define('myVar', 'myFunc()')
where myFunc() is defined in custom.cc. The custom.cc file is:
float myFunc() {
    return FatJet_pt[0];
}
where FatJet_pt is an RVec<float> in the RDataFrame. The question is: how can I declare FatJet_pt before myFunc() so that custom.cc compiles and myFunc() can still access the value in the RDataFrame? It’s fine to use FatJet_pt in the argument to Define, so it must be booked somewhere in memory. How can I point to it inside myFunc() so that the value updates once RDataFrame moves on to the next row/event?
I’ve tried doing the following before passing custom.cc to gInterpreter:
for cname in BaseDataFrame.GetColumnNames():
    ROOT.gInterpreter.Declare('%s %s;' % (BaseDataFrame.GetColumnType(cname), cname))
This compiles custom.cc but eventually seg faults. I also tried a variation where I prepend extern to each declaration, but I get linking errors (this was a shot in the dark based on some skimming of StackOverflow). I’ve also added these declarations (with and without extern) to a columns.h and included it in custom.cc, with similar results.
Any input would be greatly appreciated.
Thanks!
Lucas
[1] - Access RDataFrame column in function without passing argument
ROOT Version: v6.20.04
Platform: Linux(Ubuntu)
Compiler: cling