Dear experts,
I’m reviving a thread [1] from about a year ago about accessing RDataFrame column values inside a function without passing the branch as an argument. My goals are roughly the same, but I think I’ve learned enough since then to state my question more clearly.
To summarize the objective of the original post: I work with the NanoAOD format in CMS, and in particular I’m building a more generic tool for handling NanoAOD with RDataFrame. The tool tracks the actions performed on the RDataFrame without being a full proxy or wrapper around it.
I also plan to provide common C++ functions that implement standard analysis algorithms (scale-factor look-ups, generator-particle matching, etc.). The idea is to have a standard library of scripts so that analyzers aren’t all reproducing the same coding work. As a result, many branch names are always predictable, and many functions will (almost) always take the same NanoAOD branches as input.
Taking generator-particle matching for a jet as an example, the function needs more than 10 arguments if every required branch is passed in explicitly. That puts the onus on the end user to write out the full C++ call, with all 10+ arguments in the correct order, in every Define or Filter call from Python. Since the point of Python (and of the tool I’m building) is to make life simpler, this is undesirable.
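To make the burden concrete, here is a minimal sketch, in plain Python and independent of ROOT, of how a tool could keep a table of the branches each library function expects and build the Define expression string on the user’s behalf. The registry `FUNC_BRANCHES`, the function name `GenMatchJet` and the helper `build_call` are hypothetical, and the branch list is illustrative rather than the real argument list of any CMS matching routine.

```python
# Hypothetical registry: library function name -> NanoAOD branches it needs,
# listed in the order the C++ function expects them (illustrative only).
FUNC_BRANCHES = {
    "GenMatchJet": [
        "FatJet_eta", "FatJet_phi",
        "GenPart_pt", "GenPart_eta", "GenPart_phi",
        "GenPart_pdgId", "GenPart_statusFlags", "GenPart_genPartIdxMother",
    ],
}


def build_call(func_name, *extra_args):
    """Return a C++ call string such as 'f(a, b, 0)' for use in Define/Filter.

    Branch arguments come first, in the registered order; any extra
    (non-branch) arguments are appended after them as literal text.
    """
    args = list(FUNC_BRANCHES[func_name]) + [str(a) for a in extra_args]
    return "%s(%s)" % (func_name, ", ".join(args))


# The user only names the function; the helper fills in the branch list.
expr = build_call("GenMatchJet", 0)
print(expr)
```

With ROOT available, the string would be passed as usual, e.g. `rdf = rdf.Define('matchIdx', build_call('GenMatchJet', 0))`, so the user never types the branch list. The trade-off is that the registry has to be kept in sync with the C++ signatures by hand.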
What I would like to do is something like the following (simple example):
import ROOT

rdf = ROOT.RDataFrame(...)
ROOT.gInterpreter.Declare('#include "custom.cc"')
rdf = rdf.Define('myVar', 'myFunc()')
where myFunc() is defined in custom.cc, which contains:
float myFunc() {
    return FatJet_pt[0];  // FatJet_pt is not declared anywhere in this file
}
where FatJet_pt is an RVec<float> column in the RDataFrame. The question is: how can I declare FatJet_pt before myFunc() so that the file compiles, while still having it refer to the value in the RDataFrame? Using FatJet_pt directly in a Define expression works, so its value must be held somewhere in memory. How can I point to it from inside myFunc() so that the value updates each time RDataFrame moves on to the next row/event?
I’ve tried doing the following before passing custom.cc to gInterpreter:
for cname in BaseDataFrame.GetColumnNames():
    ctype = BaseDataFrame.GetColumnType(cname)
    ROOT.gInterpreter.Declare('%s %s;' % (ctype, cname))
With these declarations custom.cc compiles, but the program eventually segfaults. I also tried a variation where I prepend extern to each declaration, but then I get linking errors (this was a shot in the dark based on some skimming of Stack Overflow). I’ve also put these declarations (with and without extern) in a columns.h and included it in custom.cc, with similar results.
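One way to get the same user-facing result without global declarations, sketched here as an assumption rather than anything ROOT provides, is to keep ordinary C++ functions whose parameter names are spelled exactly like the branches they need, and let Python read those names from the declaration to assemble the argument list. The helper `call_from_signature` is hypothetical and regex-based, so it only handles simple declarations (no default arguments, no template arguments containing commas, no function-pointer parameters).

```python
import re

# Example declaration following the assumed convention:
# parameter names match the NanoAOD branch names exactly.
CPP_SOURCE = """
float myFunc(const ROOT::VecOps::RVec<float>& FatJet_pt, const ROOT::VecOps::RVec<float>& FatJet_eta) {
    return FatJet_pt[0] * FatJet_eta[0];
}
"""


def call_from_signature(cpp_source, func_name):
    """Build 'func_name(p1, p2, ...)' from the parameter names in cpp_source."""
    # Capture the text between the parentheses of the first occurrence of func_name(...).
    match = re.search(r"\b%s\s*\(([^)]*)\)" % re.escape(func_name), cpp_source)
    if match is None:
        raise ValueError("no declaration found for %s" % func_name)
    params = [p.strip() for p in match.group(1).split(",") if p.strip()]
    # The parameter name is the last identifier in each parameter declaration.
    names = [re.findall(r"[A-Za-z_]\w*", p)[-1] for p in params]
    return "%s(%s)" % (func_name, ", ".join(names))


expr = call_from_signature(CPP_SOURCE, "myFunc")
print(expr)
```

After custom.cc has been declared to the interpreter, the generated string could be used as `rdf = rdf.Define('myVar', call_from_signature(open('custom.cc').read(), 'myFunc'))`. RDataFrame then binds the named columns to the function arguments for every event, so no global copies of the branches are involved, and the C++ function stays an ordinary, separately testable function.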
Any input would be greatly appreciated.
Thanks!
Lucas
[1] - Access RDataFrame column in function without passing argument
ROOT Version: v6.20.04
Platform: Linux (Ubuntu)
Compiler: cling