I ran into what I think is an interesting question. I was trying to write a slightly more complicated functor, which I would then use in an RDataFrame::Define(...) expression for adding a new column.
At first, I defined a number of operator()(...) call operators on my functor, all with slightly different arguments and return types. This threw PyROOT off completely. I was getting some very hard-to-understand crashes along the lines of:
xAOD::Init INFO Environment initialised for data access
ATE::MuonCalibrator INFO Initializing the muon calibrator object
In module 'ROOTDataFrame':
/cvmfs/atlas.cern.ch/repo/sw/software/24.2/AnalysisBaseExternals/24.2.36/InstallArea/x86_64-el9-gcc13-opt/include/ROOT/RDF/RInterface.hxx:331:14: error: cannot compile this scalar expression yet
return DefineImpl<F, RDFDetail::ExtraArgsForDefine::None>(name, std::move(expression), columns, "Define");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*** Break *** segmentation violation
...
Traceback (most recent call last):
File "/home/krasznaa/ATLAS/tools/atlas-tool-example/build/x86_64-el9-gcc13-dbg/bin/AnalysisDemo_rdf.py", line 22, in <module>
muon_pt_xaod = df.Define('muon_pt_calib', muCalib, ['Muons'])
File "/cvmfs/atlas.cern.ch/repo/sw/software/24.2/AnalysisBaseExternals/24.2.36/InstallArea/x86_64-el9-gcc13-opt/lib/ROOT/_pythonization/_rdf_pyz.py", line 381, in _PyDefine
rdf_node = _handle_cpp_callables(func, rdf._OriginalDefine, col_name, func, cols)
File "/cvmfs/atlas.cern.ch/repo/sw/software/24.2/AnalysisBaseExternals/24.2.36/InstallArea/x86_64-el9-gcc13-opt/lib/ROOT/_pythonization/_rdf_pyz.py", line 282, in _handle_cpp_callables
return original_template[type(func)](*args)
cppyy.ll.SegmentationViolation: Could not instantiate Define<ATE::MuonCalibrator>:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, ATE::MuonCalibrator expression, const vector<string>& columns = {}) =>
SegmentationViolation: segfault in C++; program state was reset
While trying to simplify the issue, I reduced the type in question to a single call operator, which looks like:
std::vector<float> operator()(const xAOD::MuonContainer& muons,
const CP::SystematicSet& syst = {}) const;
The second, optional argument is not primarily there for RDF. I just want to see whether a single type can be written so that it is directly usable both by RDF and by hand-written code.
But when I try to use this version of the code, I get:
xAOD::Init INFO Environment initialised for data access
ATE::MuonCalibrator INFO Initializing the muon calibrator object
Traceback (most recent call last):
File "/home/krasznaa/ATLAS/tools/atlas-tool-example/build/x86_64-el9-gcc13-dbg/bin/AnalysisDemo_rdf.py", line 22, in <module>
muon_pt_xaod = df.Define('muon_pt_calib', muCalib, ['Muons'])
File "/cvmfs/atlas.cern.ch/repo/sw/software/24.2/AnalysisBaseExternals/24.2.36/InstallArea/x86_64-el9-gcc13-opt/lib/ROOT/_pythonization/_rdf_pyz.py", line 381, in _PyDefine
rdf_node = _handle_cpp_callables(func, rdf._OriginalDefine, col_name, func, cols)
File "/cvmfs/atlas.cern.ch/repo/sw/software/24.2/AnalysisBaseExternals/24.2.36/InstallArea/x86_64-el9-gcc13-opt/lib/ROOT/_pythonization/_rdf_pyz.py", line 282, in _handle_cpp_callables
return original_template[type(func)](*args)
cppyy.gbl.std.runtime_error: Could not instantiate Define<ATE::MuonCalibrator>:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(basic_string_view<char,char_traits<char> > name, ATE::MuonCalibrator expression, const vector<string>& columns = {}) =>
runtime_error: 2 column names are required but 1 was provided: "Muons".
xAOD::TFileAccessTracer INFO Sending file access statistics to http://rucio-lb-prod.cern.ch:18762/traces/
Here the error is at least clear: the code does not let me omit the optional second argument of my functor. I now believe this may have had something to do with the crashes I observed earlier, since every operator in that version of the code also had an optional last argument. But with multiple operators to choose from, I guess the JIT code couldn't figure out what to do.
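To illustrate the "2 column names are required" message, here is a minimal standalone sketch with no ROOT involved. It assumes that the number of required columns is derived from the type of the functor's call operator, which I have not verified in the RDF sources. MuonContainer, SystematicSet, Calibrator, OverloadedCalibrator, callable_arity, has_unique_call_operator and make_one_arg are all hypothetical names made up for this example. It needs C++17.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for xAOD::MuonContainer and CP::SystematicSet.
struct MuonContainer { std::vector<float> pt; };
struct SystematicSet {};

// Functor shaped like the one in this post: a single call operator whose
// second parameter has a default value.
struct Calibrator {
    std::vector<float> operator()(const MuonContainer& muons,
                                  const SystematicSet& syst = {}) const {
        (void)syst;  // unused in this sketch
        return muons.pt;
    }
};

// Functor with two call operators, like my first attempt.
struct OverloadedCalibrator {
    float operator()(float pt) const { return pt; }
    double operator()(double pt, const SystematicSet& syst = {}) const {
        (void)syst;
        return pt;
    }
};

// Count the parameters of a const member function from its pointer type.
template <typename T>
struct member_arity;

template <typename C, typename R, typename... Args>
struct member_arity<R (C::*)(Args...) const> {
    static constexpr std::size_t value = sizeof...(Args);
};

// Arity of a functor, read off decltype(&F::operator()).
template <typename F>
struct callable_arity : member_arity<decltype(&F::operator())> {};

// Detects whether &F::operator() names exactly one function.
template <typename F, typename = void>
struct has_unique_call_operator : std::false_type {};

template <typename F>
struct has_unique_call_operator<F, std::void_t<decltype(&F::operator())>>
    : std::true_type {};

// Default arguments are not part of the function type, so the arity seen
// through the type is 2, not 1.
static_assert(callable_arity<Calibrator>::value == 2,
              "the default argument is invisible in the operator's type");

// With several overloads, &F::operator() is ambiguous and cannot be inspected.
static_assert(!has_unique_call_operator<OverloadedCalibrator>::value,
              "overloaded call operators have no single type");

// Possible workaround: wrap the functor in a one-parameter lambda.
inline auto make_one_arg(const Calibrator& calib) {
    return [calib](const MuonContainer& muons) { return calib(muons); };
}

static_assert(callable_arity<decltype(make_one_arg(Calibrator{}))>::value == 1,
              "the wrapper exposes exactly one parameter");
```

The file compiles as-is, and all checks are static_asserts. If the assumption above holds, a one-parameter wrapper like make_one_arg would be what I pass to Define(...) for RDF, while hand-written code keeps calling the two-parameter operator directly. Whether such a wrapper can be handed over conveniently from PyROOT is a separate question that this sketch does not address.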
So, at the end of this very long story: should it not be possible to use such a setup, with an operator that has one or more default arguments?
Cheers,
Attila
ROOT Version: 6.28/10
Platform: RHEL 9
Compiler: GCC 13