Dear Enrico,
Thank you for the breakdown of the runtime. I’m trying to rewrite the code using DefinePerSample, but I’ve run into some problems implementing it in Python.
As the documentation and this helpful thread suggest, I need to pass the function used in DefinePerSample() some C++ objects that contain the "sample identifier: sample weight" pairs. I’m using std::vector<std::string> and std::vector<double>, as in the thread. I’ve declared this function
ROOT.gInterpreter.Declare('''
float GetSampleWeight(unsigned int slot, const ROOT::RDF::RSampleInfo &id, std::vector<std::string> filePathVector, std::vector<double> weightVector) {
    for (unsigned int i = 0; i < filePathVector.size(); i++) {
        if (id.Contains(filePathVector[i])) {
            return weightVector[i];
        }
    }
    return -1.;
}
''')
to later use it in the RDataFrame as
df = df.DefinePerSample("sampleWeight", "GetSampleWeight(rdfslot_, rdfsampleinfo_, filePathVector, weightVector)")
but the question is how exactly do I get the filePathVector and weightVector to pass to this function?
I’ve tried creating strings like this and feeding them to ROOT.gInterpreter.Declare():
ROOT.gInterpreter.Declare('std::vector<std::string> filePathVector {"file1.root", "file2.root", };')
ROOT.gInterpreter.Declare('std::vector<double> weightVector {1, 2, };')
But when I run my program over processes in a loop (each of which results in a new filePathVector and weightVector), I get a C++ error about the redefinition of std::vector<std::string> filePathVector and std::vector<double> weightVector.
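To illustrate the redefinition problem, here is a minimal sketch of one possible workaround: giving each process its own uniquely named pair of vectors, so that no C++ name is declared twice. The helper `cpp_vector_decl`, the `processes` dictionary and the `_<index>` naming scheme are hypothetical and not part of ROOT. The ROOT calls are left as comments so that the snippet runs without ROOT installed, and I have not checked that this is the recommended approach.

```python
def cpp_vector_decl(name, ctype, values):
    """Build a C++ declaration string for a global std::vector<ctype> called `name`."""
    if ctype == "std::string":
        items = ", ".join('"{}"'.format(v) for v in values)
    else:
        items = ", ".join(repr(float(v)) for v in values)
    return "std::vector<{}> {} {{{}}};".format(ctype, name, items)


# Hypothetical input: process name -> (file paths, per-file weights)
processes = {
    "processA": (["file1.root", "file2.root"], [1, 2]),
    "processB": (["file3.root"], [0.5]),
}

declarations = []  # C++ snippets, each declaring a distinct global name
expressions = []   # one DefinePerSample expression per process

for index, (process, (paths, weights)) in enumerate(processes.items()):
    # A unique suffix per process avoids declaring the same name twice.
    path_name = "filePathVector_{}".format(index)
    weight_name = "weightVector_{}".format(index)
    declarations.append(cpp_vector_decl(path_name, "std::string", paths))
    declarations.append(cpp_vector_decl(weight_name, "double", weights))
    expressions.append(
        "GetSampleWeight(rdfslot_, rdfsampleinfo_, {}, {})".format(path_name, weight_name)
    )
    # With ROOT imported, each declaration string would be passed to
    # ROOT.gInterpreter.Declare(...) and the expression string to
    # df.DefinePerSample("sampleWeight", ...).

for decl in declarations:
    print(decl)
```

Because every process refers to its own vector names, a data frame booked for an earlier process would not be affected by the vectors created for a later one.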
The reproducer and the input files can be found here.
So my question is how to properly use DefinePerSample when working in Python?
Best regards,
Aleksandr