Hi!
I'm currently trying to add a column to a distributed RDataFrame (RDF) in Spark, with values computed by a class from a custom shared library. The class needs some parameters from a file that's in my EOS space. Opening the file and using the class works perfectly fine in SWAN without the distributed setup, and there I can use either a relative path to the file or the `/eos/user/` path.
The problem appears when using `ROOT.RDF.Experimental.Distributed.initialize`. The idea is to first initialize the parameters for the class into a global variable (`jecl1`) that can then be used by the column function (`L1_correction()`). Here's an example of the code that's initialized (classes such as `JetCorrectorParameters` are distributed with `df._headnode.backend.distribute_header()` and `df._headnode.backend.distribute_shared_libraries()`, where `df` is the Spark RDF):
```python
import ROOT

# Registers a function to be run on every distributed worker before the computation
initialize = ROOT.RDF.Experimental.Distributed.initialize

def init():
    dist_code = """
#ifndef CORRECTIONS_C
#define CORRECTIONS_C

JetCorrectorParameters *l1;
FactorizedJetCorrector *jecl1;
std::vector<JetCorrectorParameters> v1;

// Read the L1 parameters from the text file and build the corrector
void initCorrections() {
    const char *s1 = "/eos/user/n/ntoikka/SWAN_projects/corrections/Summer19UL18_V5_MC/Summer19UL18_V5_MC_L1FastJet_AK4PFchs.txt";
    l1 = new JetCorrectorParameters(s1);
    v1.push_back(*l1);
    jecl1 = new FactorizedJetCorrector(v1);
}

// L1 corrections: one correction factor per jet, using the global corrector
ROOT::RVec<double> L1_correction(ROOT::RVec<double> pT, ROOT::RVec<double> eta, ROOT::RVec<double> area, double rho) {
    ROOT::RVec<double> correction(pT.size());
    for (unsigned int i = 0; i < pT.size(); i++) {
        jecl1->setJetEta(eta[i]);
        jecl1->setJetPt(pT[i]);
        jecl1->setRho(rho);
        jecl1->setJetA(area[i]);
        correction[i] = jecl1->getCorrection();
    }
    return correction;
}
#endif
"""
    ROOT.gInterpreter.Declare(dist_code)
    ROOT.initCorrections()

initialize(init)
```
Running this as a cell in SWAN works, but when I define the column with `df1 = df.Define("L1correction", "L1_correction(Jet_pt, Jet_eta, Jet_area, Jet_rho)")` and trigger the computation, the distributed executors can't find the file `s1`. The cause seems fairly clear: in `s1` I should use the complete path, with `root://eosuser.cern.ch/` before the `/eos/` part. But if I do that, the cell no longer runs locally in SWAN, because the non-distributed part doesn't recognize that path and cannot open the file. How can I either get SWAN to recognize the `root://eosuser.cern.ch/` path, or get the executors to use the plain `/eos/` path?
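One way to express the second option is to pick the path at run time in whichever process executes `init()`. The sketch below assumes that the SWAN session has the local `/eos` mount while the Spark executors do not, and that the executors can read the file through the `root://eosuser.cern.ch/` prefix mentioned above. The helper name `resolve_eos_path` is hypothetical, not part of ROOT:

```python
import os

# Assumption: XRootD endpoint for EOS user space, as given in the post
EOS_XROOTD_PREFIX = "root://eosuser.cern.ch/"


def resolve_eos_path(eos_path, exists=os.path.exists):
    """Hypothetical helper: return eos_path if it is visible through a local
    /eos mount, otherwise the same file addressed through XRootD."""
    if exists(eos_path):
        return eos_path
    # Prefixing "/eos/..." yields "root://eosuser.cern.ch//eos/...",
    # i.e. the prefix placed in front of the /eos/ part as described above
    return EOS_XROOTD_PREFIX + eos_path
```

Since `initialize` runs `init()` on each worker (and, as far as I understand, also in the local session), the check would be evaluated separately in each process. The resolved string could then be handed to the C++ side, for example by changing `initCorrections()` to take a `const char *` argument and calling `ROOT.initCorrections(resolve_eos_path(...))` instead of hard-coding `s1`.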
An alternative is to move the code from `initCorrections()` inside `L1_correction()` and use the complete path, but this is quite slow, since the parameter file is then reopened every time the function is called.
Thanks
ROOT Version: 6.27
Platform: SWAN K8s
Compiler: gcc11