N.B. I corrected all the paths below from …/scratch0/… to …/scratch0/public/… (sorry for the typo).
Hi,
(Sorry for the long post, but I have tried to describe my problem as completely as possible.)
I have a problem running a job on LXBatch (batch system at CERN). The analysis job consists of C++ code (NLOJet++ program) compiled against ROOT 5.21.04.
When I run interactively on an lxplus node, it is fine.
However, when I run the same code on the batch system, the code seems to run okay, but for some reason I am not able to write the output to a TFile. I get errors, which can be seen in my stdout and stderr logs:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/logs/
(For some reason the errors seem to occur out of order in the log file, i.e. the ROOT errors are reported after some shell errors when I try to rename/copy the missing file. I guess this is a separate I/O issue.)
After that, the output ROOT file is missing. It is supposed to appear as:
NLOJet++Moriond/output/DijetMassChi.root
I put the code that writes to the TFile here:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondProgram/src/DijetMassChi.cpp
where the code that writes the output TFile is the function save():
void histos::save(TString filename){
    TFile myfile(filename,"UPDATE");
    //calculate cross-section from weights info.
    ref_cross->Write("",TObject::kOverwrite);
    ref_cross_scale->Write("",TObject::kOverwrite);
    ref_obs_bins->Write("",TObject::kOverwrite);
    ref_obs_bins_sq->Write("",TObject::kOverwrite);
    myfile.Close();
}
Again, when I run interactively on lxplus, the output file is saved fine without these errors. The problem only occurs on the batch queue.
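Since save() opens the file in "UPDATE" mode, the directory that holds it has to exist and be writable on the batch node. One check that could be added to the run script before the program starts is sketched below. The helper name check_output_dir is hypothetical and is not part of the scripts linked in this post.

```shell
#!/bin/sh
# Hypothetical helper (not in the linked scripts): report whether a
# directory exists and is writable by the current job.
check_output_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable"
        return 1
    fi
    echo "ok"
}
```

It could be called as check_output_dir output from inside NLOJet++Moriond/ on the batch node, and the result compared with the same call on lxplus.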
For completeness, the setup file is:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondProgram/setup.sh
where I set my environment to:
plat=slc4_amd64_gcc34
export PATH=$PATH:/afs/cern.ch/sw/lcg/external/root/5.21.04/${plat}/root/bin/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/afs/cern.ch/sw/lcg/external/root/5.21.04/${plat}/root/lib
although actually the gcc version is 4.1.2 (same on both lxplus and lxbatch).
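To compare what the job actually picks up on lxplus and on lxbatch, something like the following could be run on both after sourcing setup.sh. This is only a sketch, and report_tool is a hypothetical helper name, not something from the setup file.

```shell
#!/bin/sh
# Hypothetical helper: print where a command resolves on this node,
# or "<name>: not found" if it is not on PATH.
report_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1: $path"
    else
        echo "$1: not found"
    fi
}
```

For example, calling report_tool root and report_tool gcc, followed by echo "$LD_LIBRARY_PATH", on both machines would show whether the two environments differ.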
The Makefile is:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondProgram/Makefile
Each time I run a batch job, I copy the directory with my source code and rebuild:
#----- Setup NLOJet++
scp -r lxplus:~/NLOJet++Moriond/ ./
cd NLOJet++Moriond/
source setup.sh
make clean
make -B DijetMassChi.la # Need to make unconditionally.
rm -f output/*.root # Delete existing output (will be updated)
which you can see in my script:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondScripts/NLOJet++Moriond.run.sh
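The rename/copy step after the run could also be guarded so that a missing output file is reported explicitly in the log rather than showing up as a shell error. A minimal sketch, assuming a hypothetical helper named copy_if_present (not in the current run script):

```shell
#!/bin/sh
# Hypothetical helper: copy the output file only if it exists and is
# non-empty; otherwise print a message to stderr and return non-zero.
copy_if_present() {
    src="$1"
    dest="$2"
    if [ -s "$src" ]; then
        cp "$src" "$dest"
    else
        echo "output file not found or empty: $src" >&2
        return 1
    fi
}
```

The source argument would be output/DijetMassChi.root. The destination is whatever the run script currently copies to.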
The rest of the batch submit scripts are here:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondScripts/
In particular, I execute the first script below, and each script then calls the next:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondScripts/NLOJet++Moriond.wrap.sh
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondScripts/NLOJet++Moriond.lsf.sh
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondScripts/NLOJet++Moriond.run.sh
Finally for completeness the NLOJet++ program itself is:
lxplus.cern.ch:~efeng/scratch0/public/ROOTprogram/NLOJet++MoriondProgram/
In the above, my local directories for the C++ code and for the batch scripts are both actually called NLOJet++Moriond (in different paths), so I renamed them to NLOJet++MoriondProgram/ and NLOJet++MoriondScripts/ when copying them to my scratch area for you.
I would be very grateful for any suggestions to understand why this problem is occurring and, more importantly, how to fix it.
Thanks,
Eric