I iterate over files read through XRootD with a TChain, using this loop:
```cpp
for (Long64_t ientry = 0; ientry < nentries; ientry++) {
    auto jentry = p.LoadTree(ientry);
    // -2 means the entry is past the end of the chain; any other negative code is an error
    if (jentry < 0 && jentry != -2) {
        return jentry;
    }
    if (jentry < 0) {
        std::cout << "Code for the last entry is " << ientry << std::endl;
        break;
    }
    auto bytes = p.GetEntry(ientry);
    if (bytes == 0) {
        std::cerr << "[ERROR] Cannot read "
                  << ientry << "th entry properly" << std::endl;
        return -2;
    }
    // ... process the entry ...
}
```
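The checks in this loop rely on the negative codes documented for TChain::LoadTree. As a ROOT-free sketch (the helper names are hypothetical), the decision logic can be isolated and tested on its own. It also treats a negative GetEntry result as a failure, since TTree::GetEntry documents -1 for an I/O error, a case that the `bytes == 0` test above does not catch.

```cpp
#include <string>

// ROOT-free sketch of the checks used in the event loop. The helper names
// are hypothetical; the code values follow the TChain::LoadTree reference.
std::string describeLoadTreeCode(long long code)
{
    if (code >= 0) return "ok";
    switch (code) {
        case -1: return "the chain is empty";
        case -2: return "entry out of range (normal end of the chain)";
        case -3: return "the file for this entry could not be opened";
        case -4: return "the tree is missing from the file";
        case -5: return "internal error (ParticleTree::LoadTree also uses it for a null fChain)";
        default: return "unknown negative code";
    }
}

// -2 only signals that the loop ran past the last entry of the chain.
bool isEndOfChain(long long code) { return code == -2; }

// Every other negative LoadTree code should stop the macro.
bool isLoadError(long long code) { return code < 0 && code != -2; }

// TTree::GetEntry documents 0 for a missing entry and -1 for an I/O error,
// so a check for bytes == 0 alone lets the I/O-error case through.
bool isReadFailure(int bytes) { return bytes <= 0; }
```

In the loop, `if (bytes <= 0)` would replace `if (bytes == 0)`. Whether the TBranch::GetBasket failures reported in this thread actually surface as -1 from GetEntry is an assumption worth testing.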
I encountered several I/O errors while reading files through XRootD. The first type looks like this:

```
Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] No servers are available to read the file.
```
In this case I can exit my macro by checking for the return value -3 from LoadTree.
But for the second type of error,

```
Error in <TNetXNGFile::ReadBuffers>: [ERROR] Server responded with an error: [3008] Unable to readv /path/to/file, cannot allocate memory.
Error in <TBranch::GetBasket>: File: root://xrootd.cmsaf.mit.edu//path/to/file
at byte:210465775, branch:nPV, entry:60543, badread=1, nerrors=1, basketnumber=9
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-9679) ; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-89109809) ; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-1603141829) ; trying to recover by setting it to zero
Error in <TBasket::TBasket::Streamer>: The value of fNevBufSize (-1997264740) or fIOBits (191) is incorrect ; setting the buffer to a zombie.
```

is there a way to detect the error and stop the macro, so that bad entries are not written to the output?
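One common pattern for this is an error handler that records that an error was reported, which the event loop then checks after each GetEntry(). The sketch below is a standalone model, not ROOT code: the level constants mirror TError.h, while the flag, handler, and helper names are hypothetical. In a real macro the handler would be registered with SetErrorHandler() and would forward every message to DefaultErrorHandler() so the usual printout is kept.

```cpp
#include <string>

// Standalone stand-ins for ROOT's message levels (same values as TError.h).
// Drop these two lines when TError.h is included.
constexpr int kWarning = 2000;
constexpr int kError   = 3000;

// Hypothetical state filled by the handler and polled by the event loop.
bool gSawError = false;
std::string gLastErrorLocation;

// Same parameter list as ROOT's ErrorHandlerFunc_t. In a real macro it would
// be registered with SetErrorHandler() and would also forward the message to
// DefaultErrorHandler() so the usual printout is preserved.
void FlaggingErrorHandler(int level, bool abortFlag, const char *location, const char *msg)
{
    (void)abortFlag; // ignored here; aborting is left to DefaultErrorHandler in a real macro
    (void)msg;
    if (level >= kError) {
        gSawError = true;
        gLastErrorLocation = location ? location : "";
    }
}

// Checked after each GetEntry(): stop instead of writing a suspect entry.
bool shouldStopLoop() { return gSawError; }
```

Unlike a global abort, this lets the macro close its output file cleanly before returning a non-zero code.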
Regarding the `[3008] Unable to readv /path/to/file, cannot allocate memory` error: this could be either a server-side problem or a corrupted file. To rule out a server-side problem, you would need to contact the administrators of xrootd.cmsaf.mit.edu.
I think so. I tested on lxplus, on both interactive and non-interactive nodes. On the interactive nodes I can only trigger the first type of error, by using a wrong file name, and the returned error code works. After I transfer the compiled files to the non-interactive nodes, both types of error appear. The first type is sometimes detected and sometimes not; for the second type, checking the return code does not work at all. I use an auto-generated class instead of a plain TChain. Does this matter? I modified its LoadTree as follows:
```cpp
Long64_t ParticleTree::LoadTree(Long64_t entry)
{
    // Set the environment to read one entry
    if (!fChain) return -5;
    Long64_t centry = fChain->LoadTree(entry);
    if (centry < 0) return centry;
    if (fChain->GetTreeNumber() != fCurrent) {
        fCurrent = fChain->GetTreeNumber();
        Notify();
        std::clog << "Successfully loaded tree " << fCurrent << std::endl;
    }
    return centry;
}
```
In one run, I get this log:

```
Successfully loaded tree 0
Successfully loaded tree 1
Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] No servers are available to read the file.
The code for TMVAClassificationApp is -3
```
This time, the proper code is returned to indicate that the file could not be opened.
In another run, I get the following log:

```
Successfully loaded tree 0
Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] No servers are available to read the file.
Successfully loaded tree 2
The code for TMVAClassificationApp is 0
```
So the program finishes with exit code 0: the error was not detected.
For the second type of error, the output is similar: no errors are detected.
I am afraid I do not understand the behavior of this function. It aborts if abort_bool is true, but I am not sure what value is passed as abort_bool after an error occurs.

The former aborts as soon as an ERROR message is issued; the latter does not. So I guess that an error level at or above gErrorAbortLevel makes the program abort. By default, ROOT aborts when a kFatal error is reported.
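The guess above can be written down as a small rule. To our understanding, ROOT's ErrorHandler passes `level >= gErrorAbortLevel` as abort_bool to the installed handler; the sketch below models only that comparison. The level values mirror TError.h, while the model variable, the function name, and the default of kSysError + 1 are assumptions to be checked against the ROOT version in use.

```cpp
// Standalone stand-ins for ROOT's message levels (same values as TError.h).
constexpr int kWarning  = 2000;
constexpr int kError    = 3000;
constexpr int kSysError = 5000;
constexpr int kFatal    = 6000;

// Assumed default of gErrorAbortLevel; under it only kFatal reaches the threshold.
int gErrorAbortLevelModel = kSysError + 1;

// Assumed rule used by ROOT's ErrorHandler when it calls the installed
// handler: abort_bool is true when the message level reaches the abort level.
bool abortFlagFor(int level, int abortLevel) { return level >= abortLevel; }
```

With the threshold lowered to kError, every `Error in <...>` message would set abort_bool, which matches the workaround described next.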
As a workaround, either setting `gErrorAbortLevel = kError;` or calling `TObject::Fatal()` aborts the program once an ERROR message appears. I use the former, because it catches every error and aborts. The drawback is that it has a global effect, which may lead to surprising results. This solved my problem.
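One way to limit the global effect mentioned above is to lower gErrorAbortLevel only around the event loop and restore the previous value afterwards. A minimal RAII sketch follows; the ScopedValue class is hypothetical and not part of ROOT, and it is tested here on a plain int instead of ROOT's `Int_t gErrorAbortLevel`.

```cpp
// Hypothetical RAII helper: set a variable for the lifetime of a scope and
// restore the previous value on exit (also when an exception propagates).
template <typename T>
class ScopedValue {
public:
    ScopedValue(T &target, T newValue) : fTarget(target), fSaved(target)
    {
        fTarget = newValue;
    }
    ~ScopedValue() { fTarget = fSaved; }

    // Copying would restore the saved value twice, so forbid it.
    ScopedValue(const ScopedValue &) = delete;
    ScopedValue &operator=(const ScopedValue &) = delete;

private:
    T &fTarget;
    T fSaved;
};
```

In a macro this could look like `{ ScopedValue<Int_t> guard(gErrorAbortLevel, kError); /* event loop */ }`, so that errors raised later (for example while writing the output) fall back to the previous abort level.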