How to detect XRootD errors and protect the output

Dear experts,

I iterate over files read through XRootD with a TChain, using the following loop:

  for (Long64_t ientry = 0; ientry < nentries; ientry++) {
    auto jentry = p.LoadTree(ientry);
    // Any negative code other than -2 is treated as a failure (e.g. -3).
    if (jentry < 0 && jentry != -2) {
      return jentry;
    }
    // -2: the requested entry is past the end of the chain.
    if (jentry < 0) {
      std::cout << "Code for the last entry is " << ientry << std::endl;
      break;
    }
    auto bytes = p.GetEntry(ientry);
    if (bytes == 0) {
      std::cerr << "[ERROR] Cannot read "
                << ientry << "th entry properly" << std::endl;
      return -2;
    }
    // ... process the entry ...
  }

I encountered several I/O errors while reading files through XRootD.

One of them looks like this:

Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] No servers are available to read the file.

In this case I can exit my macro using the return value -3 of LoadTree.

But for errors such as the following,

Error in <TNetXNGFile::ReadBuffers>: [ERROR] Server responded with an error: [3008] Unable to readv /path/to/file, cannot allocate memory.
Error in <TBranch::GetBasket>: File: root://xrootd.cmsaf.mit.edu//path/to/file
at byte:210465775, branch:nPV, entry:60543, badread=1,   nerrors=1, basketnumber=9
 Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-9679) ; trying to recover by setting it to zero
 Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-89109809) ; trying to recover by setting it to zero
 Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-1603141829) ; trying to recover by setting it to zero
 Error in <TBasket::TBasket::Streamer>: The value of fNevBufSize (-1997264740) or fIOBits (191) is incorrect ; setting the buffer to a zombie.

Is there a way to detect the error code and stop the macro, to avoid outputting bad entries?

Many thanks in advance!



ROOT Version: 6.22
Platform: LCG_99
Compiler: gcc10


Maybe @pcanal can help.

Any thoughts related to this?

Sometimes even the LoadTree check does not work.

Y.S.Zhang wrote:

Error in <TNetXNGFile::ReadBuffers>: [ERROR] Server responded with an error: [3008] Unable to readv /path/to/file, cannot allocate memory.

It could be either a server-side problem or a corrupted file. To rule out a problem on the server side, you would need to contact the owner of xrootd.cmsaf.mit.edu.

So there is no way to skip corrupt files using TChain?

Are you already checking the return value of LoadTree and GetEntry and skipping those that have returned an error code?
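For reference, here is a minimal sketch of such a check. It is written as a template so that it compiles without ROOT: `Chain` is assumed to expose `LoadTree(entry)` and `GetEntry(entry)` the way `TChain` does, and `ReadStatus`, `readEntries` and `FakeChain` are hypothetical names used only for illustration. According to the ROOT documentation, `TTree::GetEntry` returns 0 when the entry does not exist and -1 on an I/O error, so the sketch treats any value `<= 0` as a failure; a test for `bytes == 0` alone lets -1 through. Whether a given XRootD failure actually surfaces as -1 is not something this sketch can guarantee.

```cpp
#include <functional>

// Outcome of reading a chain; illustrative names, not ROOT API.
enum class ReadStatus { Ok, OpenFailed, ReadFailed };

// Chain is any type with LoadTree(long long) and GetEntry(long long),
// e.g. TChain or a generated wrapper class around it.
template <class Chain>
ReadStatus readEntries(Chain& chain, long long nentries,
                       const std::function<void(long long)>& process)
{
  for (long long i = 0; i < nentries; ++i) {
    const long long local = chain.LoadTree(i);
    if (local == -2) break;                         // past the end of the chain
    if (local < 0) return ReadStatus::OpenFailed;   // e.g. -3: file not opened
    const auto bytes = chain.GetEntry(i);
    if (bytes <= 0) return ReadStatus::ReadFailed;  // 0: no entry, -1: I/O error
    process(i);                                     // reached only after a clean read
  }
  return ReadStatus::Ok;
}

// Stand-in used to exercise the loop without ROOT:
// reading entry `badEntry` reports an I/O error.
struct FakeChain {
  long long size;
  long long badEntry;
  long long LoadTree(long long i) const { return (i < 0 || i >= size) ? -2 : i; }
  int GetEntry(long long i) const { return i == badEntry ? -1 : 8; }
};
```

With a real `TChain`, the `process` callback would be the place where the output is filled, so nothing is written for an entry whose read failed.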

I think so. I tested on lxplus, on both interactive and non-interactive nodes. On the interactive nodes I can only trigger the first type of error, by using a wrong filename, and there the error code works. After I transfer the compiled files to the non-interactive nodes, both the first and the second type appear. The first type is sometimes detected and sometimes not. For the second type, the return code does not work. I use an auto-generated class file instead of a plain TChain. Does this matter? My LoadTree wrapper is:

Long64_t ParticleTree::LoadTree(Long64_t entry)
{
  // Set the environment to read one entry
  if (!fChain) return -5;
  Long64_t centry = fChain->LoadTree(entry);
  if (centry < 0) return centry;
  if (fChain->GetTreeNumber() != fCurrent) {
    fCurrent = fChain->GetTreeNumber();
    Notify();
    std::clog << "Successfully loaded tree " << fCurrent << std::endl;
  }
  return centry;
}

I get the following log:

 Successfully loaded tree 0
 Successfully loaded tree 1
 Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] No servers are available to read the file.

 The code for TMVAClassificationApp is -3

This time it returns the proper code, indicating that the file could not be opened.

I also have the following log:

 Successfully loaded tree 0
 Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] No servers are available to read the file.
 Successfully loaded tree 2

 The code for TMVAClassificationApp is 0

Here the program finishes with exit code 0, so the error was not detected.

For the second type of error, I have similar output – no errors were detected.

In that case, you will need to use your own error message handler (to be able to check the message and do the ‘right’ thing).

  SetErrorHandler(CustomErrorHandler);

You can see the default version of this function in core/base/src/TErrorDefaultHandler.cxx in the ROOT sources.
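A sketch of what such a handler could look like follows. The four-argument parameter list mirrors ROOT's `ErrorHandlerFunc_t` (level, abort flag, location, message), and `kErrorLevel` copies the value of `kError` from `TError.h`; it is redefined locally only so that the example compiles without ROOT. `gReadErrors`, `CountingErrorHandler` and the list of watched locations are hypothetical choices, with the location prefixes taken from the log in the first post.

```cpp
#include <atomic>
#include <cstdio>
#include <cstring>

// Local copy of ROOT's kError level (TError.h), for a ROOT-free build.
constexpr int kErrorLevel = 3000;

// Number of I/O-related errors seen so far (hypothetical global).
std::atomic<int> gReadErrors{0};

// Same parameter list as ROOT's ErrorHandlerFunc_t.
void CountingErrorHandler(int level, bool abortFlag,
                          const char* location, const char* msg)
{
  static const char* const watched[] = {"TNetXNGFile", "TBranch", "TBasket"};
  if (level >= kErrorLevel && location) {
    for (const char* prefix : watched) {
      if (std::strncmp(location, prefix, std::strlen(prefix)) == 0) {
        ++gReadErrors;  // remember that a read-related error happened
        break;
      }
    }
  }
  // Still print the message; with ROOT one could forward to
  // DefaultErrorHandler(level, abortFlag, location, msg) instead.
  std::fprintf(stderr, "[%d] <%s>: %s\n", level,
               location ? location : "", msg ? msg : "");
  (void)abortFlag;  // not used in this sketch
}
```

With ROOT, the handler would be installed once with `SetErrorHandler(CountingErrorHandler);` (declared in `TError.h`). The event loop could then compare `gReadErrors` before and after each `GetEntry` call and stop, or discard the entry, when the count has changed.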

Many thanks for this illustration!

I am afraid I cannot understand the behavior of this function. The function will abort if abort_bool is true. I am not sure what boolean value will be passed as abort_bool after an error occurs.

However, after tracing back from the link you posted, I found the variable gErrorAbortLevel in TError.h and TError.cxx. I do not understand this variable very well. There are several possible levels, see https://root.cern.ch/doc/master//TError_8h.html. I guess that the values larger than kError are related to segmentation faults. The default value of gErrorAbortLevel seems to be kSysError+1, see https://root.cern/doc/master/TError_8cxx.html#a070eef9f94195b433ed24fe5ff84bb27.

I manually set

gErrorAbortLevel = kError;

and

gErrorAbortLevel = kError+1;

The former aborts as soon as an ERROR message is issued; the latter does not. So I conclude that any message whose level is at or above gErrorAbortLevel aborts the program. By default, ROOT aborts only when a message of level kFatal is issued.
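The rule inferred from this experiment can be written down as a one-line predicate. This is a sketch of the comparison only: the level values are copied from ROOT's `TError.h` and placed in a `sketch` namespace so that the example compiles standalone, and `wouldAbort` is a hypothetical name, not a ROOT function.

```cpp
namespace sketch {

// Message levels with the values defined in ROOT's TError.h.
constexpr int kError    = 3000;
constexpr int kSysError = 5000;
constexpr int kFatal    = 6000;

// True when a message of `level` leads to an abort for the given
// gErrorAbortLevel setting; the threshold itself is included.
constexpr bool wouldAbort(int level, int abortLevel)
{
  return level >= abortLevel;
}

// The two settings tried above, plus the default threshold.
static_assert(wouldAbort(kError, kError), "threshold kError catches errors");
static_assert(!wouldAbort(kError, kError + 1), "threshold kError+1 lets errors pass");
static_assert(wouldAbort(kFatal, kSysError + 1), "the default threshold catches kFatal");

}  // namespace sketch
```

The result of this comparison is what the error handler receives as its boolean abort argument.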

As a workaround, setting

gErrorAbortLevel = kError;

or calling TObject::Fatal()

will abort the program as soon as an ERROR message appears. I use the former because it catches every error and aborts. The drawback is that the setting is global, which may have surprising side effects. This solved my problem.
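One way to limit the global effect is to lower the threshold only while the chain is being read and restore it afterwards. The sketch below does this with a small RAII guard over an `int&`. `ScopedLevel` and `levelInsideScope` are hypothetical helpers, not part of ROOT, and a plain `int` stands in for `gErrorAbortLevel` so that the example compiles without ROOT.

```cpp
// Temporarily overrides an integer setting and restores it on scope exit.
class ScopedLevel {
public:
  ScopedLevel(int& target, int temporary) : target_(target), saved_(target)
  {
    target_ = temporary;
  }
  ~ScopedLevel() { target_ = saved_; }

  ScopedLevel(const ScopedLevel&) = delete;
  ScopedLevel& operator=(const ScopedLevel&) = delete;

private:
  int& target_;  // the setting being overridden
  int saved_;    // its value before the override
};

// Demonstration with a stand-in variable: returns the value that is
// visible while the guard is alive.
int levelInsideScope(int& abortLevel, int temporary)
{
  ScopedLevel guard(abortLevel, temporary);
  return abortLevel;  // the override is active here
}
```

With ROOT, `ScopedLevel guard(gErrorAbortLevel, kError);` at the top of the block that reads the chain should behave the same way, assuming `gErrorAbortLevel` is the plain `Int_t` global declared in `TError.h`. Code outside that block then keeps the default threshold.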