Hi Everyone,
Sorry for asking a newbie question but I have just spent several hours trying to understand how to recognize when chain.Process skips a file because the file is unavailable (multiple copies of all files read by the job exist at various sites but sometimes a file can't be opened or can't be read correctly). If there is a thread somewhere answering this just point me at it. The files I am reading are in the ATLAS distributed XRootD data storage system (FAX). I am using a complicated ATLAS analysis code that I did not write so I am struggling to find the right place to look for evidence of a skipped file.
I normally code batch jobs in exactly the way that chain.Process seems to work by default: continue on at all costs to read as many events as possible rather than waste the CPU time already spent in running the job when a read or file open fails part of the way through the input datastream. However I am testing a system designed to automatically retry the job if any of the inputs are not read. I guess detecting that a file on the chain has been skipped is trivial but what I tried does not work:
baseElecChan->nWeightedAcceptedEvents = nWeightedAcc; // MeV; lower elec pt for testing
chain.SetNotify(baseElecChan);
if (chain.Process(baseElecChan,"",nevents,0) == -1) {
    return 1;
}
I can see that nevents is set to a large number (1000000000) but I don’t know if this is kBigNumber. When XRootD can’t access the file I get an error, but I can’t figure out where it comes from. In the job I am looking at I see the following messages (various other messages occur depending on why the selected copy of the file won’t open):
Error in TXNetFile::Init: root://fax.mwt2.org//atlas/rucio/data12 … 057.root.2 failed to read the file type data.
Error in TXNetFile::CreateXClient: open attempt failed on root://fax.mwt2.org//atlas/rucio/data12 … 057.root.2
Control never transfers to the notification function set up by the chain.SetNotify command like it does when the file can be read. The program just seems to quietly go on to the next file. One of the goals of the project is to minimize the number of reads sent out on the network, so this means that no check of whether the file can actually be found and read is made before the chain.Process command.
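To make the goal concrete, here is a rough sketch of the bookkeeping I am after. It is plain C++ with no ROOT calls, and OpenedFileTracker, expect, markOpened, skipped and exitStatus are names I made up for illustration. The assumption is that the file names put on the chain could be registered up front (for example from the titles of the entries in chain.GetListOfFiles()), and that Notify() in the selector could report the file it was just called for (for example via fChain->GetCurrentFile()->GetName()). Since Notify is never reached for a file that fails to open, anything left over at the end would be a skipped file. I have not verified that the two names always match, so they may need normalizing first.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of ROOT): records which input files were
// actually opened so that skipped ones can be listed after processing.
class OpenedFileTracker {
public:
    // Register every file name that was added to the chain.
    void expect(const std::string& url) { expected_.push_back(url); }

    // Intended to be called from the selector's Notify(), passing the
    // name of the file that has just been opened.
    void markOpened(const std::string& url) { opened_.insert(url); }

    // Files that were expected but never reported as opened, in chain order.
    std::vector<std::string> skipped() const {
        std::vector<std::string> missing;
        for (const auto& url : expected_) {
            if (opened_.count(url) == 0) missing.push_back(url);
        }
        return missing;
    }

private:
    std::vector<std::string> expected_;  // files put on the chain
    std::set<std::string> opened_;       // files seen by Notify()
};

// Exit status for the batch wrapper: 0 if every file was opened, 1 otherwise.
int exitStatus(const OpenedFileTracker& tracker) {
    return tracker.skipped().empty() ? 0 : 1;
}
```

With something like this, the job would return exitStatus(tracker) after chain.Process finishes, and the retry system would see a non-zero status whenever a file was skipped. It would also flag files that were never reached because nevents cut the processing short, so it only makes sense when the whole chain is meant to be read.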
Thanks greatly in advance for any advice.
Fred