R__unzip: error in header

Dear all,

I am trying to analyse a set of rootuple files with our analysis software. Some of these files seem to be corrupted, but I have not found a way to detect this. Each time my analysis encounters such a file, it crashes. I could simply remove them from the hard drive, but new data arrive every 30 min and I would like an automatic procedure to detect those files.

Here is the procedure I am using to add the files to be analysed to a TChain object:

[code][…]
TChain* chain = new TChain();
TFile *File;
char PATH[1054];
DIR *dp;
struct dirent *dirp;

//1- Load a list of File
if(Param.InputFile.size() > 1 )
	for(int i=0; i<Param.InputFile.size(); i++)
	{
		//CHECK THE FILE ISN'T CORRUPTED
		File = TFile::Open (Param.InputFile[i]);
		if (File == 0x0)
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n      Bad File %s Access\n\n",Param.InputFile[i]);
			continue;
		}

		if (File->IsZombie())
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n      File %s is zombie\n\n",Param.InputFile[i]);
			File->Close();
			continue;
		}
		
		File->Close();
		if(chain->Add(Param.InputFile[i]) <= 0)
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n    Can not add file %s to chain.\n",Param.InputFile[i]);
			continue;
		}
		
		printf("    Add file [%i] \033[22;34m%s\033[0m to chain.\n",i+1,Param.InputFile[i]);
	}

}

chain->TChain::Process(Param.Script,Param.OPTIONS);
return 1;
}

[/code]

In the above code, Param is a class describing the input parameters of my analysis software. InputFile is an array of char that lists all the files I wish to analyse, Script is the analysis script and OPTIONS holds the options required by my script.

For some files, I get the following errors:

R__unzip: error -3 in inflate (zlib)
TrRecon::SetParFromDataCards-I-TrackThrSeed= 4 4
R__unzip: error in header
TrPdfDB::Load-W no TrPdfDB in file.
InitDB Init done 0
R__unzip: error in header
AMSEventR::ReadHeader-I-Version/OS 526/12 root://ccxroot:XXXX//hpss/in2p3.fr/XXXXXXXXXXXXXXXXXXXXXXX/1309758010.00000001.root
AMSEventR::ReadHeader-I-NewRun 1309758010
AMSEventR::UpdateSetup-E-UnabletofindSetupEntryfor 1309758010
Error in <TBranchElement::GetBasket>: File: root://ccxroot:XXXX//hpss/in2p3.fr/XXXXXXXXXXXXXXXXXXXXXXX/1309758010.00000001.root at byte:4126432, branch:ev.fHeader, entry:466, badread=1, nerrors=1, basketnumber=1
Error in <TBranchElement::GetBasket>: File: root://ccxroot:XXXX//hpss/XXXXXXXXXXXXXXXXXXXXXXX/1309758010.00000001.root at byte:0, branch:ev.fHeader, entry:467, badread=1, nerrors=2, basketnumber=1
.....
Error in <TBranchElement::GetBasket>: File: root://ccxroot:XXXX//hpss/in2p3.fr/XXXXXXXXXXXXXXXXXXXXXXX/1309758010.00000001.root at byte:0, branch:ev.fHeader, entry:475, badread=1, nerrors=10, basketnumber=1

This leads to a memory leak, and the analysis crashes once its memory usage exceeds the allowed limit.
I tried several ways to detect such corrupted files but did not succeed, so I am now asking for help from anyone who has encountered the same problem.

Note that this file can be opened with root:

root root://ccxroot:XXXX//hpss/XXXXXXXXXXXXXXXXXXXXXXX/1309758010.00000001.root

There is no message informing me that the file is corrupted or has been recovered when it is opened by root.
I can read the tree with the TBrowser and plot some histograms, but I cannot run my analysis on this file. So I wonder whether the problem comes from one single event, but I really do not want to loop over all events to check the validity of each file. How can I solve this?

Hi,

Unfortunately, short of scanning the whole file, there is no easy way to spot corruption within the file itself. When calling GetEntry for the various branches you can check the return value to detect such errors.

Cheers,
Philippe.

Hi,

Thanks for the response. How can I do this? Judging from the error message, the branch ev.fHeader, which causes the trouble, does not seem to be empty: it has approximately 467 entries. If I check the branch and get its number of entries before adding the file to the TChain, I will not get an error, right?

Hi,

No. In your selector (i.e. in Param.Script), the Process routine must be calling ‘GetEntry(…)’ either on the chain or on the branches. If an error happens, this routine should return a negative number (otherwise it returns the number of bytes read from the buffer). This is not to be confused with GetEntries, which tells you how many rows/entries are stored in the branch.

Cheers,
Philippe.
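
A minimal sketch of the return-value check described above, assuming only the two methods named in the reply (GetEntries and GetEntry). `countBadEntries` is a hypothetical helper, and `FakeTree` is a hypothetical stand-in so that the loop can be compiled without ROOT; a real TTree or TChain would be passed in the same way. Whether a given ROOT version actually reports every bad basket through this return value is not guaranteed by the sketch.

```cpp
#include <cstdio>
#include <vector>

// Scans every entry of a tree-like object and counts failed reads.
// TreeLike only needs GetEntries() and GetEntry(i). GetEntry is expected to
// return the number of bytes read, or a negative value on a read error.
template <typename TreeLike>
long long countBadEntries(TreeLike& tree)
{
    long long bad = 0;
    const long long n = tree.GetEntries();
    for (long long i = 0; i < n; ++i) {
        if (tree.GetEntry(i) < 0) {
            std::printf("Read error at entry %lld\n", i);
            ++bad;
        }
    }
    return bad;
}

// Hypothetical stand-in for a TTree, used only to exercise the loop without ROOT.
// Each element is the value that GetEntry would return for that entry.
struct FakeTree {
    std::vector<int> bytesPerEntry;
    long long GetEntries() const { return static_cast<long long>(bytesPerEntry.size()); }
    int GetEntry(long long i) const { return bytesPerEntry[i]; }
};
```

Inside a selector, the same comparison can be applied to whichever GetEntry call the Process routine already makes, so that a negative return value skips the event instead of being silently ignored.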

Sorry for the late response.

I was on holiday and have just come back to the office to test your suggestion.
Unfortunately, it does not solve my problem. I test all branches in my selection cuts; the returned values are positive, yet I still get badread for some events.

I suspect that the problem I encounter is very close to the bug reported in this message:
http://permalink.gmane.org/gmane.comp.lang.c%2B%2B.root/12440

It seems that I have a corrupted basket which increases the memory usage when it is loaded and finally exhausts the memory, ending in a crash with a bad_alloc message. This seems to be solved in the latest ROOT version (I am using ROOT 5.27). Unfortunately, our collaboration software is compiled with a “home-made” patched version of ROOT 5.27 and I am not able to compile it against the latest version of ROOT.

Is there any way to test the baskets in order to exclude files with such a problem?

In my main program, before adding the file to the TChain, I test the baskets as follows:

[code][…]
TChain* chain = new TChain();
TFile *File;
char PATH[1054];
DIR *dp;
struct dirent *dirp;

//1- Load a list of File
if(Param.InputFile.size() > 1 )
	for(int i=0; i<Param.InputFile.size(); i++)
	{
		//CHECK THE FILE ISN'T CORRUPTED
		File = TFile::Open (Param.InputFile[i]);
		if (File == 0x0)
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n      Bad File %s Access\n\n",Param.InputFile[i]);
			continue;
		}

		if (File->IsZombie())
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n      File %s is zombie\n\n",Param.InputFile[i]);
			File->Close();
			continue;
		}

		//"TreeName" stands for the actual tree name; Get() returns 0 if the tree is missing
		TTree *Tree = (TTree*) File->Get("TreeName");
		if(Tree == 0x0 || Tree->LoadBaskets() <= 0)
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n      Can not load baskets of file %s\n\n",Param.InputFile[i]);
			File->Close();
			continue;
		}

		File->Close();
		if(chain->Add(Param.InputFile[i]) <= 0)
		{
			printf("\033[43;31m!!!! ERROR !!!!\033[0m\n    Can not add file %s to chain.\n",Param.InputFile[i]);
			continue;
		}

		printf("    Add file [%i] \033[22;34m%s\033[0m to chain.\n",i+1,Param.InputFile[i]);
	}

}

chain->TChain::Process(Param.Script,Param.OPTIONS);
return 1;
}[/code]

This increases the computation time, but it does not find any bad read in the file. I also tested each entry of the file without finding any errors.

So I have another question:
Why does this file (and I have other files with the same problem) seem to be corrupted when I am using a TSelector, while it appears fine when I open it as a TFile?

Hi,

[quote]I suspect that the problem I encounter is very close to the bug reported in this message:[/quote]The corresponding fix (root.cern.ch/viewvc?view=rev&revision=37985) is not in v5.27/06.

[quote]Why does this file (and I have other files with the same problem) seem to be corrupted when I am using a TSelector, while it appears fine when I open it as a TFile?[/quote]Well, TSelector does not open the file per se. Either it is opened by the user directly (and Process is called on the resulting object) or it is opened via a TChain object (on which Process is called). I suspect the difference might be in the set of branches that are being read. Otherwise the whole problem must be due to an unrelated memory access error (see the output of valgrind to pinpoint the issue if that is the case).

[quote]Is there any way to test the baskets in order to exclude files with such a problem?[/quote]Not really in v5.27/06.

[quote]Unfortunately, our collaboration software is compiled with a “home-made” patched version of ROOT 5.27 and I am not able to compile it against the latest version of ROOT.[/quote]I suppose you might be able to get your collaboration to add patch 37985 to its list of home-made patches in order to solve this problem.

Cheers,
Philippe.

Hi,

Thanks for this information. I will suggest that our collaboration include this patch.

After further study of this file, I find that the problem might not be due to the tree but might come from a TDirectory which I suspect is opened for each event. Below is the basic structure of the file:

TFile
|
|___ TDirectory datacards
|
|___ TTree Event
|_ List of Object

When I try to open the TDirectory in an interactive root session, root turns out to be very slow on the “corrupted” file, while it is very quick, almost instantaneous, for a “normal” file. I might be able to exclude the “corrupted” files by testing the TDirectory before adding the file to the TChain. Is there any way to test the validity of a TDirectory?

Hi,

[quote]Is there any way to test the validity of a TDirectory?[/quote]Not really without loading it.

[quote]When I try to open the TDirectory in an interactive root session, root turns out to be very slow on the “corrupted” file, while it is very quick, almost instantaneous, for a “normal” file.[/quote]This ‘sounds’ like a coincidence. Nonetheless I would be interested in seeing both a normal file and a corrupted file to see the actual difference.

Cheers,
Philippe.

That is pretty annoying, because it takes root more than 30 min to realize that the TDirectory is corrupted before it reports TSystem::Errno 115 (Operation in progress). I tried to add a TTimer to interrupt the opening of the TDirectory after 1 min, but I did not really manage.

Sharing the files is unfortunately not possible. A single file already contains a lot of scientific data, therefore I am not allowed to share any file.
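
One possible way to bound the time spent on a slow directory load, sketched here as an assumption rather than something proposed in the thread, is to run the check in a forked child process and kill it from the parent once a deadline has passed. `runWithTimeout`, `ProbeResult` and the `probe` callback are hypothetical names; the probe would be whatever code opens the file and loads the directory. The sketch uses only POSIX calls and does not depend on ROOT.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class ProbeResult { Ok, Failed, TimedOut };

// Runs `probe` in a forked child and waits at most `timeout` for it to finish.
// The child exits with status 0 if probe() returns true and 1 otherwise.
ProbeResult runWithTimeout(const std::function<bool()>& probe,
                           std::chrono::milliseconds timeout)
{
    const pid_t pid = fork();
    if (pid < 0) return ProbeResult::Failed;      // fork itself failed
    if (pid == 0) {
        // Child: run the check, then leave without running the parent's exit handlers
        _exit(probe() ? 0 : 1);
    }

    // Parent: poll the child until it ends or the deadline passes
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;
    for (;;) {
        const pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid) break;                   // child has finished
        if (done < 0 && errno != EINTR) return ProbeResult::Failed;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);                   // give up on this file
            waitpid(pid, &status, 0);             // reap the killed child
            return ProbeResult::TimedOut;
        }
        usleep(50 * 1000);                        // check again in 50 ms
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? ProbeResult::Ok
                                                           : ProbeResult::Failed;
}
```

A caller could pass a one-minute deadline and skip the file whenever the result is `TimedOut` or `Failed`. Opening the file inside the probe, rather than before the fork, keeps the parent and the child from sharing one open connection.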

Hi,

For both the broken and the good file, could you send me the result of myfile->ls(); and one stack trace taken during the loading of the directory?

Cheers,
Philippe.