Problem importing binary data file as tree on WinXP

Dear ROOTers

My program can import data files provided either as text files or as two different kinds of binary files.
It can be compiled for Mac OS X, Linux (FC4) and WinXP (VC++). My problem is that on WinXP I am not able to
read the binary files, although everything is OK on both FC4 and OS X.

I know that this is in principle not a ROOT-specific problem, but I have already spent several weeks trying to
find the cause, without any success. Thus I would greatly appreciate it if someone could find some time to help me,
and I thank you in advance.

Since it is not possible to provide a simple macro, I am attaching a stripped-down version of my program
together with all necessary files, and a macro “macroImportData.C” which can be loaded into root.

The first problem is the import of simple binary files (“XDA”), as the following steps will show:

// load macro
.L macroImportData.C 
Init() 

// import/export text files
ImportDataTest3("DataTestAB")
ExportDataset("SchemeTest3.root","DataTestAB_cel.root","DataSet/*.cel","*","DataTestAB_cel.txt")

// import/export binary XDA files
ImportXDATest3("DataXDATestAB")
ExportDataset("SchemeTest3.root","DataXDATestAB_cel.root","DataSet/*.cel","*","DataXDATestAB_cel.txt")

As the exported text files show, on WinXP the exported file “DataXDATestAB_cel.txt” initially contains
the correct data, but then suddenly shows a very small number followed only by zeros, as in the attached
example “DataXDATestAB_cel.txt”.

The relevant method “XGeneChipHyb::ReadXDAData” in the source file XPSData.cxx looks as follows:

typedef struct CELEntryType
{
	float Intensity;
	float Stdv;
	short Pixels;
} CELEntry;

Int_t XGeneChipHyb::ReadXDAData(ifstream &input, ...)
{
   TTree *datatree = new TTree(fDataTreeName, fSchemeName);
   XGCCell *cell = new XGCCell();
   datatree->Branch("DataBranch", "XGCCell", &cell, 64000, split);

   CELEntry *entries = new CELEntry[fNCells];
   for (Int_t i=0; i<fNCells; i++) {
      CELEntry *entry = &entries[i];   // i-th element of the read buffer

      READ_FLOAT(input, entry->Intensity);
      READ_FLOAT(input, entry->Stdv);
      READ_SHORT(input, entry->Pixels);

      inten  = (Double_t)entry->Intensity;
      stdev  = (Double_t)entry->Stdv;
      numpix = entry->Pixels;

      cell->SetIntensity(inten);
      cell->SetStdev(stdev);
      cell->SetNumPixels(numpix);
      datatree->Fill(); 
   }//for_i

   delete [] entries;   // release the temporary read buffer
}

The functions that might be causing the problem are defined in XPSUtils.cxx:

inline UShort_t Swap16Bit(UShort_t x) {
  return ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8));
}//Swap16Bit

inline UInt_t Swap32Bit(UInt_t x) {
   return ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >>  8) |
           (((x) & 0x0000ff00) <<  8) | (((x) & 0x000000ff) << 24));
}//Swap32Bit

void READ_INT(std::ifstream &input, Int_t &value, Bool_t isBE)
{
   READ_UINT(input, (UInt_t&)value, isBE);
}//READ_INT

void READ_UINT(std::ifstream &input, UInt_t &value, Bool_t isBE)
{
   UInt_t val = 0;
   input.read((char*)&val, sizeof(val));

   Bool_t swap = kFALSE;
#ifdef IS_BIG_ENDIAN
   swap = kTRUE;
   if (isBE == kTRUE) swap = kFALSE;
#else
   if (isBE == kTRUE) swap = kTRUE;
#endif
   if (swap) value = Swap32Bit(val);
   else      value = val;
}//READ_UINT

void READ_SHORT(std::ifstream &input, Short_t &value, Bool_t isBE)
{
   READ_USHORT(input, (UShort_t&)value, isBE);
}//READ_SHORT

void READ_USHORT(std::ifstream &input, UShort_t &value, Bool_t isBE)
{
   UShort_t val = 0;
   input.read((char*)&val, sizeof(val));

   Bool_t swap = kFALSE;
#ifdef IS_BIG_ENDIAN
   swap = kTRUE;
   if (isBE == kTRUE) swap = kFALSE;
#else
   if (isBE == kTRUE) swap = kTRUE;
#endif
   if (swap) value = Swap16Bit(val);
   else      value = val;
}//READ_USHORT

void READ_FLOAT(std::ifstream &input, Float_t &value, Bool_t isBE)
{
   READ_UINT(input, (UInt_t&)value, isBE);
}//READ_FLOAT

The problem is that I cannot find any error in these functions. Moreover, I adapted them
from another program written specifically to read these files.
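For comparison with the READ_* functions above, here is a minimal sketch (not the author's code) of the same kind of reader in stand-alone C++. It uses std::uint32_t and float in place of ROOT's UInt_t and Float_t so that it compiles without ROOT; the names ReadU32 and ReadF32 are hypothetical. It differs in two ways: it reports whether input.read() actually delivered 4 bytes, and it copies the bits into the float with memcpy instead of going through a (UInt_t&) reference cast, which is formally undefined behaviour in C++. The stream check is the relevant part here: READ_UINT starts from val = 0 and never tests the stream, so a failed read silently yields 0, which is consistent with the trailing zeros in the exported file.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read a 32-bit value stored in the given byte order and
// report whether 4 bytes were really read (false on EOF or a short read).
inline bool ReadU32(std::istream &in, std::uint32_t &value, bool fileIsBigEndian)
{
   unsigned char b[4];
   if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
   // Assemble the value byte by byte, so the host byte order does not matter.
   if (fileIsBigEndian)
      value = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
              (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
   else
      value = (std::uint32_t(b[3]) << 24) | (std::uint32_t(b[2]) << 16) |
              (std::uint32_t(b[1]) << 8)  |  std::uint32_t(b[0]);
   return true;
}

// Reinterpret the 32 bits as an IEEE-754 float via memcpy (no aliasing cast).
inline bool ReadF32(std::istream &in, float &value, bool fileIsBigEndian)
{
   static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
   std::uint32_t bits;
   if (!ReadU32(in, bits, fileIsBigEndian)) return false;
   std::memcpy(&value, &bits, sizeof(value));
   return true;
}
```

Usage in a loop like the one in ReadXDAData would be along the lines of `if (!ReadF32(input, entry->Intensity, isBE)) { /* report a truncated or unreadable file */ }`, so that a broken stream is noticed at the first failing value.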

The second problem is even worse: these binary files (“Calvin”) contain a mixture of ASCII and Unicode characters.
Running the macro on WinXP results in the following error:

root [0] .L macroImportData.C
root [1] Init()
root [2] ImportCalvinTest3("tmp_DataCalvinTest3")
Warning: No directory given to store root file:
         Using working directory <C:\home\Rabbitus\rootcode\xps4xda>
Creating new temporary file <C:\home\Rabbitus\rootcode\xps4xda/tmp_DataCalvinTest3_cel.root>...
Opening file <SchemeTest3.root> in <READ> mode...
Importing <C:\home\Rabbitus\rootcode\xps4xda/Test3-1-calvin.CEL> as <TestA1.cel>...
WinXP: Problem in------XGeneChipHyb::ReadGenericDataHeader------
data type identifier = <affymetrix-calvin-intensity>
unique file identifier = <0000065535-1131390299-0000006334-0000018467-0000000041>
datetime = <0AF46630>
locale = <00000000>
numtriplets = 17
i= 0  len= 37  aname = <affymetrix-algorithm-param-Percentile>
i= 0  len= 2  avalue = <75>
i= 0  len= 10  atype = <text/ascii>
i= 1  len= 37  aname = <affymetrix-algorithm-param-CellMargin>
i= 1  len= 1  avalue = <2>
i= 1  len= 10  atype = <text/ascii>
i= 2  len= 38  aname = <affymetrix-algorithm-param-OutlierHigh>
i= 2  len= 5  avalue = <1.500>
i= 2  len= 10  atype = <text/ascii>
i= 3  len= 37  aname = <affymetrix-algorithm-param-OutlierLow>
i= 3  len= 5  avalue = <1.004>
i= 3  len= 10  atype = <text/ascii>
i= 4  len= 34  aname = <affymetrix-algorithm-param-GridULX>
i= 4  len= 16  avalue = <C?∂
x?∂
>
Error: C++ exception caught C:\home\Rabbitus\rootcode\xps4xda\macroImportData.C(122)
(Int_t)(0)
*** Interpreter error recovered ***

The correct output on Linux and Mac is:

root [0] .L macroImportData.C
root [1] Init()
root [2] ImportCalvinTest3("DataCalvinTest3")
Warning: No directory given to store root file:
         Using working directory </Volumes/CoreData/ROOT/rootcode/xps-x.x.x/xps4xda>
Creating new file </Volumes/CoreData/ROOT/rootcode/xps-x.x.x/xps4xda/DataCalvinTest3_cel.root>...
Opening file <SchemeTest3.root> in <READ> mode...
Importing </Volumes/CoreData/ROOT/rootcode/xps-x.x.x/xps4xda/Test3-1-calvin.CEL> as <TestA1.cel>...
WinXP: Problem in------XGeneChipHyb::ReadGenericDataHeader------
data type identifier = <affymetrix-calvin-intensity>
unique file identifier = <0000065535-1131390299-0000006334-0000018467-0000000041>
datetime = <0x4e05ec0>
locale = <0>
numtriplets = 17
i= 0  len= 37  aname = <affymetrix-algorithm-param-Percentile>
i= 0  len= 2  avalue = <75>
i= 0  len= 10  atype = <text/ascii>
i= 1  len= 37  aname = <affymetrix-algorithm-param-CellMargin>
i= 1  len= 1  avalue = <2>
i= 1  len= 10  atype = <text/ascii>
i= 2  len= 38  aname = <affymetrix-algorithm-param-OutlierHigh>
i= 2  len= 5  avalue = <1.500>
i= 2  len= 10  atype = <text/ascii>
i= 3  len= 37  aname = <affymetrix-algorithm-param-OutlierLow>
i= 3  len= 5  avalue = <1.004>
i= 3  len= 10  atype = <text/ascii>
i= 4  len= 34  aname = <affymetrix-algorithm-param-GridULX>
i= 4  len= 16  avalue = <C>
i= 4  len= 19  atype = <text/x-calvin-float>
i= 5  len= 34  aname = <affymetrix-algorithm-param-GridULY>
i= 5  len= 16  avalue = <C$>
i= 5  len= 19  atype = <text/x-calvin-float>
i= 6  len= 34  aname = <affymetrix-algorithm-param-GridURX>
i= 6  len= 16  avalue = <Dx¿>
i= 6  len= 19  atype = <text/x-calvin-float>
i= 7  len= 34  aname = <affymetrix-algorithm-param-GridURY>
i= 7  len= 16  avalue = <C >
i= 7  len= 19  atype = <text/x-calvin-float>
i= 8  len= 34  aname = <affymetrix-algorithm-param-GridLRX>
i= 8  len= 16  avalue = <Dy¿>
i= 8  len= 19  atype = <text/x-calvin-float>
i= 9  len= 34  aname = <affymetrix-algorithm-param-GridLRY>
i= 9  len= 16  avalue = <Dz¿>
i= 9  len= 19  atype = <text/x-calvin-float>
i= 10  len= 34  aname = <affymetrix-algorithm-param-GridLLX>
i= 10  len= 16  avalue = <C>
i= 10  len= 19  atype = <text/x-calvin-float>
i= 11  len= 34  aname = <affymetrix-algorithm-param-GridLLY>
i= 11  len= 16  avalue = <D{>
i= 11  len= 19  atype = <text/x-calvin-float>
i= 12  len= 25  aname = <affymetrix-algorithm-name>
i= 12  len= 20  avalue = <>
i= 12  len= 10  atype = <text/plain>
i= 13  len= 21  aname = <affymetrix-array-type>
i= 13  len= 200  avalue = <>
scheme = <Test3>
i= 13  len= 10  atype = <text/plain>
i= 14  len= 19  aname = <affymetrix-cel-cols>
i= 14  len= 16  avalue = <>
fNCols = 126
i= 14  len= 24  atype = <text/x-calvin-integer-32>
i= 15  len= 19  aname = <affymetrix-cel-rows>
i= 15  len= 16  avalue = <>
fNRows = 126
i= 15  len= 24  atype = <text/x-calvin-integer-32>
i= 16  len= 23  aname = <affymetrix-file-version>
i= 16  len= 16  avalue = <>
i= 16  len= 32  atype = <text/x-calvin-unsigned-integer-8>
numparents = 1

For some reason, on WinXP the code always crashes at the same place.
The relevant code in XPSData.cxx is:

typedef struct
{
   Int_t len;
   char *value;
} ASTRING;

typedef struct
{
   Int_t    len;
   wchar_t *value;
} AWSTRING;

Int_t XGeneChipHyb::ReadGenericDataHeader(ifstream &input, Bool_t isParent)
{
   AWSTRING *aname  = 0;
   ASTRING  *avalue = 0;
   AWSTRING *atype  = 0;
   for (Int_t i=0; i<numtriplets; i++) {
      aname  = new AWSTRING;
      avalue = new ASTRING;
      atype  = new AWSTRING;

      READ_WSTRING(input, aname, kTRUE);
      READ_STRING(input, avalue, kTRUE);
      READ_WSTRING(input, atype, kTRUE);

      delete atype;
      delete avalue;
      delete aname;
   }//for_i
}

As before, the functions reading the binary data are defined in XPSUtils.cxx.
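The bodies of READ_STRING and READ_WSTRING (in XPSUtils.cxx) are not shown in the post. As a point of comparison, here is a hedged sketch of portable readers for length-prefixed strings. It assumes, based on the kTRUE (big-endian) arguments in the calls above and on the printed len values matching the character counts, that each string starts with a big-endian 32-bit character count; it further assumes that wide strings store each character as a 2-byte big-endian UTF-16 code unit. It uses char16_t (C++11) rather than wchar_t, because wchar_t is 2 bytes with VC++ but 4 bytes with GCC on Linux and Mac OS X, so any code that reads len*sizeof(wchar_t) bytes consumes different amounts of data on each platform. ReadBE32, ReadCalvinString and ReadCalvinWString are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Read a big-endian signed 32-bit integer (assumed length prefix).
static bool ReadBE32(std::istream &in, std::int32_t &n)
{
   unsigned char b[4];
   if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
   n = static_cast<std::int32_t>((std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
                                 (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]));
   return true;
}

// Narrow string: length prefix followed by that many 8-bit characters.
bool ReadCalvinString(std::istream &in, std::string &s)
{
   std::int32_t len;
   if (!ReadBE32(in, len) || len < 0) return false;
   s.assign(static_cast<std::size_t>(len), '\0');
   return len == 0 || static_cast<bool>(in.read(&s[0], len));
}

// Wide string: length prefix followed by `len` UTF-16 code units, 2 bytes each, big-endian.
bool ReadCalvinWString(std::istream &in, std::u16string &s)
{
   std::int32_t len;
   if (!ReadBE32(in, len) || len < 0) return false;
   s.clear();
   for (std::int32_t i = 0; i < len; ++i) {
      unsigned char b[2];
      if (!in.read(reinterpret_cast<char*>(b), 2)) return false;
      s.push_back(static_cast<char16_t>((b[0] << 8) | b[1]));
   }
   return true;
}
```

Because each function returns false on a short read, a desynchronised stream stops the header loop at the first bad triplet instead of producing garbage such as the avalue printed for GridULX.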

As I said, I have been trying for several weeks to find the reason for these problems on WinXP, but
I could not find anything. Any help would therefore be appreciated; thank you.

P.S.: Could this be a memory problem or a memory leak?
I am testing this on a dual-boot laptop with only 512 MB of RAM. When I boot the laptop into FC4,
everything works; when I boot it into WinXP, the problem described above occurs.

Best regards
Christian
DataXDATestAB_cel.txt (300 KB)
xps4xda.tar.gz (869 KB)

Hi Christian,

On Windows, you have to specify ‘ios::binary’ when you want to open a file in binary mode (the default is text mode).
For example, in xpsbase.cxx, adding ios::binary to the open mode of the input file stream solves the problem.
See:

root [0] .L macroImportData.C
root [1] Init()
root [2] ImportCalvinTest3("tmp_DataCalvinTest3")
Warning: No directory given to store root file:
         Using working directory <C:\home\bellenot\rootdev\cstrato\xps4xda>
Creating new temporary file <C:\home\bellenot\rootdev\cstrato\xps4xda/tmp_DataCalvinTest3_cel.root>...
Opening file <SchemeTest3.root> in <READ> mode...
Importing <C:\home\bellenot\rootdev\cstrato\xps4xda/Test3-1-calvin.CEL> as <TestA1.cel>...
WinXP: Problem in------XGeneChipHyb::ReadGenericDataHeader------
data type identifier = <affymetrix-calvin-intensity>
unique file identifier = <0000065535-1131390299-0000006334-0000018467-0000000041>
datetime = <0B505420>
locale = <00000000>
numtriplets = 17
i= 0  len= 37  aname = <affymetrix-algorithm-param-Percentile>
i= 0  len= 2  avalue = <75>
i= 0  len= 10  atype = <text/ascii>
i= 1  len= 37  aname = <affymetrix-algorithm-param-CellMargin>
i= 1  len= 1  avalue = <2>
i= 1  len= 10  atype = <text/ascii>
i= 2  len= 38  aname = <affymetrix-algorithm-param-OutlierHigh>
i= 2  len= 5  avalue = <1.500>
i= 2  len= 10  atype = <text/ascii>
i= 3  len= 37  aname = <affymetrix-algorithm-param-OutlierLow>
i= 3  len= 5  avalue = <1.004>
i= 3  len= 10  atype = <text/ascii>
i= 4  len= 34  aname = <affymetrix-algorithm-param-GridULX>
i= 4  len= 16  avalue = <C?>
i= 4  len= 19  atype = <text/x-calvin-float>
i= 5  len= 34  aname = <affymetrix-algorithm-param-GridULY>
i= 5  len= 16  avalue = <C$>
i= 5  len= 19  atype = <text/x-calvin-float>
i= 6  len= 34  aname = <affymetrix-algorithm-param-GridURX>
i= 6  len= 16  avalue = <Dx+>
i= 6  len= 19  atype = <text/x-calvin-float>
i= 7  len= 34  aname = <affymetrix-algorithm-param-GridURY>
i= 7  len= 16  avalue = <C >
i= 7  len= 19  atype = <text/x-calvin-float>
i= 8  len= 34  aname = <affymetrix-algorithm-param-GridLRX>
i= 8  len= 16  avalue = <Dy+>
i= 8  len= 19  atype = <text/x-calvin-float>
i= 9  len= 34  aname = <affymetrix-algorithm-param-GridLRY>
i= 9  len= 16  avalue = <Dz+>
i= 9  len= 19  atype = <text/x-calvin-float>
i= 10  len= 34  aname = <affymetrix-algorithm-param-GridLLX>
i= 10  len= 16  avalue = <C?>
i= 10  len= 19  atype = <text/x-calvin-float>
i= 11  len= 34  aname = <affymetrix-algorithm-param-GridLLY>
i= 11  len= 16  avalue = <D{Ç>
i= 11  len= 19  atype = <text/x-calvin-float>
i= 12  len= 25  aname = <affymetrix-algorithm-name>
i= 12  len= 20  avalue = <>
i= 12  len= 10  atype = <text/plain>
i= 13  len= 21  aname = <affymetrix-array-type>
i= 13  len= 200  avalue = <>
scheme = <Test3>
i= 13  len= 10  atype = <text/plain>
i= 14  len= 19  aname = <affymetrix-cel-cols>
i= 14  len= 16  avalue = <>
fNCols = 126
i= 14  len= 24  atype = <text/x-calvin-integer-32>
i= 15  len= 19  aname = <affymetrix-cel-rows>
i= 15  len= 16  avalue = <>
fNRows = 126
i= 15  len= 24  atype = <text/x-calvin-integer-32>
i= 16  len= 23  aname = <affymetrix-file-version>
i= 16  len= 16  avalue = <>
i= 16  len= 32  atype = <text/x-calvin-unsigned-integer-8>
numparents = 1
WinXP: Problem in------XGeneChipHyb::ReadGenericDataHeader------
data type identifier = <affymetrix-calvin-scan-acquisition>
unique file identifier = <>
datetime = <0B5055B0>
locale = <00000000>
numtriplets = 6
i= 0  len= 21  aname = <affymetrix-dat-header>
i= 0  len= 266  avalue = <>
chipname = Test3
i= 0  len= 10  atype = <text/plain>
i= 1  len= 21  aname = <affymetrix-array-type>
i= 1  len= 10  avalue = <>
scheme = <Test3>
i= 1  len= 10  atype = <text/plain>
i= 2  len= 21  aname = <affymetrix-pixel-rows>
i= 2  len= 16  avalue = <>
i= 2  len= 24  atype = <text/x-calvin-integer-32>
i= 3  len= 21  aname = <affymetrix-pixel-cols>
i= 3  len= 16  avalue = <>
i= 3  len= 24  atype = <text/x-calvin-integer-32>
i= 4  len= 20  aname = <affymetrix-scan-date>
i= 4  len= 38  avalue = <>
i= 4  len= 10  atype = <text/plain>
i= 5  len= 28  aname = <affymetrix-image-orientation>
i= 5  len= 16  avalue = <>
i= 5  len= 32  atype = <text/x-calvin-unsigned-integer-8>
numparents = 0
WinXP: Problem in------XGeneChipHyb::ReadDataGroup------
   hybridization statistics:
      1 cells with minimal intensity 73.5
      1 cells with maximal intensity 24221
New dataset <DataSet> is added to Content...
Importing <C:\home\bellenot\rootdev\cstrato\xps4xda/Test3-2-calvin.CEL> as <TestA2.cel>...
WinXP: Problem in------XGeneChipHyb::ReadGenericDataHeader------
data type identifier = <affymetrix-calvin-intensity>
unique file identifier = <0000065535-1131390299-0000006334-0000018467-0000000041>
datetime = <0B2921A8>
locale = <00000000>
numtriplets = 17
i= 0  len= 37  aname = <affymetrix-algorithm-param-Percentile>
i= 0  len= 2  avalue = <75>
i= 0  len= 10  atype = <text/ascii>
i= 1  len= 37  aname = <affymetrix-algorithm-param-CellMargin>
i= 1  len= 1  avalue = <2>
i= 1  len= 10  atype = <text/ascii>
i= 2  len= 38  aname = <affymetrix-algorithm-param-OutlierHigh>
i= 2  len= 5  avalue = <1.500>
i= 2  len= 10  atype = <text/ascii>
i= 3  len= 37  aname = <affymetrix-algorithm-param-OutlierLow>
i= 3  len= 5  avalue = <1.004>
i= 3  len= 10  atype = <text/ascii>
i= 4  len= 34  aname = <affymetrix-algorithm-param-GridULX>
i= 4  len= 16  avalue = <C?>
i= 4  len= 19  atype = <text/x-calvin-float>
i= 5  len= 34  aname = <affymetrix-algorithm-param-GridULY>
i= 5  len= 16  avalue = <C$>
i= 5  len= 19  atype = <text/x-calvin-float>
i= 6  len= 34  aname = <affymetrix-algorithm-param-GridURX>
i= 6  len= 16  avalue = <Dx+>
i= 6  len= 19  atype = <text/x-calvin-float>
i= 7  len= 34  aname = <affymetrix-algorithm-param-GridURY>
i= 7  len= 16  avalue = <C%>
i= 7  len= 19  atype = <text/x-calvin-float>
i= 8  len= 34  aname = <affymetrix-algorithm-param-GridLRX>
i= 8  len= 16  avalue = <DxÇ>
i= 8  len= 19  atype = <text/x-calvin-float>
i= 9  len= 34  aname = <affymetrix-algorithm-param-GridLRY>
i= 9  len= 16  avalue = <D|>
i= 9  len= 19  atype = <text/x-calvin-float>
i= 10  len= 34  aname = <affymetrix-algorithm-param-GridLLX>
i= 10  len= 16  avalue = <C?>
i= 10  len= 19  atype = <text/x-calvin-float>
i= 11  len= 34  aname = <affymetrix-algorithm-param-GridLLY>
i= 11  len= 16  avalue = <D{Ç>
i= 11  len= 19  atype = <text/x-calvin-float>
i= 12  len= 25  aname = <affymetrix-algorithm-name>
i= 12  len= 20  avalue = <>
i= 12  len= 10  atype = <text/plain>
i= 13  len= 21  aname = <affymetrix-array-type>
i= 13  len= 200  avalue = <>
scheme = <Test3>
i= 13  len= 10  atype = <text/plain>
i= 14  len= 19  aname = <affymetrix-cel-cols>
i= 14  len= 16  avalue = <>
fNCols = 126
i= 14  len= 24  atype = <text/x-calvin-integer-32>
i= 15  len= 19  aname = <affymetrix-cel-rows>
i= 15  len= 16  avalue = <>
fNRows = 126
i= 15  len= 24  atype = <text/x-calvin-integer-32>
i= 16  len= 23  aname = <affymetrix-file-version>
i= 16  len= 16  avalue = <>
i= 16  len= 32  atype = <text/x-calvin-unsigned-integer-8>
numparents = 1
WinXP: Problem in------XGeneChipHyb::ReadGenericDataHeader------
data type identifier = <affymetrix-calvin-scan-acquisition>
unique file identifier = <>
datetime = <0B292378>
locale = <00000000>
numtriplets = 6
i= 0  len= 21  aname = <affymetrix-dat-header>
i= 0  len= 264  avalue = <>
chipname = Test3
i= 0  len= 10  atype = <text/plain>
i= 1  len= 21  aname = <affymetrix-array-type>
i= 1  len= 10  avalue = <>
scheme = <Test3>
i= 1  len= 10  atype = <text/plain>
i= 2  len= 21  aname = <affymetrix-pixel-rows>
i= 2  len= 16  avalue = <>
i= 2  len= 24  atype = <text/x-calvin-integer-32>
i= 3  len= 21  aname = <affymetrix-pixel-cols>
i= 3  len= 16  avalue = <>
i= 3  len= 24  atype = <text/x-calvin-integer-32>
i= 4  len= 20  aname = <affymetrix-scan-date>
i= 4  len= 38  avalue = <>
i= 4  len= 10  atype = <text/plain>
i= 5  len= 28  aname = <affymetrix-image-orientation>
i= 5  len= 16  avalue = <>
i= 5  len= 32  atype = <text/x-calvin-unsigned-integer-8>
numparents = 0
WinXP: Problem in------XGeneChipHyb::ReadDataGroup------
   hybridization statistics:
      1 cells with minimal intensity 57.3
      1 cells with maximal intensity 30728.8
root [3]
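A minimal sketch of the fix, assuming nothing about the actual code in xpsbase.cxx (ReadAllBytes is a hypothetical helper): open the file with ios::in | ios::binary and read it byte for byte. In text mode the Microsoft C runtime converts each CR LF pair (0x0D 0x0A) into a single LF and treats a Ctrl-Z byte (0x1A) as end-of-file on input, and both byte values can occur anywhere in binary data. This plausibly explains the XDA symptom: after the first dropped CR every later value is misaligned, and after a 0x1A byte every read fails and leaves 0. On Linux and Mac OS X text and binary mode behave identically, which is why the problem only showed up on WinXP.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file byte for byte. ios::binary disables the CR LF -> LF and
// Ctrl-Z (0x1A) end-of-file translation that Windows applies in text mode.
std::vector<char> ReadAllBytes(const std::string &name)
{
   std::ifstream input(name.c_str(), std::ios::in | std::ios::binary);
   if (!input) return std::vector<char>();   // file could not be opened
   return std::vector<char>(std::istreambuf_iterator<char>(input),
                            std::istreambuf_iterator<char>());
}
```

Bytes such as 0x0D, 0x0A and 0x1A come back unchanged and the byte count matches the file size on every platform.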

Now it's up to you to see exactly how and where to change the code ;-)

Cheers,
Bertrand.

Dear Bertrand

Thank you very much for this information; I have the feeling you know everything :-)

I have one more question:
It seems that your solution solves all my problems, and it does not seem to affect “text” files:
I can still read text-only files when I specify “ios::binary” while opening them.

Do you know if I can safely use “ios::binary” when opening any file?
This would be the simplest solution!
(I read somewhere that a text file is only a special case of a binary file; is this correct?)

Otherwise, if this is not possible, would the following be OK?
First, I open the file as: input(name, ios::in | ios::binary)
Then, if it is not a binary file, I close it and reopen it as: input.open(name, ios::in)

Best regards
Christian

Hi Christian,

Thanks for the nice comment :-)
It should be OK to always open files with the ios::binary flag, depending on the way you read the file. For more information, see e.g.:
http://answers.yahoo.com/question/index?qid=20080329084115AAEHAs9&show=7
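One concrete case where "depending on the way you read the file" matters: in binary mode the runtime no longer converts CR LF to LF, so std::getline on a text file with Windows line endings leaves a trailing '\r' on every line. A minimal sketch of a hypothetical wrapper, GetLineNoCR, that strips it, so the same parsing code works whether the file was opened in text or binary mode:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Like std::getline, but also drops a trailing '\r' (left over from CR LF
// line endings when the stream was opened with ios::binary).
std::istream &GetLineNoCR(std::istream &in, std::string &line)
{
   if (std::getline(in, line) && !line.empty() && line[line.size() - 1] == '\r')
      line.erase(line.size() - 1);
   return in;
}
```

Tab-separated text files like the exported *_cel.txt files could then be read line by line with the same loop on all three platforms.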

Cheers,
Bertrand.

Dear Bertrand

Thank you, this is really good news.

Interestingly, the link you mention is exactly the one I was referring to :-)

I am a little confused by another link from MS,
msdn2.microsoft.com/en-us/library/4yy87z4f.aspx,
where the example produces the wrong output. However, that example concerns an output file.

Best regards
Christian

Dear Christian,

Yes, you have to be careful when writing text or binary files; again, it depends on the way you do it…
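The corresponding pitfall on the output side, as a minimal sketch (WriteAndMeasure is a hypothetical helper): with ios::binary each '\n' is written as the single byte 0x0A, whereas a text-mode stream on Windows writes CR LF (0x0D 0x0A). A text file written in binary mode therefore has Unix line endings on every platform, and older Windows tools such as Notepad may display it as a single line.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write `text` to `name` (in binary mode if requested) and return the
// resulting file size in bytes, or -1 if the file cannot be reopened.
std::streamoff WriteAndMeasure(const std::string &name, const std::string &text, bool binary)
{
   std::ios::openmode mode = std::ios::out | std::ios::trunc;
   if (binary) mode |= std::ios::binary;
   {
      std::ofstream out(name.c_str(), mode);
      out << text;
   }  // the stream is flushed and closed here
   // Reopen positioned at the end (ios::ate): tellg() then gives the size.
   std::ifstream in(name.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
   return in ? static_cast<std::streamoff>(in.tellg()) : -1;
}
```

With binary = true, "a\nb\n" gives 4 bytes everywhere; with binary = false the result would be 6 bytes on Windows but still 4 on Linux and Mac OS X.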

Best, Bertrand.

Dear Bertrand

Yes, I know, I will test it carefully.

Best regards
Christian