Hi All,
I am writing some code to read an arbitrarily delimited data file into ROOT, and I ran into some issues. Maybe someone here can easily identify what the problem is?
First, the goals of this code:
(1) It will read an arbitrarily delimited file into ROOT (including missing values in the file, e.g., two consecutive delimiters with nothing or whitespace between them).
(2) By reading the first few lines of the file, it will guess the type (numeric or character) of each column of data.
(3) Based on the information above, it will book an appropriate tree and fill it (missing numeric values will be coded to a user-defined value, e.g., -99).
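To make goals (1) to (3) concrete, here is a minimal standalone sketch in plain C++. It is not the code in ReadData.C, and it does not touch ROOT at all; the names splitKeepEmpty, trim, isNumeric, guessColumnTypes and toNumberOrMissing are hypothetical helpers made up for illustration. It assumes a single-character delimiter and treats a column as numeric only if every non-empty field in the sampled lines parses fully as a number.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Split a line on a single-character delimiter, keeping empty fields,
// so "1,abc,,4.5" yields four fields with the third one empty.
std::vector<std::string> splitKeepEmpty(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));  // last field (may be empty)
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}

// Remove leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// True if the whole trimmed field parses as a floating-point number.
// Note: strtod also accepts forms such as "inf", "nan" and hex floats.
bool isNumeric(const std::string& field) {
    const std::string t = trim(field);
    if (t.empty()) return false;
    char* end = nullptr;
    std::strtod(t.c_str(), &end);
    return end == t.c_str() + t.size();
}

enum class ColumnType { Numeric, Character };

// Guess each column's type from a few sampled rows: a column becomes
// Character as soon as one non-empty field fails to parse as a number.
std::vector<ColumnType> guessColumnTypes(
    const std::vector<std::vector<std::string>>& rows, std::size_t nCols) {
    std::vector<ColumnType> types(nCols, ColumnType::Numeric);
    for (const auto& row : rows) {
        for (std::size_t i = 0; i < nCols && i < row.size(); ++i) {
            const std::string t = trim(row[i]);
            if (!t.empty() && !isNumeric(t)) types[i] = ColumnType::Character;
        }
    }
    return types;
}

// Convert a field of a numeric column, coding empty or unparsable
// fields to the user-defined missing value (e.g. -99).
double toNumberOrMissing(const std::string& field, double missing) {
    return isNumeric(field) ? std::strtod(trim(field).c_str(), nullptr) : missing;
}
```

With these helpers, a field that is empty or contains only whitespace is kept as a field of its own and is mapped to the missing value only at conversion time, so the column positions never shift.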
By the way, interfaces for reading arbitrarily formatted data exist in most statistical packages. TTree::ReadFile() can only read very strictly formatted files, and a lot of raw data does not come in such a nice format. So I think ROOT's inability to easily read arbitrarily formatted data can deter some people from using ROOT and the power it provides. This is a humble attempt to fill this gap.
Now, let me describe the problem.
The attached file “ReadData.C” is the code I wrote. It reads the attached file “SampleData.txt”, and the resulting ROOT file is the attached “SampleData.root”.
I see two problems at the moment that I cannot figure out:
(1) Var2 is correctly identified as Character, but its values don’t show up in the filled tree, and I am not sure why. The numeric values seem to appear in the right format in the resulting tree.
(2) If you look carefully at line 2 (the first line of data, after the header) of SampleData.txt, you will see that the value of Var3 is missing: there are two commas next to each other with nothing in between. The way I wrote the code, I expected this to produce the missing-value code (-99). However, no such value appears for Var3 in the resulting tree. So I suspect something is wrong with the regular expression I am using to parse the data, or with something else.
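One possible cause of problem (2), not verified against ReadData.C: a pattern that matches the tokens themselves (runs of non-delimiter characters) silently skips an empty field, whereas a pattern that matches the delimiter and returns the text between matches keeps it. The sketch below shows the difference using std::regex from the C++ standard library rather than ROOT's regular expression classes; the function names and the sample line are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Token matching: collect every run of one or more non-comma characters.
// Two adjacent commas contain no such run, so the empty field vanishes.
std::vector<std::string> matchTokens(const std::string& line) {
    static const std::regex token("[^,]+");
    std::vector<std::string> out;
    for (std::sregex_iterator it(line.begin(), line.end(), token), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Delimiter splitting: match the comma itself and ask for the text
// between matches (submatch index -1). An empty field between two
// adjacent commas is returned as an empty string.
// A trailing empty field may still need separate handling.
std::vector<std::string> splitOnDelimiter(const std::string& line) {
    static const std::regex delim(",");
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(line.begin(), line.end(), delim, -1), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For a hypothetical line such as "1,abc,,4.5", matchTokens returns three fields and every later value shifts one column to the left, while splitOnDelimiter returns four fields with an empty third one that can then be coded as -99.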
Any ideas? If this works out, we can clean this up, add some more functionality to it, and hopefully incorporate it into ROOT someday.
Many thanks in advance.
-Arun
SampleData.root (5.65 KB)
SampleData.txt (490 Bytes)
ReadData.C (5.93 KB)