Warnings using TTree::ReadFile

jfcaron · January 28, 2014, 10:59pm

Hi, I am using TTree::ReadFile as follows:

    TTree t("DOCA_Correction","DOCA_Correction");
    t.ReadFile("DOCA_Correction_00110_1390948804.csv");

The .csv file in question is attached. It has the “magic line” defining the various branches that are to be filled. It also has two #commented lines for metadata.

It seems to read the file just fine, but when executing the ReadFile, I get a warning for each entry in the file:

" while reading data for branch mean_error on line 9

I don’t have any double- or single-quote characters in the file, so I am wondering what is causing this warning. Is my .csv file malformed somehow?

Jean-François
DOCA_Correction_00110_1390948804.csv (2.18 KB)

tpochep · January 28, 2014, 11:12pm

    // Error handling
    if (sToken.bad()) {
       // How could that happen for a stringstream?
       Warning("ReadStream",
               "Buffer error while reading data for branch %s on line %lld",
               branch->GetName(), nlines);
    } else if (!sToken.eof()) {
       if (sToken.fail()) {
          Warning("ReadStream",
                  "Couldn't read formatted data in \"%s\" for branch %s on line %lld; ignoring line",
                  tok.Data(), branch->GetName(), nlines);
          goodLine = kFALSE;
       } else {
          std::string remainder;
          std::getline(sToken, remainder, newline);
          if (!remainder.empty()) {
             Warning("ReadStream",
                     "Ignoring trailing \"%s\" while reading data for branch %s on line %lld",
                     remainder.c_str(), branch->GetName(), nlines);
          }
       }
    }
    } // tokenizer loop

Do you get the "Ignoring trailing" or the "Buffer error while" message? Or do you get exactly the chunk of message you showed? In that case, it is most probably the "Ignoring trailing" warning with something like '\r' in 'remainder'. Maybe it would help to save your csv with Unix-style newlines?
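A stray carriage return would explain why only the tail of the message is visible: at '\r' the terminal jumps back to column 0, and the rest of the line overwrites the beginning. The sketch below simulates that in Python. The `Warning in <TTree::ReadStream>: ` prefix is an assumption about how ROOT prints warnings, and `render_on_terminal` is a hypothetical helper, not part of ROOT; only the format string is taken from the ReadStream code quoted above.

```python
def render_on_terminal(text):
    """Simulate how a terminal displays one line that contains '\\r' characters."""
    cells = []   # characters currently visible on the line
    col = 0      # cursor column
    for ch in text:
        if ch == "\r":
            col = 0              # carriage return: back to the start of the line
        elif col < len(cells):
            cells[col] = ch      # overwrite what was already printed
            col += 1
        else:
            cells.append(ch)     # extend the line
            col += 1
    return "".join(cells)


# Assumed prefix; the format string is the one quoted from ReadStream.
prefix = "Warning in <TTree::ReadStream>: "
fmt = 'Ignoring trailing "%s" while reading data for branch %s on line %d'
message = prefix + fmt % ("\r", "mean_error", 9)

print(repr(message))                 # the '\r' is visible in the raw string
print(render_on_terminal(message))   # what the terminal would show
```

With this assumed prefix, the rendered line is the same fragment as the one reported in the question. A longer prefix would leave leftover characters at the end of the line, which may be where the extra `ng "` under PyROOT comes from.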

jfcaron · January 28, 2014, 11:32pm

I just have exactly that chunk of the warning message, but 25 times (for each line from 2 to 26). No other warnings.

The .csv file was written using python’s csv module, which I just learned might give bad EOL characters on Windows machines (though I am on a modern Mac). They suggest opening the file in binary mode instead of text mode: stackoverflow.com/questions/1170 … terminator

I have tried this and the warnings still appear.

Oddly enough, if I try using the code in PyROOT using ROOT.gROOT.ProcessLine, the warning message lines look like this instead:

    " while reading data for branch mean_error on line 2ng "
    " while reading data for branch mean_error on line 3ng "
    " while reading data for branch mean_error on line 4ng "
    " while reading data for branch mean_error on line 5ng "
    ...

I don’t know where the “ng” and the extra closing quotation are coming from in PyROOT.

Jean-François

tpochep · January 28, 2014, 11:59pm

I think ReadFile does not work correctly with ‘\r\n’ (*), even if you fix ‘\r\r\n’ in your python script and even if you fix the first three lines (which in your case have different line endings).

First: do not mix line endings, since ReadFile tries to guess which line endings you are using.
Second: you'd better just preprocess your csv to get rid of "\r\n", or ask the python csv module to write '\n'.
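As a minimal sketch of that preprocessing step, the following Python rewrites a file so that every line ending becomes '\n'. The helper names `normalize_newlines` and `normalize_file` are hypothetical, and the choice to collapse "\r\r\n" into a single newline is an assumption that fits the file discussed here; it may not be right for every input.

```python
import re

# One or more '\r' followed by '\n' (covers "\r\n" and "\r\r\n"), or a lone '\r'.
_EOL = re.compile(rb"\r+\n|\r")


def normalize_newlines(data):
    """Return a copy of the bytes in `data` with every line ending turned into '\\n'."""
    return _EOL.sub(b"\n", data)


def normalize_file(src_path, dst_path):
    """Copy src_path to dst_path using Unix line endings only."""
    with open(src_path, "rb") as src:   # binary mode: no newline translation
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_newlines(data))
```

The normalized copy can then be passed to ReadFile in place of the original file.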

(*):

    char *cursor = bd;
    while( isspace(*cursor) && *cursor != '\n' && *cursor != '\0') {
       ++cursor;
    }
    if (*cursor != '#' && *cursor != '\n' && *cursor != '\0') {
       break;
    }

This code is wrong. It should be something like:

    while( isspace(*cursor) && *cursor != '\0') {
       ++cursor;
    }
    if (*cursor != '#' && *cursor != '\0') {
       break;
    }

tpochep · January 29, 2014, 4:39pm

Ok, at the moment I’m working on another issue you’ve reported in JIRA (related to TNtuple::ReadFile) and later I’ll revise TTree::ReadFile - we have some other reports/feature requests + I’ll fix the newline problems - so you’ll be able to use whatever you want without fixing your csv and with any newline characters you want (’\r’, ‘\n’, “\r\n” freely intermixed).

jfcaron · January 29, 2014, 5:46pm

Thanks for working on it. I didn’t realize that my csv file was written with mixed EOL characters. I’ll look at how python is doing the writing to make it consistent.
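One quick way to check whether a file mixes end-of-line styles is to count each kind of terminator in its raw bytes. This is only a sketch: `count_line_endings` is a hypothetical helper and the sample bytes are made up for illustration, not taken from the attached file.

```python
import re
from collections import Counter


def count_line_endings(data):
    """Count each kind of line terminator found in a bytes object."""
    # "\r\n" is listed first so that it is not counted as a '\r' plus a '\n'.
    return Counter(re.findall(rb"\r\n|\r|\n", data))


# Made-up sample: a hand-written comment line followed by csv-style rows.
sample = b"# comment\nx,y\r\n1,2\r\n"
counts = count_line_endings(sample)

for eol in (b"\n", b"\r\n", b"\r"):
    print(repr(eol), counts[eol])
if len(counts) > 1:
    print("mixed line endings")
```

For a real file, read it with `open(path, "rb").read()` so that Python does not translate the newlines before they are counted.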

Jean-François

tpochep · January 29, 2014, 7:24pm

In principle, we should handle any possible newline characters. I fixed this in TNtuple (the fix for TTree will be more complex, though).

For example, if I read your file as a TNtuple (commenting out the “format” line, of course) I get this:

************************************************************************
*    Row   *         x *         y *         z *         u *         v *
************************************************************************
*        0 *       110 * 0.0087710 * 0.0040000 * -0.051829 * 0.0021490 *
*        1 *       110 * 0.0167709 * 0.0040000 * -0.030320 * 0.0014803 *
*        2 *       110 * 0.0247709 * 0.0040000 * -0.012962 * 0.0010447 *
*        3 *       110 * 0.0327709 * 0.0040000 * 0.0004885 * 0.0007665 *
*        4 *       110 * 0.0407710 * 0.0040000 * 0.0055176 * 0.0006318 *
*        5 *       110 * 0.0487710 * 0.0040000 * 0.0114530 * 0.0008419 *
*        6 *       110 * 0.0567709 * 0.0040000 * 0.0141037 * 0.0007582 *
*        7 *       110 * 0.0667710 * 0.0060000 * 0.0172236 * 0.0007785 *
*        8 *       110 * 0.0767709 * 0.0040000 * 0.0171032 * 0.0009981 *
*        9 *       110 * 0.0867709 * 0.0060000 * 0.0179044 * 0.0007127 *
*       10 *       110 * 0.0987709 * 0.0060000 * 0.0195785 * 0.0008694 *
*       11 *       110 *  0.110771 * 0.0060000 * 0.0187601 * 0.0007722 *
*       12 *       110 * 0.1227710 * 0.0060000 * 0.0183314 * 0.0008579 *
*       13 *       110 * 0.1347710 * 0.0060000 * 0.0169693 * 0.0008073 *
*       14 *       110 * 0.1467709 * 0.0060000 * 0.0184496 * 0.0007453 *
*       15 *       110 * 0.1587709 * 0.0060000 * 0.0183959 * 0.0007735 *
*       16 *       110 * 0.1707710 * 0.0060000 * 0.0174962 * 0.0007600 *
*       17 *       110 * 0.1827709 * 0.0060000 * 0.0194552 * 0.0006834 *
*       18 *       110 * 0.1947710 * 0.0060000 * 0.0198235 * 0.0007442 *
*       19 *       110 * 0.2067710 * 0.0060000 * 0.0208042 * 0.0007281 *
*       20 *       110 * 0.2187709 * 0.0060000 * 0.0227066 * 0.0007154 *
*       21 *       110 * 0.2307710 * 0.0060000 * 0.0267357 * 0.0007382 *
*       22 *       110 *  0.242771 * 0.0060000 * 0.0308989 * 0.0007445 *
*       23 *       110 * 0.2547709 * 0.0060000 * 0.0411104 * 0.0009666 *
*       24 *       110 * 0.2687709 * 0.0080000 * 0.0514362 * 0.0013914 *

and no warnings/errors.

jfcaron · January 29, 2014, 7:35pm

I use ROOT from MacPorts, so I’ll have to wait for a release to try out the fix. In the meantime, consistently using ‘\n’ as the line terminator seems to have removed the warnings.

Apparently python’s csv writer by default uses ‘\r\n’ independent of platform, but it has an optional keyword argument “lineterminator” that you can set. Setting this to ‘\n’ makes it consistent with the writing of the other strings (and Linux & MacOSX standard), so the warnings don’t show up anymore.

http://docs.python.org/2/library/csv.html#csv.Dialect.lineterminator
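A minimal sketch of the difference, assuming Python 3 and an in-memory buffer instead of a real file; the rows and the comment line are made-up placeholders, and `write_csv` is a hypothetical helper:

```python
import csv
import io

rows = [(1, 2.5), (2, 3.5)]   # illustrative values only


def write_csv(lineterminator):
    """Write one hand-made comment line plus csv rows; return the resulting text."""
    buf = io.StringIO(newline="")        # newline="" keeps terminators as written
    buf.write("# metadata comment\n")    # hand-written line ends in '\n'
    writer = csv.writer(buf, lineterminator=lineterminator)
    writer.writerows(rows)
    return buf.getvalue()


default_style = write_csv("\r\n")   # same as the csv module's default terminator
unix_style = write_csv("\n")

print(repr(default_style))   # comment ends in '\n', rows end in '\r\n': mixed
print(repr(unix_style))      # every line ends in '\n'
```

When writing to a real file in Python 3, opening it with `newline=""` keeps the platform from translating whatever terminator the writer emits.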

Jean-François