Incorrect # of events in tree constructed from ascii file

mradul · April 30, 2013, 9:08am

Hi…
I took cernbuild.C from /root/tutorial/tree to build a tree from the events in an ascii file. However, the # of events in the constructed tree is larger than the number of events in the ascii data file. Here is a snippet:

FILE *fbk = fopen(Form("%ssample-noal.tes",dir.Data()),"r");
tree->Branch("len",&len,"len/F");
tree->Branch("wid",&wid,"wid/F");
tree->Branch("size",&size,"size/F");
…
…
tree->Branch("Class",&Class,"Class/I"); // a total of 10 variables
char line[80];

while (fgets(&line,80,fp)) {
   sscanf(line,"%f %f %f %f %f %f %f %f %f %d " , &len,&wid,&size,&conc,&conc1,&asym,&m3long,&m3trans,&dist,&Class);
   tree->Fill();
}
if (print) tree->Print();
tree->Write();

fclose(fbk);

My ascii file contains 12680 events whereas the tree has 12686 events. I am not sure where the problem is.

Rgds
Mradul

Wile_E_Coyote · April 30, 2013, 10:42am

Search your ascii data file for empty lines and lines which are longer than 80 characters.

mradul · April 30, 2013, 10:55am

Hi…
The data file has lines with a maximum of 78 characters (checked with awk 'length() > 78' file) and there are no empty lines. The file under consideration (sample-noal.dat) is attached.
sample-noal.dat (886 KB)

honk · April 30, 2013, 11:03am

Your data file has two spaces before the last column.

Wile_E_Coyote · April 30, 2013, 11:03am

{
   TTree *t = new TTree("t", "sample-noal");
   t->ReadFile("sample-noal.dat", "len/F:wid:size:conc:conc1:asym:m3long:m3trans:dist:Class/I");
   t->Print();
}

honk · April 30, 2013, 11:10am

std::cerr << "Hi Wilie\n!";

Yes, that should work, but you didn’t explain why.

The format string claimed that all columns were separated by one space, but in the actual input the last column was offset by two spaces. That caused sscanf to not produce the expected output (without any diagnostic: if you use sscanf, you are expected to know that you can shoot yourself in the foot).

Now one can use TTree::ReadFile which will automatically figure out how many spaces there are between columns, but that doesn’t explain why parsing by hand failed here.

Wile_E_Coyote · April 30, 2013, 11:18am

BTW, you do have 6 lines which need more than 80 characters (just 81). Increase your “line” buffer to 90 and it should be fine.
P.S. I don’t think you need to care about the “double space” before the “last column” (i.e. it should not cause you any problems). In a “format string” … a “sequence of white-space characters (space, tab, newline, etc.; see isspace(3)) … matches any amount of white space, including none, in the input”.
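The white-space rule quoted above is easy to confirm with a small test. In this sketch `parseRow` is a made-up helper that uses a shortened, four-column version of the format string; it returns the number of fields `sscanf` converted.

```cpp
#include <cstdio>

// Hypothetical helper: parses three floats and an int from one text line and
// returns how many fields sscanf converted (4 on success).
int parseRow(const char *line, float &a, float &b, float &c, int &cls) {
    // A single space in the format matches any run of white space in the
    // input (including none), so extra spaces or tabs between columns are fine.
    return std::sscanf(line, "%f %f %f %d", &a, &b, &c, &cls);
}
```

Calling it on `"1.5 2.5 3.5 7"` and on `"1.5 2.5 3.5  7"` (two spaces before the last column) should give the same four converted values in both cases.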

mradul · April 30, 2013, 12:49pm

Dear Wile
Increasing the “line” buffer to 90 solved the problem. You are also right about the “double space”: the larger buffer alone takes care of the problem.

Thanks a lot.
mradul

BTW, which command did you use to find that the data file has 6 lines which need more than 80 characters?

honk · April 30, 2013, 1:55pm

@Wilie: You were right, sscanf isn’t picky about extra spaces like I thought. I think it’s still a good idea to give some basic explanation instead of just a working sample.

Cheers,

b.

Wile_E_Coyote · May 4, 2013, 11:20am

The description of the “fgets” function explicitly says that, in the “buffer” string, you need space for at least 2 (two) additional characters -> a “trailing” newline character (“LF”) and a “terminating” null character (“NUL”).
So, you need a “buffer” string which is at least 2 (two) characters longer than your maximum line length.
If you were dealing with a text file in the “DOS format”, you would need to make sure that you have space for 3 (three) additional characters -> “CR”, “LF”, “NUL”.
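The effect of a buffer that is too short can be reproduced without ROOT. The sketch below uses a made-up helper, `countFgetsCalls`, which writes some text to a temporary file and counts how often `fgets` succeeds for a given buffer size. A 79-character line read with an 80-byte buffer takes two calls: the first returns the 79 characters, the second returns only the leftover newline. In the loop from the first post, `sscanf` converts nothing from that leftover, the variables keep their previous values, and `tree->Fill()` stores the same event a second time, which would account for the 6 extra entries.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: writes `text` to a temporary file and returns how many
// times fgets succeeds when reading it back with a buffer of bufSize bytes.
// Returns -1 if the buffer is unusably small or the temporary file fails.
int countFgetsCalls(const std::string &text, int bufSize) {
    if (bufSize < 2) return -1;  // fgets could never make progress
    std::FILE *fp = std::tmpfile();
    if (!fp) return -1;
    std::fputs(text.c_str(), fp);
    std::rewind(fp);

    std::vector<char> buf(bufSize);
    int calls = 0;
    // fgets stores at most bufSize - 1 characters per call, then the null
    // terminator; a longer line is returned in several pieces.
    while (std::fgets(buf.data(), bufSize, fp)) ++calls;

    std::fclose(fp);
    return calls;
}
```

For a file holding one 78-character line and one 79-character line, this should report three calls with a buffer of 80 and two calls with a buffer of 81.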

The command:
awk 'length() >= 79' sample-noal.dat
returns 6 lines.
Each of them is actually just 79 characters long, so you need a “buffer” string which is at least 81 (= 79 + 2) characters long.
You could try (please note also that the first “fgets” parameter is “line”, not “&line”):
// ...
char line[81];
// ...
while (fgets(line, sizeof(line), fp)) {
// ...
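As an extra safety net, the return value of `sscanf` can be checked before filling, so that an empty line or a leftover fragment does not silently produce an entry. The following is a minimal sketch under assumptions: `Event` and `countCompleteEvents` are made-up names, the 256-byte buffer is an arbitrary generous size, and in the real macro `tree->Fill()` would sit where the counter is incremented. The check does not catch a line that is cut in the middle of a number, so the buffer still has to be large enough.

```cpp
#include <cstdio>

// Hypothetical stand-in for the ten branch variables of the tree.
struct Event {
    float len, wid, size, conc, conc1, asym, m3long, m3trans, dist;
    int Class;
};

// Reads fp line by line and returns the number of lines from which all ten
// fields could be converted. Lines with fewer fields are skipped.
int countCompleteEvents(std::FILE *fp) {
    char line[256];  // assumed to exceed the longest line by at least 2
    Event e{};
    int filled = 0;
    while (std::fgets(line, sizeof(line), fp)) {
        int n = std::sscanf(line, "%f %f %f %f %f %f %f %f %f %d",
                            &e.len, &e.wid, &e.size, &e.conc, &e.conc1,
                            &e.asym, &e.m3long, &e.m3trans, &e.dist, &e.Class);
        if (n != 10) continue;  // empty line or incomplete record: do not fill
        ++filled;               // tree->Fill() would go here
    }
    return filled;
}
```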

pcanal · June 25, 2013, 3:48pm

Hi,

As an aside, there is also a function TTree::ReadFile which may (or may not) be able to process your input file.

Cheers,
Philippe.