Suggestion: TGraph to read various columns of a text file

[quote]Timur, PLEASE PLEASE!
skipped
Rene[/quote]

:slight_smile: :slight_smile: :slight_smile:

Agree, I should stop. But just to make things clear:

Rene, sure, I understand your point (and topic author’s request).
You (and the author) want one concrete, fixed constructor that takes a few parameters: a file name (the text file with several columns of data), a printf-like format and, possibly, some options indicating which columns hold x and y. That’s all. The only thing you need (IMHO) is to tokenize and parse format specifiers such as “%d”, “%lf” or “%s” to work out the type of object to pass to sscanf. And then there are the x and y columns.
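A minimal sketch of the format-driven reading described above, in plain C++ without ROOT. The helper name `ReadXY` is hypothetical, and the sketch assumes the format contains exactly two `%lf` conversions (x first, then y), with unwanted columns skipped by assignment-suppressing conversions such as `%*s`. It lets sscanf do the parsing rather than tokenizing the format itself.

```cpp
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: read (x, y) pairs line by line with a scanf-style format.
// Assumption: `format` contains exactly two "%lf" conversions; every other
// column must be skipped with a suppressed conversion such as "%*s".
std::vector<std::pair<double, double>> ReadXY(std::istream& in, const char* format) {
    std::vector<std::pair<double, double>> points;
    std::string line;
    while (std::getline(in, line)) {
        double x = 0.0, y = 0.0;
        // sscanf returns the number of assigned values; lines that do not
        // match the format (headers, blank lines) are silently skipped.
        if (std::sscanf(line.c_str(), format, &x, &y) == 2) {
            points.emplace_back(x, y);
        }
    }
    return points;
}
```

For a file whose columns are "number, label, number", calling `ReadXY(stream, "%lf %*s %lf")` would take x from the first column and y from the third.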

So, if this is all, that’s good.

My point was that data pre-processing should be done externally. By data pre-processing I mean(t) not only reading in some known format, but also many other possible actions: filtering (for example, I want to skip some data), maybe sorting (points can be in any order), etc. So this is simply a different point of view :)

Yes, that’s all. No new functions, no new arguments, just an additional option.

[quote]My point was that data pre-processing should be done externally. By data pre-processing I mean(t) not only reading in some known format, but also many other possible actions: filtering (for example, I want to skip some data), maybe sorting (points can be in any order), etc. So this is simply a different point of view :)[/quote]
As explained in my original post, it is better to do that with a TNtuple object, which can also be created from a text file with one statement.
Once the ntuple is created, users can make their own selections and display any combination of the columns.
But for many operations the simple case of a TGraph is more than sufficient.

I am happy to see that others besides me think such code is not trivial. It certainly took me a long time to find out how to plot my data without the feature you are discussing here.

Yes, correct, the logic is nontrivial to implement correctly and in a general way. Still waiting for the code from Eddy and Timur :slight_smile:

Rene

Has this eventually been implemented?

N.

As a stopgap, here’s a class that I use to read ASCII files. It returns any row or column as a Double_t* or std::vector<Double_t*>, and any value in the file X(i,j) as a Double_t. It also discards any lines containing what it thinks are non-numerical values, such as file headers. I’m sure it’s woefully inefficient, and I know Rene has already said this is not the best solution, but it does do the job!
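For readers without the attachments, here is a rough sketch of the idea in plain C++. It is not the attached TTextToVector code; the names `ParseNumericLine`, `ReadNumericRows` and `Column` are hypothetical, and "numeric" here means whatever `std::strtod` accepts for the whole token.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one whitespace-separated line into doubles.
// Returns false if the line is empty or any token is not entirely numeric.
bool ParseNumericLine(const std::string& line, std::vector<double>& row) {
    row.clear();
    std::istringstream tokens(line);
    std::string tok;
    while (tokens >> tok) {
        char* end = nullptr;
        const double value = std::strtod(tok.c_str(), &end);
        if (end == tok.c_str() || *end != '\0') return false;  // not fully numeric
        row.push_back(value);
    }
    return !row.empty();
}

// Read all numeric rows, dropping headers and other non-numeric lines.
std::vector<std::vector<double>> ReadNumericRows(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::vector<double> row;
    std::string line;
    while (std::getline(in, line)) {
        if (ParseNumericLine(line, row)) rows.push_back(row);
    }
    return rows;
}

// Extract column j; rows too short to contain it are skipped.
std::vector<double> Column(const std::vector<std::vector<double>>& rows, std::size_t j) {
    std::vector<double> col;
    for (const auto& row : rows) {
        if (j < row.size()) col.push_back(row[j]);
    }
    return col;
}
```

Like the class described above, this keeps every value in memory, so it is only meant for small files.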

Cheers,

Hugh
LinkDef.h (156 Bytes)
TTextToVector.h (2.11 KB)
TTextToVector.cxx (6.55 KB)

Hugh,

Your approach is interesting and a step in the right direction. I have some observations:
- In your approach you store the information three times in memory (as TObjString and in vector format, both as rows and as columns). This could be a problem for very large data sets, but I agree that most of the time it will be used for small ones.
- Maybe one way to solve the space problem would be to specify the list of columns of interest in the constructor or in the ReadFile function: do not store the strings, only the columns that are relevant.
- Implementing point 2 creates a numbering problem for the columns (e.g. you store original columns 2, 5 and 7: how do you access column 2, now stored in the first column?).
- One could have a special case for TGraphs (yet another copy). Instead of being stored in your class, the results could be stored directly in the TGraph.

Rene

Hi Rene,

Thanks for the tips. The way to resolve the numbering issue would be to use a std::map<Int_t, std::vector<Double_t> > to store the columns instead of a vector. That way the numbering is arbitrary. I was actually planning to implement a template class to produce graphs directly from ASCII files, something like TTextToGraph. It was going to use the TTextToVector class to read the text files and would essentially just be a utility to remove the intermediate step of passing the generated vectors to the relevant TGraph constructor. Sadly, other more pressing matters stopped me from getting it done, but I’ll let you know if I ever finish it.
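The map idea can be sketched as follows, using plain `int` and `double` in place of the ROOT typedefs. `ColumnStore` and `SelectColumns` are hypothetical names; the sketch assumes the rows have already been parsed into vectors of doubles.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical container: only the requested columns are kept, keyed by their
// ORIGINAL index in the file, so original column 5 is still looked up as 5.
using ColumnStore = std::map<int, std::vector<double>>;

ColumnStore SelectColumns(const std::vector<std::vector<double>>& rows,
                          const std::vector<int>& wanted) {
    ColumnStore store;
    for (int c : wanted) store[c];  // create an (initially empty) entry per column

    for (const auto& row : rows) {
        // Skip any row that lacks one of the wanted columns, so that all
        // stored columns keep the same length and stay aligned.
        bool complete = true;
        for (int c : wanted) {
            if (c < 0 || static_cast<std::size_t>(c) >= row.size()) {
                complete = false;
                break;
            }
        }
        if (!complete) continue;
        for (int c : wanted) {
            store[c].push_back(row[static_cast<std::size_t>(c)]);
        }
    }
    return store;
}
```

With columns 2, 5 and 7 selected, `store.at(2)` still refers to original column 2, and the unselected columns are never stored.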

Cheers,

Hugh

Several years on, I’ve finally got back to this problem! I’ve attached my solution. It’s definitely not perfect (in fact I’ve only just finished it, so it probably has lots of bugs, and there are still some efficiency issues), but it might be useful for anyone who just wants a quick way of plotting arbitrary columns from a text file.

Once it’s built and the library loaded into CINT it should be as simple as e.g.

TTextToGraph<TGraphAsymmErrors> gr("textfile", "0,2,4-7"); gr.Draw("APEZ")

which will plot column 0 versus column 2 with the errors taken from columns 4 to 7. The field format is whitespace insensitive and should contain integers for the fields and commas or hyphens for separators. You can also select specific rows using a third argument, but the default is for all rows.
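For illustration, a field string like the one above could be expanded with something along these lines. This is a sketch, not the code from the attached tarball; `ParseFields` is a hypothetical name, and it assumes non-negative column numbers and ascending ranges.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser for a field spec such as "0, 2, 4-7".
// Returns the expanded column list, here {0, 2, 4, 5, 6, 7}.
// Throws std::invalid_argument if the spec is malformed.
std::vector<int> ParseFields(const std::string& spec) {
    // Whitespace-insensitive: strip all whitespace before parsing.
    std::string s;
    for (unsigned char ch : spec) {
        if (!std::isspace(ch)) s.push_back(static_cast<char>(ch));
    }

    std::vector<int> fields;
    std::size_t pos = 0;

    // Read a run of digits starting at `pos` and advance past it.
    auto readInt = [&]() {
        const std::size_t start = pos;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
        if (pos == start) throw std::invalid_argument("expected a column number");
        return std::stoi(s.substr(start, pos - start));
    };

    while (pos < s.size()) {
        const int first = readInt();
        int last = first;
        if (pos < s.size() && s[pos] == '-') {  // range "first-last"
            ++pos;
            last = readInt();
        }
        if (last < first) throw std::invalid_argument("descending range");
        for (int c = first; c <= last; ++c) fields.push_back(c);

        if (pos < s.size()) {  // anything left must be a separating comma
            if (s[pos] != ',') throw std::invalid_argument("expected ','");
            ++pos;
            if (pos == s.size()) throw std::invalid_argument("trailing ','");
        }
    }
    return fields;
}
```

How the expanded list is then mapped onto x, y and the error arrays depends on the graph type and is not shown here.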

If that’s too complicated there are also some typedefs defined so you could equivalently write

TTTGraphAsymmErrors gr("textfile", "0,2,4-7"); gr.Draw("APEZ")

Look at the code for more details.

Cheers,

Hugh
code.tar.gz (4.08 KB)