Could you please extend the TGraph constructor which reads a filename, (or release a new one)
in order it can transparently read csv-like files, that is,
in order to avoid the need to explicitly discard non space-like separators:"%lg %*s %lg %*s %lg"…
Please note that figures are sometimes just separated by a character (’;’) without any spaces around it.
A mechanism similar to that of TString::Tokenize(TString delimiter) where several delimiters (" \t;,")
can be specified in one go regardless their frequency would be quite appropriate I reckon.
Hi there,
Here’s a first shot.
Finally I used TPRegexp: only “%lg” and “%*lg” are allowed in “format”.
The extension to classes inheriting from TGraph (TGraphErrors, TGraphAssymErrors, TGraphBentErrors…) are to be done.
I left my debug statements in the code (see ‘//’ (’///’ are non-code comments)).
The test files I used are given commented at the end of the code.
I made a quick speedtest: this new (non-optimized) constructor is ~15 times slower than the former (see code).
Cheers,
Z tgraph_ctor.cpp (8.06 KB)
15 slower is huge.
Therefore this new ctor cannot replace the current one.
If we take it, it has to be a new one.
Un fortunately you new ctor has the same signature as the current one.
Finally we cannot take the code as it is now. It is a prototype…
The easiest and fastest way is to modify the input format (see function ctor_current_exactdelim).
Unfortunately to cope with cases where figures are just separated by one character without any spaces around it (e.g. ‘2;5’), sscanf implies to have exactly the same delimiter string between each data.
To overcome this limitation (without modifying the input file!), one can use strtok.
This is done in function ctor_strtok which results in an execution time (only) two times slower than the current TGraph constructor.
Similar remarks as above holds for the new code:
Only “%lg”, “%*lg” or “*s” are allowed in “format”.
‘///’ comments are descriptive comments. ‘//’ comments are trial statements.
The test files I used are given commented at the end of the code.
The extension to classes inheriting from TGraph (TGraphErrors, TGraphAssymErrors, TGraphBentErrors…) is still to be done.
Note a boolean named ‘debug’ has been added for verbosity in the main function.
Are you building ROOT from the sources ? if yes, it would be easier for us if you modify the TGraph class to make a real ctor. Doing the real packaging you may face some problems you do not see with an example macro like this one.
“ctor_current_exactdelim” is quite limited but it is as fast as the current constructor.
“ctor_regexp” and “ctor_strtok” are equivalent in terms of functionality.
“ctor_regexp” is 10-15 times slower than the current constructor, “ctor_strtok” only 2 times.
So I would go for “ctor_strtok”.
I’ll try to implement it directly in the TGraph class in the coming days.
I’ll let you know.
In the meanwhile, thanks for sharing any comments on the code.
To run some tests I’ve created a “triple constuctor” in TGraph.cxx, namelyTGraph (const char *filename, const char *format, const char *delim, Int_t ctorChoice, Bool_t debug=kFALSE) ;where ctorChoice allows one to choose 1 constuctor among 3:
ctorChoice=0 for the “regexp” ctor
ctorChoice=1 for the “strtok” ctor
ctorChoice=2 for a new ctor referred to as “tokenize” which uses TString::Tokenize() instead of strtok.
Once integrated (see ztest.cpp), I get the following mean speed factors (on CentOS_5.6 with ROOT_5.31.01 using ACLiC): regexp/current ~ 4.0, strtok/current ~ 2.0, tokenize/current ~ 4.3.
(note these ratio change when the same code is implemented in plain functions (see tgraph_ctor.cpp)).
Again, I would go for the strtok constuctor.
What do you reckon?
As I said as soon as you make a new ctor and you keep the old one for compatibility you can choose
the algoritm you think is the best. So make the final version and we will include it.
[quote]as soon as you make a new ctor and you keep the old one for compatibility…[/quote]Actually I would “fork” the current constructor taking advantage of the unused “Option_t*” argument the type of which is “const char*” like “delim”… (See TGraph.cxx where all debug statements have been removed).
What do you reckon?
[quote]Also, I do not think you need: #include “TPRegexp.h” In the .h file. Better put it in the .cxx[/quote]Just to know… Is it for clarity reason (since it’s not explicitly used in TGraph.h)? Or is it a blunder (compilation time, etc.)?
[quote]Just to know… Is it for clarity reason (since it’s not explicitly used in TGraph.h)? Or is it a blunder (compilation time, etc.)?[/quote]For all those reasons , it is both clearer and faster and reduce the coupling between files (i.e. without it, a source that does include TGraph will not need to be recompiled if TPRegexp.h changes).
[quote]Also, I do not think you need: #include “TPRegexp.h” In the .h file. Better put it in the .cxx[/quote]Just to know… Is it for clarity reason (since it’s not explicitly used in TGraph.h)? Or is it a blunder (compilation time, etc.)?
[/quote]
I can answer this question Yes, this is mainly about the compilation time - in C++ programmers usually try to reduce dependencies: if you do not need some declarations, you do not include them. Modern compilers have different tricks to help you (for example, clang compiles ROOT in 5 minutes on my mac), but still, you’d better avoid redundant inclusions - this is additional work for compiler in general to parse again and again something it does not really need.
[quote]I cannot put the code you sent in the trunk[/quote]This release was absolutly not meant to be put in the trunk (sorry for the misunderstanding).
[quote]I see you have changed the standard ctor[/quote]Yes, the former “triple constuctor” was just a dummy ctor I’ve posted to get some feedback.
The ctor I’d like to add just requires 3 arguments: filename, format, delimiter.
So it conflicts with the standard/current one.
One way to cope with that is to use the Option_t argument of the standard/current ctor
since it has the right type (const char*) and is absolutly not used.
This is what I do in the standard/current ctor: if (strcomp(option,"")==0) {
/* use standard/current ctor */
} else {
/* use new delimiter specified ctor */
}Note: No changes to TGraph.h.
So, please find attached a first release candidate
properly formatted (using Root’s .astylerc, http://root.cern.ch/drupal/content/c-coding-conventions),
and successfully tested with today’s (30 sep 2011) 13h30’s trunk (5.31/01).
Much better
The Options parameter was not use before … may be we can call it delim ?
no option now means “old way” … an option means “a delimiter” …
To be think …
Renaming “option” to “delimiter” would be clearer for sure.
Note that for consistency the same modifications (renaming and ctor enhancement) should be applied to TGraphErrors and TGraph2D.
Cheers,
Z