TGraph constructor enhancement

Hi there,

Could you please extend the TGraph constructor which reads a filename, (or release a new one)
in order it can transparently read csv-like files, that is,
in order to avoid the need to explicitly discard non space-like separators:"%lg %*s %lg %*s %lg"…
Please note that figures are sometimes just separated by a character (’;’) without any spaces around it.

A mechanism similar to that of TString::Tokenize(TString delimiter) where several delimiters (" \t;,")
can be specified in one go regardless their frequency would be quite appropriate I reckon.

Cheers,
Z

Can you do a prototype and test it ?

:wink:
I’ll give it a try this week-end #-o
Cheers,
Z

Hi there,
Here’s a first shot.
Finally I used TPRegexp: only “%lg” and “%*lg” are allowed in “format”.
The extension to classes inheriting from TGraph (TGraphErrors, TGraphAssymErrors, TGraphBentErrors…) are to be done.
I left my debug statements in the code (see ‘//’ (’///’ are non-code comments)).
The test files I used are given commented at the end of the code.
I made a quick speedtest: this new (non-optimized) constructor is ~15 times slower than the former (see code).
Cheers,
Z
tgraph_ctor.cpp (8.06 KB)

Any feedback?
Cheers,
Z

Sorry I missed your update after coming back fro holidays.
Thanks for the reminder.
Looking now

15 slower is huge.
Therefore this new ctor cannot replace the current one.
If we take it, it has to be a new one.
Un fortunately you new ctor has the same signature as the current one.
Finally we cannot take the code as it is now. It is a prototype…

Ok, second (and third) try.

The easiest and fastest way is to modify the input format (see function ctor_current_exactdelim).
Unfortunately to cope with cases where figures are just separated by one character without any spaces around it (e.g. ‘2;5’), sscanf implies to have exactly the same delimiter string between each data.

To overcome this limitation (without modifying the input file!), one can use strtok.
This is done in function ctor_strtok which results in an execution time (only) two times slower than the current TGraph constructor.

Similar remarks as above holds for the new code:

  • Only “%lg”, “%*lg” or “*s” are allowed in “format”.
  • ‘///’ comments are descriptive comments. ‘//’ comments are trial statements.
  • The test files I used are given commented at the end of the code.
  • The extension to classes inheriting from TGraph (TGraphErrors, TGraphAssymErrors, TGraphBentErrors…) is still to be done.
    Note a boolean named ‘debug’ has been added for verbosity in the main function.

What do you reckon?

Cheers,
Z
tgraph_ctor.cpp (13.1 KB)

Thanks,
I see you have 3 ctor now in your macro:

ctor_current_exactdelim
ctor_strtok
ctor_strtok

Which one do you want us to put in TGraph ?

Are you building ROOT from the sources ? if yes, it would be easier for us if you modify the TGraph class to make a real ctor. Doing the real packaging you may face some problems you do not see with an example macro like this one.

“ctor_current_exactdelim” is quite limited but it is as fast as the current constructor.
“ctor_regexp” and “ctor_strtok” are equivalent in terms of functionality.
“ctor_regexp” is 10-15 times slower than the current constructor, “ctor_strtok” only 2 times.
So I would go for “ctor_strtok”.

I’ll try to implement it directly in the TGraph class in the coming days.
I’ll let you know.
In the meanwhile, thanks for sharing any comments on the code.

Cheers,
Z

Note that if it is a new constructor and if we keep the current one we can go for the slow complete version.

Hi there,

To run some tests I’ve created a “triple constuctor” in TGraph.cxx, namelyTGraph (const char *filename, const char *format, const char *delim, Int_t ctorChoice, Bool_t debug=kFALSE) ;where ctorChoice allows one to choose 1 constuctor among 3:

  • ctorChoice=0 for the “regexp” ctor
  • ctorChoice=1 for the “strtok” ctor
  • ctorChoice=2 for a new ctor referred to as “tokenize” which uses TString::Tokenize() instead of strtok.
    Once integrated (see ztest.cpp), I get the following mean speed factors (on CentOS_5.6 with ROOT_5.31.01 using ACLiC): regexp/current ~ 4.0, strtok/current ~ 2.0, tokenize/current ~ 4.3.
    (note these ratio change when the same code is implemented in plain functions (see tgraph_ctor.cpp)).

Again, I would go for the strtok constuctor.
What do you reckon?

Cheers,
Z
tgraph_ctor.tgz (28.7 KB)

I tried your example. It seems to work.

As I said as soon as you make a new ctor and you keep the old one for compatibility you can choose
the algoritm you think is the best. So make the final version and we will include it.

Also, I do not think you need:

#include “TPRegexp.h”

In the .h file. Better put it in the .cxx

[quote]as soon as you make a new ctor and you keep the old one for compatibility…[/quote]Actually I would “fork” the current constructor taking advantage of the unused “Option_t*” argument the type of which is “const char*” like “delim”… (See TGraph.cxx where all debug statements have been removed).
What do you reckon?

[quote]Also, I do not think you need: #include “TPRegexp.h” In the .h file. Better put it in the .cxx[/quote]Just to know… Is it for clarity reason (since it’s not explicitly used in TGraph.h)? Or is it a blunder (compilation time, etc.)?

Cheers,
Z
tgraph_ctor2.tgz (25.4 KB)

I cannot put the code you sent in the trunk because:

  • I see you have changed the standard ctor:
TGraph::TGraph(const char *filename, const char *format, Option_t *)
       : TNamed("Graph",filename), TAttLine(), TAttFill(1,1001), TAttMarker()

I thought we agreed that you make a new ctor only ?

  • Your new ctor does not seems finished:
  • There is commented code.
  • There is a “debug” parameter
  • There is a comment “To be adjusted”
  • The formatting is no very readable: very long compact lines without comments, no indentation …
  • You need to be consistent to display error messages Use Error not fprintf.
  • The meaning of the input parameters in not explained (delim, ctorChoice)
    etc …
  • I see you started from an old version of TGraph. Save primitives has change. Better do a diff against the trunk.

Sorry but your code cannot be put as it is in the trunk…

[quote]Just to know… Is it for clarity reason (since it’s not explicitly used in TGraph.h)? Or is it a blunder (compilation time, etc.)?[/quote]For all those reasons :slight_smile:, it is both clearer and faster and reduce the coupling between files (i.e. without it, a source that does include TGraph will not need to be recompiled if TPRegexp.h changes).

Philippe.

[quote]Also, I do not think you need: #include “TPRegexp.h” In the .h file. Better put it in the .cxx[/quote]Just to know… Is it for clarity reason (since it’s not explicitly used in TGraph.h)? Or is it a blunder (compilation time, etc.)?
[/quote]

I can answer this question :slight_smile: Yes, this is mainly about the compilation time - in C++ programmers usually try to reduce dependencies: if you do not need some declarations, you do not include them. Modern compilers have different tricks to help you (for example, clang compiles ROOT in 5 minutes on my mac), but still, you’d better avoid redundant inclusions - this is additional work for compiler in general to parse again and again something it does not really need.

Philippe & Timur

Thanks :smiley:

Olivier

[quote]I cannot put the code you sent in the trunk[/quote]This release was absolutly not meant to be put in the trunk (sorry for the misunderstanding).

[quote]I see you have changed the standard ctor[/quote]Yes, the former “triple constuctor” was just a dummy ctor I’ve posted to get some feedback.
The ctor I’d like to add just requires 3 arguments: filename, format, delimiter.
So it conflicts with the standard/current one.
One way to cope with that is to use the Option_t argument of the standard/current ctor
since it has the right type (const char*) and is absolutly not used.
This is what I do in the standard/current ctor: if (strcomp(option,"")==0) { /* use standard/current ctor */ } else { /* use new delimiter specified ctor */ }Note: No changes to TGraph.h.

So, please find attached a first release candidate
properly formatted (using Root’s .astylerc, http://root.cern.ch/drupal/content/c-coding-conventions),
and successfully tested with today’s (30 sep 2011) 13h30’s trunk (5.31/01).

Cheers,
Z
tgraph_ctor3.tgz (22.8 KB)

Much better :slight_smile:
The Options parameter was not use before … may be we can call it delim ?
no option now means “old way” … an option means “a delimiter” …
To be think …

Renaming “option” to “delimiter” would be clearer for sure.
Note that for consistency the same modifications (renaming and ctor enhancement) should be applied to TGraphErrors and TGraph2D.
Cheers,
Z