Dear Rooters
When you export Excel files as *.csv, the different columns are enclosed in
quotation marks, e.g.:
"12","13","14","15"
Thus a single column can contain text that itself includes commas, e.g.
"My first","My second","My third,fourth","My seventh, eighth"
Now I want to tokenize the columns and tried the following code:
TString csv = "\",\"";
TString str = TString(&nextline[0]);
TObjArray *strobj = str.Tokenize(csv);
Int_t numsep = strobj->GetEntries() - 1;
cout << "numsep = " << numsep << endl;
cout << "At(2) = " << ((TObjString*)strobj->At(2))->GetString() << endl;
delete strobj;
Unfortunately, this does not work. For the samples above, the output is:
First example:
numsep = 3
At(2) = 14
Second example:
numsep = 5
At(2) = My third
However, the second output should be:
numsep = 3
At(2) = My third,fourth
Do you know the reason for this and how to solve this problem?
Thank you in advance.
Best regards
Christian
Hi Christian,
The function
TObjArray *TString::Tokenize(const TString &delim) const
{
// This function is used to isolate sequential tokens in a TString.
// These tokens are separated in the string by at least one of the
// characters in delim. The returned array contains the tokens
// as TObjString's. The returned array is the owner of the objects,
// and must be deleted by the user.
uses each character in "delim" as a possible separator.
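To make this concrete, here is a minimal sketch of the same "any of these characters" splitting in plain standard C++. It is not ROOT code: the helper name tokenizeAnyOf is made up for illustration, and dropping empty tokens is an assumption modelled on strtok().

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of ROOT: split `text` at every single
// character that occurs in `delims`. Empty tokens are skipped, which is
// an assumption modelled on strtok().
std::vector<std::string> tokenizeAnyOf(const std::string &text,
                                       const std::string &delims)
{
    std::vector<std::string> tokens;
    // Skip any leading separator characters.
    std::string::size_type begin = text.find_first_not_of(delims);
    while (begin != std::string::npos) {
        // The token ends at the next separator character (or at the end).
        const std::string::size_type end = text.find_first_of(delims, begin);
        tokens.push_back(text.substr(begin, end - begin));
        if (end == std::string::npos) break;
        // Skip the run of separator characters before the next token.
        begin = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

With the delimiter string "," (quote, comma, quote) the set of separator characters is just the quotation mark and the comma. The second sample line therefore splits into six tokens, and the token at index 2 is My third, which agrees with the numsep = 5 reported above.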
Eddy
Dear Eddy
Thank you, I misunderstood the method (not looking at the source code).
In principle this is the TString substitute for “strtok()”.
However, I have the problem that sometimes more than one character is used
as separator, another example being " // ", i.e. blank-slash-slash-blank.
Do you or anybody have an idea how to solve this problem (w/o having to
use getc())?
Best regards
Christian
Hi Christian,
It is easy to attack the problem with regular expressions. I know that you do not like regular expressions, but why reinvent string parsing?
TObjArray *GetColumns(const TString &str)
{
   // Matches one quoted column plus an optional trailing comma.
   TPRegexp r("\"([\\w\\s,]+)\",?");
   TObjArray *colL = new TObjArray();
   colL->SetOwner();
   Int_t start = 0;
   while (1) {
      TString subStr = str(r,start);
      const Int_t l = subStr.Length();
      if (l<=0) break;   // no further match: stop before adding an empty entry
      const TString stripStr = subStr.Strip(TString::kTrailing,',');
      colL->Add(new TObjString(stripStr));
      start += l;
   }
   return colL;
}

void bla()
{
   TObjArray *col1L = GetColumns("\"12\",\"13\",\"14\",\"15\"");
   for (Int_t i = 0; i < col1L->GetLast()+1; i++)
      std::cout << ((TObjString*)col1L->At(i))->GetString() << std::endl;
   delete col1L;   // the array owns its TObjStrings
   TObjArray *col2L = GetColumns("\"My first\",\"My second\",\"My third,fourth\",\"My seventh, eighth\"");
   for (Int_t i = 0; i < col2L->GetLast()+1; i++)
      std::cout << ((TObjString*)col2L->At(i))->GetString() << std::endl;
   delete col2L;
}
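For a fixed multi-character separator such as " // ", a plain substring search may already be enough. The following is only a sketch in standard C++, not ROOT code. The names splitOnString and splitQuotedCsvLine are made up for illustration, and returning a line that is not fully quoted as one unchanged field is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` wherever the complete string `sep`
// occurs. Unlike character-wise tokenizing, empty fields are kept.
std::vector<std::string> splitOnString(const std::string &text,
                                       const std::string &sep)
{
    std::vector<std::string> fields;
    if (sep.empty()) {            // an empty separator would never advance
        fields.push_back(text);
        return fields;
    }
    std::string::size_type begin = 0;
    while (true) {
        const std::string::size_type end = text.find(sep, begin);
        if (end == std::string::npos) {
            fields.push_back(text.substr(begin));   // last field
            break;
        }
        fields.push_back(text.substr(begin, end - begin));
        begin = end + sep.size();                   // jump over the separator
    }
    return fields;
}

// Hypothetical helper for the quoted CSV export: drop the outer quotation
// marks, then split on the three-character separator `","`.
// Assumption: a line that is not fully quoted is returned as one field.
std::vector<std::string> splitQuotedCsvLine(const std::string &line)
{
    if (line.size() < 2 || line.front() != '"' || line.back() != '"')
        return std::vector<std::string>(1, line);
    return splitOnString(line.substr(1, line.size() - 2), "\",\"");
}
```

Applied to the second sample line, splitQuotedCsvLine yields four fields, with My third,fourth at index 2. The sketch does not handle escaped quotation marks inside a field.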
Dear Eddy
Thank you for this nice example. I need to understand the regexp, but it works nicely.
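For anyone else who needs to read the regexp, it breaks down as follows: the first \" matches the opening quotation mark, ([\w\s,]+) captures one or more word characters, whitespace characters or commas, the second \" matches the closing quotation mark, and ,? matches the optional comma between two columns. The same pattern can be tried outside ROOT with std::regex. This is only a sketch: the function name getColumns is made up, and it assumes the pattern behaves the same in the ECMAScript grammar of std::regex as in PCRE for inputs like the samples above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-alone counterpart using std::regex instead of TPRegexp.
// Returns capture group 1 of every match, i.e. the column text without the
// surrounding quotation marks.
std::vector<std::string> getColumns(const std::string &line)
{
    // "            opening quotation mark
    // ([\w\s,]+)   one or more word characters, whitespace or commas
    // "            closing quotation mark
    // ,?           optional comma between two columns
    static const std::regex column(R"re("([\w\s,]+)",?)re");

    std::vector<std::string> columns;
    for (std::sregex_iterator it(line.begin(), line.end(), column), end;
         it != end; ++it) {
        columns.push_back((*it)[1].str());
    }
    return columns;
}
```

Note that the character class only allows word characters, whitespace and commas, so in this sketch an empty column or a column containing other characters, such as a decimal point, is not matched and is silently skipped.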
P.S.: For the record, the std::cout line needs to be replaced by:
std::cout <At(i))->GetString() << std::endl;
P.P.S.:
As I see from the preview, for some reason the HTML formatting destroys the correct line (maybe a regexp artifact?).
Best regards
Christian
Hi Christian,
Oops, the HTML chewed up my code. I will attach it.
Eddy
bla.C (861 Bytes)
Dear Eddy
Thank you, but I was already able to correct it and wanted to report it in the P.S., but the HTML chewed up my code, too.
Best regards
Christian