std::getline(fstream, string) became much slower in Cling with ROOT 6.14/00

Hi, experts!
Recently I found that std::getline(fstream, string), when run in Cling, is much slower in ROOT 6.14/00 than in ROOT 6.13/02.
Here is my test code:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;

void csvfileGetline()
{
    string filepath = "/Users/yichunzh/DataAnalysis/All_Site_ChiST/";

    string csvFileName = filepath + "All-2018-05_UID.csv";

    ifstream fp;
    fp.open(csvFileName.c_str(), ios::in|ios::binary);
    if (!fp)
        cout<< "Problem when opening the file: "<< csvFileName<< endl;

    // variables to get data from csv file
    string line;

    int entries = 0;
    while(!fp.eof() && fp.peek() != EOF)
    {
        line.clear();

        getline(fp, line);

        if (0 == entries%1000000)
            cout<< "******** "<< entries<< " Currenttime: "<< TDatime().AsSQLString()<< " ********"<< endl;

        entries++;
    }

    cout<< "Total "<< entries<< " events input!"<< endl;
    fp.close();
}

Here are the test results:
ROOT 6.13/02: reading one million lines takes only 3 to 4 seconds.

  ------------------------------------------------------------
  | Welcome to ROOT 6.13/02                http://root.cern.ch |
  |                               (c) 1995-2017, The ROOT Team |
  | Built for macosx64                                         |
  | From tag v6-13-02, 20 March 2018                           |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] .x csvfileGetline.cpp
******** 0 Currenttime: 2018-06-21 13:28:53 ********
******** 1000000 Currenttime: 2018-06-21 13:28:57 ********
******** 2000000 Currenttime: 2018-06-21 13:29:01 ********
******** 3000000 Currenttime: 2018-06-21 13:29:04 ********
******** 4000000 Currenttime: 2018-06-21 13:29:08 ********
******** 5000000 Currenttime: 2018-06-21 13:29:12 ********
******** 6000000 Currenttime: 2018-06-21 13:29:15 ********
******** 7000000 Currenttime: 2018-06-21 13:29:19 ********
******** 8000000 Currenttime: 2018-06-21 13:29:22 ********
******** 9000000 Currenttime: 2018-06-21 13:29:26 ********
******** 10000000 Currenttime: 2018-06-21 13:29:30 ********
******** 11000000 Currenttime: 2018-06-21 13:29:33 ********
******** 12000000 Currenttime: 2018-06-21 13:29:37 ********

ROOT 6.14/00: reading one million lines takes more than 2 minutes.

 ------------------------------------------------------------
 | Welcome to ROOT 6.14/00                http://root.cern.ch |
 |                               (c) 1995-2018, The ROOT Team |
 | Built for macosx64                                         |
 | From tag v6-14-00, 13 June 2018                            |
 | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
 ------------------------------------------------------------

 root [0] .x csvfileGetline.cpp
 ******** 0 Currenttime: 2018-06-21 13:36:35 ********
 ******** 1000000 Currenttime: 2018-06-21 13:39:00 ********
 ******** 2000000 Currenttime: 2018-06-21 13:41:23 ********
 ******** 3000000 Currenttime: 2018-06-21 13:43:41 ********
 ******** 4000000 Currenttime: 2018-06-21 13:45:59 ********
 ******** 5000000 Currenttime: 2018-06-21 13:48:22 ********
 ******** 6000000 Currenttime: 2018-06-21 13:50:45 ********
 ******** 7000000 Currenttime: 2018-06-21 13:53:12 ********
 ******** 8000000 Currenttime: 2018-06-21 13:55:32 ********
 ******** 9000000 Currenttime: 2018-06-21 13:57:51 ********
 ******** 10000000 Currenttime: 2018-06-21 14:00:13 ********
 ******** 11000000 Currenttime: 2018-06-21 14:02:35 ********
 ******** 12000000 Currenttime: 2018-06-21 14:04:56 ********

For both ROOT 6.14/00 and ROOT 6.13/02, I downloaded the source code and compiled it on my desktop with “-Dgnuinstall=ON -Dcxx17=ON”, keeping all other options at their defaults. The configuration step reported:

– Compiler Flags: -Wc++11-narrowing -Wsign-compare -Wsometimes-uninitialized -Wconditional-uninitialized -Wheader-guard -Warray-bounds -Wcomment -Wtautological-compare -Wstrncat-size -Wloop-analysis -Wbool-conversion -m64 -pipe -W -Wshadow -Wall -Woverloaded-virtual -fsigned-char -fno-common -Qunused-arguments -pthread -std=c++1z -stdlib=libc++ -DR__HAVE_CONFIG

Here is my compiler information:
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I also tested the code by wrapping it in a main() function and compiling it with “clang++ -std=c++1z -stdlib=libc++ -o readFileStream readFileStream.cpp”. That build is as fast as ROOT 6.13/02. So I wonder whether an internal change in ROOT 6.14/00 makes getline() slow. Please let me know if anyone needs more information to debug the issue. Thanks!
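For anyone who wants to reproduce the measurement without the original data file, a minimal sketch along the following lines may help. It is a stand-in under stated assumptions, not the code used above: the helper names (writeSyntheticCsv, countLines, timeGetlineMs), the output file name, and the synthetic line format are all hypothetical, and the synthetic lines do not mirror the real CSV. It uses std::chrono instead of TDatime so that the same file can be built with clang++ or loaded in a ROOT session.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: write nLines made-up CSV-like lines to path.
void writeSyntheticCsv(const std::string& path, long nLines)
{
    std::ofstream out(path);
    for (long i = 0; i < nLines; ++i)
        out << i << ",field_a,field_b,3.14\n";
}

// Read the whole file with std::getline and return the number of lines read.
long countLines(const std::string& path)
{
    std::ifstream in(path);
    std::string line;
    long entries = 0;
    while (std::getline(in, line))
        ++entries;
    return entries;
}

// Time one pass of countLines(); returns elapsed wall-clock milliseconds
// and stores the number of lines read in entries.
double timeGetlineMs(const std::string& path, long& entries)
{
    const auto t0 = std::chrono::steady_clock::now();
    entries = countLines(path);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A possible use is to call writeSyntheticCsv("synthetic.csv", 1000000) once and then call timeGetlineMs("synthetic.csv", n) in each ROOT version and in a standalone build, comparing the returned milliseconds.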

I guess @axel may have an idea about this.

Thank you for your suggestion! :grinning: @Axel, could you kindly help? If more information or tests are needed, please let me know.

By the way, we cannot run your macro because the data file is missing. Can you provide a macro we can run?

Sure. I will upload the .csv file after removing sensitive information.

Sorry for replying so late. I tried many times to upload the data here but failed.
Could you please download the data from https://1drv.ms/u/s!AhXLnecVdLyHadp49OC1aZY5E1Q instead? Thank you!
If there is any problem, please let me know.

I am not sure I fully understand what’s going on. I just tried 6.12 and 6.14 on my Mac, and both take a long time to run. But with 6.15 (the master) it is very quick:

$ root csvfileGetline.cpp
   -------------------------------------------------------------------
  | Welcome to ROOT 6.15/01                       http://root.cern.ch |
  |                                      (c) 1995-2018, The ROOT Team |
  | Built for macosx64                                                |
  | From heads/master@v6-13-04-437-g13855fbe5b, Jun 20 2018, 14:50:35 |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'        |
   -------------------------------------------------------------------

root [0] 
Processing csvfileGetline.cpp...
******** 0 Currenttime: 2018-06-21 11:15:48 ********
******** 1000000 Currenttime: 2018-06-21 11:15:50 ********
******** 2000000 Currenttime: 2018-06-21 11:15:52 ********
******** 3000000 Currenttime: 2018-06-21 11:15:54 ********
******** 4000000 Currenttime: 2018-06-21 11:15:55 ********
******** 5000000 Currenttime: 2018-06-21 11:15:57 ********
******** 6000000 Currenttime: 2018-06-21 11:15:59 ********
******** 7000000 Currenttime: 2018-06-21 11:16:01 ********
Total 8000000 events input!

As I said, I am not an expert in that part of ROOT.
Do you see the same thing with 6.15? It is strange, because 6.15 and 6.14 are very close: we just did the release.

Sorry, I only downloaded the source code from https://root.cern.ch/releases, so the latest version I have is 6.14.00. I tried 6.13.02, 6.13.08 and 6.14.00. :sweat_smile:

Yes, I was talking about the head/master version. You can get it from GitHub:
https://github.com/root-project/root
But as I said, I am not an expert in that domain. I just tried a few things and told you what I saw.

I cannot say anything about the speed issue. However, while(!fp.eof() ... is an anti-pattern in C++! Don’t do it this way!

Instead check the result of getline:

ifstream fp(csvFileName.c_str(), ios::in|ios::binary); 
// use the constructor instead of a separate "open"
// also, why "binary" when you are reading "lines" from a text file?
// And no need for .c_str()!
// so I assume you should simplify to:
//     ifstream fp(csvFileName);

(...)

// Your while loop should look like this:
while (getline(fp, line)) {
  if (0 == entries % 1000000)
    cout<< "******** "<< entries<< " Currenttime: "<< TDatime().AsSQLString()<< " ********"<< endl;

  entries++;
}

Could you check speed differences with this while loop?
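To illustrate why testing eof() before reading is fragile, here is a small self-contained sketch. The function names are hypothetical, and it uses std::istringstream instead of a file so that it runs anywhere. Note that it shows the plain while(!in.eof()) form; the original macro additionally checks fp.peek() != EOF, which should avoid this particular off-by-one for input that opens and reads normally.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Loop that tests eof() BEFORE reading: the last getline() call fails
// (nothing left to read), but the iteration is still counted.
long countWithEofCheck(std::istream& in)
{
    std::string line;
    long entries = 0;
    while (!in.eof()) {
        std::getline(in, line);
        ++entries;
    }
    return entries;
}

// Loop that uses the result of getline() as the condition: the body runs
// only when a line was actually read.
long countWithGetlineCheck(std::istream& in)
{
    std::string line;
    long entries = 0;
    while (std::getline(in, line))
        ++entries;
    return entries;
}
```

For the two-line input "x\ny\n", the eof()-based loop reports one iteration more than there are lines, while the getline()-based loop reports exactly two. The eof()-based form has a second hazard: if the stream is in a failed state without eofbit set (for example after a failed open), eof() never becomes true and the loop does not terminate.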

The loop is shorter, but there is no improvement in speed (6.12 and 6.14 are still very slow):

$ root csvfileGetline.cpp
   -------------------------------------------------------------------
  | Welcome to ROOT 6.15/01                       http://root.cern.ch |
  |                                      (c) 1995-2018, The ROOT Team |
  | Built for macosx64                                                |
  | From heads/master@v6-13-04-437-g13855fbe5b, Jun 20 2018, 14:50:35 |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'        |
   -------------------------------------------------------------------

root [0] 
Processing csvfileGetline.cpp...
******** 0 Currenttime: 2018-06-21 12:45:18 ********
******** 1000000 Currenttime: 2018-06-21 12:45:22 ********
******** 2000000 Currenttime: 2018-06-21 12:45:26 ********
******** 3000000 Currenttime: 2018-06-21 12:45:30 ********
Total 4000000 events input!
root [1] 

Thank you for your suggestion on the code. I have modified the .cpp file.
However, the performance issue still exists.

   ------------------------------------------------------------
   | Welcome to ROOT 6.13/02                http://root.cern.ch |
   |                               (c) 1995-2017, The ROOT Team |
   | Built for macosx64                                         |
   | From tag v6-13-02, 20 March 2018                           |
   | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

 root [0] .x csvfileGetline.cpp
 ******** 0 Currenttime: 2018-06-21 20:46:13 ********
 ******** 1000000 Currenttime: 2018-06-21 20:46:17 ********
 ******** 2000000 Currenttime: 2018-06-21 20:46:21 ********
 ******** 3000000 Currenttime: 2018-06-21 20:46:24 ********
 ******** 4000000 Currenttime: 2018-06-21 20:46:28 ********
 ******** 5000000 Currenttime: 2018-06-21 20:46:32 ********
 ******** 6000000 Currenttime: 2018-06-21 20:46:35 ********
 ******** 7000000 Currenttime: 2018-06-21 20:46:39 ********
 ******** 8000000 Currenttime: 2018-06-21 20:46:43 ********
 ******** 9000000 Currenttime: 2018-06-21 20:46:46 ********

ROOT6.13.08

  ------------------------------------------------------------
  | Welcome to ROOT 6.13/08                http://root.cern.ch |
  |                               (c) 1995-2018, The ROOT Team |
  | Built for macosx64                                         |
  | From tag v6-13-08, 15 May 2018                             |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
  ------------------------------------------------------------

 root [0] .x csvfileGetline.cpp
 ******** 0 Currenttime: 2018-06-21 20:47:24 ********
 ******** 1000000 Currenttime: 2018-06-21 20:49:48 ********
 ******** 2000000 Currenttime: 2018-06-21 20:52:09 ********

I will download 6.15 and give it a try. Thank you for what you’ve done! :grinning:

To my surprise, ROOT 6.15/01 is also very slow when running the script.

   ------------------------------------------------------------
  | Welcome to ROOT 6.15/01                http://root.cern.ch |
  |                               (c) 1995-2018, The ROOT Team |
  | Built for macosx64                                         |
  | From tag , 8 May 2018                                      |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
  ------------------------------------------------------------

  root [0] .x csvfileGetline.cpp
  ******** 0 Currenttime: 2018-06-21 22:13:58 ********
  ******** 1000000 Currenttime: 2018-06-21 22:16:25 ********
  ******** 2000000 Currenttime: 2018-06-21 22:18:50 ********
  ******** 3000000 Currenttime: 2018-06-21 22:21:15 ********
  ******** 4000000 Currenttime: 2018-06-21 22:23:39 ********
  ******** 5000000 Currenttime: 2018-06-21 22:26:02 ********

I configured the build with ‘cmake …/root-master -DCMAKE_INSTALL_PREFIX=/Users/yichunzh/Applications/ROOT61501 -Dgnuinstall=ON -Dcxx17=ON’.
The compiler information it reported is as follows:
– Compiler Flags: -Wc++11-narrowing -Wsign-compare -Wsometimes-uninitialized -Wconditional-uninitialized -Wheader-guard -Warray-bounds -Wcomment -Wtautological-compare -Wstrncat-size -Wloop-analysis -Wbool-conversion -m64 -pipe -W -Wshadow -Wall -Woverloaded-virtual -fsigned-char -fno-common -Qunused-arguments -pthread -std=c++1z -stdlib=libc++ -DR__HAVE_CONFIG

How did you compile the code to get high performance? I wonder whether I used an “incorrect” option.

Almost the default. I used the following cmake options:

-Dcxx14=ON  -Dall=ON

Actually, I have tried cxx11, cxx14 and cxx17, and all of them are very slow. I will try -Dall. I hope it works.

Actually, I thought these two flags were enough to explain the good speed, but they are not: configuring with only them gives me a slow build as well. I am now trying to find which flag makes it fast (I am indeed using more flags). I will let you know.

Do you mean that you have found the reason?
I just tried these two options, but the speed was still very slow. :joy:

I mean that the flags I sent you were not the only ones I used to get the fast result. I am trying to add the other flags one at a time to understand which one gave me the fast result. It takes a while because I have to rebuild ROOT each time. I will let you know.

OK. Thank you again for your help!