How to quickly convert a large text file into a ROOT file (VC++17, Windows 10)?

Dear Experts,
I am converting a 200 GB text file into a ROOT file, but it takes a very long time. What can I do to shorten the time?

My code is as follows:

char argv0[][100] = { "D:\\test\\wave_0.txt" };
char argv1[][100] = { "D:\\test\\wave_0.txt" };
char argv2[][100] = { "D:\\test\\wave_1.txt" };
char argv3[][100] = { "D:\\test\\wave_2.txt" };
char argv4[][100] = { "D:\\test\\wave_3.txt" };
char argv5[][100] = { "D:\\test\\wave_4.txt" };
char argv6[][100] = { "D:\\test\\wave_5.txt" };
char argv7[][100] = { "D:\\test\\wave_6.txt" };
char argv8[][100] = { "D:\\test\\wave_7.txt" };
char argv9[][100] = { "D:\\test\\wave_8.txt" };
char argv10[][100] = { "D:\\test\\wave_9.txt" };
char argv11[][100] = { "D:\\test\\wave_10.txt" };
char argv12[][100] = { "D:\\test\\wave_11.txt" };
char argv13[][100] = { "D:\\test\\wave_12.txt" };
char argv14[][100] = { "D:\\test\\wave_13.txt" };
char argv15[][100] = { "D:\\test\\wave_14.txt" };
char argv16[][100] = { "D:\\test\\wave_15.txt" };

char* argv[] = { argv0[0],argv1[0],argv2[0],argv3[0],argv4[0],argv5[0],argv6[0],argv7[0],argv8[0],argv9[0],argv10[0],argv11[0],argv12[0],argv13[0],argv14[0],argv15[0],argv16[0],NULL };
int ch_num = 17;

TFile* outputfile = new TFile("aa.root", "RECREATE");
dataTree = new TTree("dataTree", "Tree with vector");

char str[100];
ifstream* ifs = new ifstream[ch_num];
int* bIDarr = new int[ch_num];
int* chIDarr = new int[ch_num];

for (int i = 1; i < ch_num; i++) {
	string fileName = string(argv[i]);
	ifs[i].open(fileName, std::ifstream::in);
	
	ifs[i].get(str, 100, ':');	
	ifs[i].get(); 
	ifs[i] >> REC_LEN;
	
	ifs[i].get(str, 100, ':');
	ifs[i].get(); 
	ifs[i] >> BOARD_ID;

	ifs[i].get(str, 100, ':');
	ifs[i].get();
	ifs[i] >> CH_ID;
	
	bIDarr[i] = BOARD_ID;
	chIDarr[i] = CH_ID;
	
	TString branchName;
	branchName = Form("B2_Ch%d", CH_ID);
	dataTree->Branch(branchName, vB2_Ch[CH_ID], "vB1_Ch[1][528]/D"); // must set dimension
	
	ifs[i].close();
}


for (int i = 0; i < ch_num; i++) {
	ifs[i].open(string(argv[i]), ifstream::in);
}

int flag_end = 0;//zhangq
int counter_1 = 0, counter = 0;

while (ifs[1].good()) { // loop over events until the first data file is exhausted

	for (int i = 0; i < CH_NB; i++) {
		for (int j = 0; j < 528; j++) {
			vB1_Ch[i][j] = -99;
			vB2_Ch[i][j] = -99;
		}
	}
	for (int j = 1; j < ch_num; j++) {
		
		vector<double> waveform;
		char str[100]; string str1;
		ifs[j].get(str, 100, ':');
		
		if (strlen(str) == 1) {

			flag_end = 1;//zhangq
			break;
		};

		ifs[j].get(); 
		ifs[j] >> REC_LEN;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> BOARD_ID;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> CH_ID;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> EVT_NB;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> str1;
		PATTERN = (unsigned int)stol(str1, nullptr, 0);

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> TRIG_TIME_STAMP;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> str1;
		DC_OFFSET = (unsigned int)stol(str1, nullptr, 0);

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> START_INDEX_CELL;
		
		for (unsigned int i = 0; i < REC_LEN; i++) {
			double sample;
			ifs[j] >> sample;
			waveform.push_back(sample);
		}

		vB2_Ch[CH_ID][0] = REC_LEN;
		vB2_Ch[CH_ID][1] = BOARD_ID;
		vB2_Ch[CH_ID][2] = CH_ID;
		vB2_Ch[CH_ID][3] = EVT_NB;
		vB2_Ch[CH_ID][4] = PATTERN;
		vB2_Ch[CH_ID][5] = TRIG_TIME_STAMP;
		vB2_Ch[CH_ID][6] = DC_OFFSET;
		vB2_Ch[CH_ID][7] = START_INDEX_CELL;
		for (int i = 0; i < waveform.size(); i++) {
			vB2_Ch[CH_ID][8 + i] = waveform.at(i);
		}
			
		
	}// loop over all input files for one event/waveform

	if (flag_end == 0) {//zhangq
		dataTree->Fill();
		counter_1++;
	}

	if (counter_1 == 500000) {
		cout << counter_1 << endl; // counter_1 is the number of filled events; counter is never incremented
		break;
	}
}// loop over the length of the file

for (int j = 0; j < ch_num; j++) {
	ifs[j].close();
}

outputfile->Write();
outputfile->Close();

Hi @yz_liu,

I do not quite understand why you are using char[][100] to hold your string literals. The suggestion below is simpler and improves readability.

auto argvXX = "D:\\test\\wave_15.txt";
// ...
const char *argv[] = { argv0, argv1, /*...*/ };
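
Since the file names differ only by their index, the list could also be generated in a loop. Below is a minimal sketch using only the standard library; the helper name `makeWaveFileNames` and its parameters are hypothetical and not part of your code or of ROOT.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds "<dir>\wave_<i>.txt" for i in [0, count).
std::vector<std::string> makeWaveFileNames(const std::string& dir, std::size_t count) {
    std::vector<std::string> names;
    names.reserve(count);  // one allocation for the whole list
    for (std::size_t i = 0; i < count; ++i) {
        names.push_back(dir + "\\wave_" + std::to_string(i) + ".txt");
    }
    return names;
}
```

For example, `auto files = makeWaveFileNames("D:\\test", 16);` yields the 16 paths from `wave_0.txt` to `wave_15.txt`, and `files[i].c_str()` can be passed wherever a `const char *` is expected.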

As an alternative to your code, you can use RDataFrame to read from the CSV file and dump to a TTree, as in:

auto df = ROOT::RDF::FromCSV("/path/to/input_file.csv");
df.Snapshot("tree", "/path/to/output.root");

but that’s probably not going to be any faster.

On the original question: in general, parsing a plain text file is inherently slow. I’m afraid it cannot be made much faster.

Cheers,
J.

OK, thanks for your reply.