How to quickly convert a large text file into a ROOT file (VC++17, Windows 10)?

Dear experts,
I need to convert a 200 GB text file into a ROOT file, but the conversion takes a very long time. What can I do to shorten it?

My code is as follows:

char argv0[][100] = { "D:\\test\\wave_0.txt" };
char argv1[][100] = { "D:\\test\\wave_0.txt" };
char argv2[][100] = { "D:\\test\\wave_1.txt" };
char argv3[][100] = { "D:\\test\\wave_2.txt" };
char argv4[][100] = { "D:\\test\\wave_3.txt" };
char argv5[][100] = { "D:\\test\\wave_4.txt" };
char argv6[][100] = { "D:\\test\\wave_5.txt" };
char argv7[][100] = { "D:\\test\\wave_6.txt" };
char argv8[][100] = { "D:\\test\\wave_7.txt" };
char argv9[][100] = { "D:\\test\\wave_8.txt" };
char argv10[][100] = { "D:\\test\\wave_9.txt" };
char argv11[][100] = { "D:\\test\\wave_10.txt" };
char argv12[][100] = { "D:\\test\\wave_11.txt" };
char argv13[][100] = { "D:\\test\\wave_12.txt" };
char argv14[][100] = { "D:\\test\\wave_13.txt" };
char argv15[][100] = { "D:\\test\\wave_14.txt" };
char argv16[][100] = { "D:\\test\\wave_15.txt" };

char* argv[] = { argv0[0],argv1[0],argv2[0],argv3[0],argv4[0],argv5[0],argv6[0],argv7[0],argv8[0],argv9[0],argv10[0],argv11[0],argv12[0],argv13[0],argv14[0],argv15[0],argv16[0],NULL };
int ch_num = 17;

TFile* outputfile = new TFile("aa.root", "RECREATE");
dataTree = new TTree("dataTree", "Tree with vector");

char str[100];
ifstream* ifs = new ifstream[ch_num];
int* bIDarr = new int[ch_num];
int* chIDarr = new int[ch_num];

for (int i = 1; i < ch_num; i++) {
	string fileName = string(argv[i]);
	ifs[i].open(fileName, std::ifstream::in);
	
	ifs[i].get(str, 100, ':');	
	ifs[i].get(); 
	ifs[i] >> REC_LEN;
	
	ifs[i].get(str, 100, ':');
	ifs[i].get(); 
	ifs[i] >> BOARD_ID;

	ifs[i].get(str, 100, ':');
	ifs[i].get();
	ifs[i] >> CH_ID;
	
	bIDarr[i] = BOARD_ID;
	chIDarr[i] = CH_ID;
	
	TString branchName;
	branchName = Form("B2_Ch%d", CH_ID);
	dataTree->Branch(branchName, vB2_Ch[CH_ID], "vB1_Ch[1][528]/D"); // the leaf list must state the array dimensions
	
	ifs[i].close();
}


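// Reopen the input files for the event loop. Index 0 repeats wave_0.txt and is never read, so start at 1.
for (int i = 1; i < ch_num; i++) {
	ifs[i].open(string(argv[i]), ifstream::in);
}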

int flag_end = 0; // set to 1 when an input file has no more records
int counter_1 = 0, counter = 0;

while (ifs[1].good()) { // one iteration per event, until the first file is exhausted

	for (int i = 0; i < CH_NB; i++) {
		for (int j = 0; j < 528; j++) {
			vB1_Ch[i][j] = -99;
			vB2_Ch[i][j] = -99;
		}
	}
	for (int j = 1; j < ch_num; j++) {
		
		vector<double> waveform;
		char str[100]; string str1;
		ifs[j].get(str, 100, ':');
		
		if (strlen(str) == 1) { // only a single leftover character remains: end of data
			flag_end = 1;
			break;
		}

		ifs[j].get(); 
		ifs[j] >> REC_LEN;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> BOARD_ID;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> CH_ID;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> EVT_NB;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> str1;
		PATTERN = (unsigned int)stoul(str1, nullptr, 0); // stoul: long is 32-bit with VC++, so stol throws on values above 0x7FFFFFFF

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> TRIG_TIME_STAMP;

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> str1;
		DC_OFFSET = (unsigned int)stoul(str1, nullptr, 0); // stoul: long is 32-bit with VC++, so stol throws on values above 0x7FFFFFFF

		ifs[j].get(str, 100, ':');
		ifs[j].get(); 
		ifs[j] >> START_INDEX_CELL;
		
		for (unsigned int i = 0; i < REC_LEN; i++) {
			double sample;
			ifs[j] >> sample;
			waveform.push_back(sample);
		}

		vB2_Ch[CH_ID][0] = REC_LEN;
		vB2_Ch[CH_ID][1] = BOARD_ID;
		vB2_Ch[CH_ID][2] = CH_ID;
		vB2_Ch[CH_ID][3] = EVT_NB;
		vB2_Ch[CH_ID][4] = PATTERN;
		vB2_Ch[CH_ID][5] = TRIG_TIME_STAMP;
		vB2_Ch[CH_ID][6] = DC_OFFSET;
		vB2_Ch[CH_ID][7] = START_INDEX_CELL;
		for (size_t i = 0; i < waveform.size(); i++) { // size_t avoids a signed/unsigned comparison
			vB2_Ch[CH_ID][8 + i] = waveform.at(i);
		}
	} // end of loop over the input files for one event

	if (flag_end == 0) { // fill the tree only if every file supplied a record
		dataTree->Fill();
		counter_1++;
	}

	if (counter_1 == 500000) { // stop after 500000 events
		cout << counter_1 << endl; // counter was never incremented; print the number of filled events
		break;
	}
} // end of loop over events

for (int j = 0; j < ch_num; j++) {
	ifs[j].close();
}

outputfile->Write();
outputfile->Close();

Hi @yz_liu,

I do not quite understand why you are using char[][100] to hold your string literals. The suggestion below is simpler and improves readability.

auto argvXX = "D:\\test\\wave_15.txt";
// ...
const char *argv[] = { argv0, argv1, /*...*/ };
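
If the file names really follow the `wave_N.txt` pattern shown in your post, the list could also be generated in a loop instead of being written out by hand. A minimal sketch using only the standard library; the name `makeWaveFileNames` and its parameters are made up for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns dir + "wave_0.txt" ... dir + "wave_<nChannels-1>.txt".
// `dir` is expected to end with a path separator.
std::vector<std::string> makeWaveFileNames(const std::string& dir, int nChannels) {
    std::vector<std::string> names;
    if (nChannels > 0) {
        names.reserve(static_cast<std::size_t>(nChannels));
    }
    for (int ch = 0; ch < nChannels; ++ch) {
        names.push_back(dir + "wave_" + std::to_string(ch) + ".txt");
    }
    return names;
}
```

With `makeWaveFileNames("D:\\test\\", 16)` the entries are indexed from 0 to 15, so the loops over the files would start at 0 rather than 1, and `names[i]` can be passed directly to `ifstream::open`.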

As an alternative to your code, you can use RDataFrame to read from the CSV file and dump to a TTree, as in:

auto df = ROOT::RDF::FromCSV("/path/to/input_file.csv");
df.Snapshot("tree", "/path/to/output.root");

but that’s probably not going to be any faster.

On the original question: in general, parsing a plain text file is inherently slow. I’m afraid it cannot be made much faster.
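
Text parsing will remain the dominant cost, but part of the time in a loop like the one in the question goes into formatted extraction with `>>` and into filling a fresh `std::vector` for every waveform. One thing that is sometimes tried is to read a chunk of text into a buffer and convert the numbers with `std::strtod` straight into a preallocated array. The sketch below shows only that conversion step in standard C++. `parseSamples` is a hypothetical helper, it assumes the samples are separated by whitespace or newlines, and whether it is actually faster for this file format would need to be measured.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts up to maxCount whitespace-separated numbers
// from the null-terminated buffer `text` into `out`.
// Returns how many values were written.
std::size_t parseSamples(const char* text, double* out, std::size_t maxCount) {
    std::size_t n = 0;
    const char* p = text;
    while (n < maxCount) {
        char* end = nullptr;
        const double value = std::strtod(p, &end);
        if (end == p) {
            break; // nothing more could be converted
        }
        out[n++] = value;
        p = end; // continue right after the number just read
    }
    return n;
}
```

In the code from the question, `out` could point at `&vB2_Ch[CH_ID][8]` with `maxCount` set to 520, which also keeps the write inside the 528-element array.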

Cheers,
J.

OK, thanks for your reply.
