Large loop on txt files

Hi,
I am trying to loop over the txt files in a directory and write them into root files (one root file can contain several txt files when the events share the same specs).
While my code works, it takes a huge time to compile when executed through “root -l txttoroot.C” (I only get “Processing txttoroot.C…”).
Here is my current code: txttoroot.C (1.9 KB)
I think it comes from the allocations made when creating the histograms; I may not fully understand whether I am leaking memory by freeing things incorrectly. I thought closing the written files was enough.

Could I get advice on this? Or even a better way to do this .txt to .root transformation?
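(As an aside on the conversion itself: ROOT's TTree::ReadFile can fill a tree directly from a delimited text file, so hand-written parsing may not be needed at all. If parsing by hand, the reading side can stay small; a plain-C++ sketch, where the two-column layout and the function name are only illustrative, not taken from the attached macro:)

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Read whitespace-separated numbers, one event per line.
// The two-column layout is a placeholder; adapt it to the real txt format.
std::vector<std::pair<double, double>> readEvents(const std::string &path)
{
  std::vector<std::pair<double, double>> events;
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream ss(line);
    double a, b;
    if (ss >> a >> b)   // skip malformed or empty lines
      events.emplace_back(a, b);
  }
  return events;
}
```

(With ROOT available, the same step could be replaced by something like tree->ReadFile(fname, "a/D:b/D"), which also creates the branches.)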

Thanks
Florian


ROOT Version: 6.20/04
Platform: CentOS
Compiler: gcc version 4.8.5 20150623 (Red Hat 4.8.5-39)


Instead of “data->Close();” use “delete data;” (this automatically deletes hist_L, too)
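(Context for this advice: histograms created while a TFile is open are owned by that file, so deleting the file object also deletes them. A plain-C++ analogy of that owner-deletes-children pattern; the Hist/File names are illustrative stand-ins, not ROOT API:)

```cpp
#include <memory>
#include <vector>

// Toy stand-ins for TH1/TFile: the "file" owns its histograms,
// so destroying the file releases them automatically.
struct Hist {
  static int alive;            // track live instances for illustration
  Hist()  { ++alive; }
  ~Hist() { --alive; }
};
int Hist::alive = 0;

struct File {
  std::vector<std::unique_ptr<Hist>> owned;
  Hist *makeHist() {
    owned.push_back(std::make_unique<Hist>());
    return owned.back().get(); // caller uses it, the file owns it
  }
};                             // ~File() deletes every owned Hist
```

(The practical consequence in the macro: after “delete data;” the histogram must not be deleted again manually, or it would be freed twice.)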

Thanks for this advice.
It seems it still takes a huge time to get past the “Processing” stage (does this message mark the compilation?)

I am processing 10,000 files of 1.3 MB each. I tried it with fewer files and it seems to compile faster. I don’t understand why this compilation should take that long when the code is just looping over files in a folder.

Hi @florian_Gautier,
the time difference is definitely not due to a difference in compile/interpret times. It might be related to the thread “Slow down reading many TFiles” (@pcanal might be able to comment).

You can check where the program spends its time, e.g. with perf record and perf report, or with the “poor man’s profiler”, which works very well when a single call stack occupies most of the time, as is probably the case here.
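(If perf happens not to be available on the machine, a cruder option is to time the suspect sections directly with std::chrono; a generic sketch, not tied to the macro:)

```cpp
#include <chrono>
#include <iostream>

// Run any callable, print how long it took, and return the time in ms.
template <class F>
double timeIt(const char *label, F &&f)
{
  auto t0 = std::chrono::steady_clock::now();
  f();
  auto t1 = std::chrono::steady_clock::now();
  double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
  std::cout << label << ": " << ms << " ms" << std::endl;
  return ms;
}
```

(For example, wrapping the file-listing call as timeIt("GetListOfFiles", [&]{ files = dir.GetListOfFiles(); }); would show whether that step dominates the startup.)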

Cheers,
Enrico

Thanks, I will try to run these perf commands and provide the output.

A difference I noticed compared to the other thread you mention is that I don’t see a slowdown of my process during the execution; the delay happens immediately. As soon as too many files are present, the start of the process is really slow (my code should print the current file being read on each loop iteration).

Can you try to modify the beginning of your macro and observe how it behaves (i.e. how fast it prints the “debug output”):

void txttoroot()
{
  std::cout << "txttoroot ..." << std::endl << std::flush;
  const char *dirname = "/media/sf_Partag_virtualbox/Ampli/Test";
  std::cout << "... pwd" << std::endl << std::flush;
  TString pwd(gSystem->pwd()); // get the current directory
  std::cout << "... dir" << std::endl << std::flush;
  TSystemDirectory dir(dirname, dirname);
  std::cout << "... files" << std::endl << std::flush;
  TList *files = dir.GetListOfFiles();
  std::cout << "files ..." << std::endl << std::flush;
  // ...
  if (files)
  {
    std::cout << "... next" << std::endl << std::flush;
    TSystemFile *file; TString fname, fname2; TIter next(files);
    std::cout << "next ..." << std::endl << std::flush;
    // ...

[flo@Scientific AmplifiedL]$ root -l txttoroot.C
root [0]
Processing txttoroot.C…
txttoroot …
… pwd
… dir
… files
files …
… next
next …
out_config300_ion_14_28_14_5MeV_nucl_0_deg_10000_8_20200529_210423.txt

@Wile_E_Coyote, I implemented your debug output, and the huge delay happens after “Processing txttoroot.C…” and before “txttoroot …”.

Next, I will try to look at how perf report works and try to apply it.

So the huge delay seems to be unrelated to your macro (it happens before its first line is executed).
Just to make sure … can you try two command lines:
root -b -n -l txttoroot.C
root.exe -b -n -l txttoroot.C

It did not change the delay.