Hi all, I’m using ROOT v5.34/24 on Ubuntu 14.04.1 LTS (and also Mac OS X 10.10.5) to post-process some large raw data files. Specifically, I have list-mode LYSO scintillator data stored in the PulseArea branch of a TTree called WaveformTree, spread over about 50 files, some of which are gzipped and some of which are not. I would like to extract the PulseArea data, fill one TH1F per file with it, and write the collection of TH1Fs to a single ROOT file.
Here is my entire analysis script:
```cpp
// script to data-reduce list-mode LYSO root files to histogram-mode
// based off of https://root-forum.cern.ch/t/open-files-in-a-directory-with-a-for-loop/12471/1
// Jayson Vavrek, MIT, 2016
#include <iostream>
#include "TCanvas.h"
#include "TFile.h"
#include "TH1F.h"
#include "TList.h"
#include "TStopwatch.h"
#include "TString.h"
#include "TSystem.h"
#include "TSystemDirectory.h"
#include "TSystemFile.h"
#include "TTree.h"
using namespace std;

void reduceLYSO(const char *dirname="./")
{
   TStopwatch sw;
   sw.Start();
   // these are the two file extensions for LYSO data
   const char *ext=".root";
   const char *extgz=".root.gz";
   // initialize the output file
   TFile *outfile = new TFile("reduced_LYSO_data.root","RECREATE");
   // declared here so it is still in scope for the summary printed at the end
   Long64_t totalFileSize = 0;
   // loop over all the files
   int fileCounter = 0;
   TSystemDirectory dir(dirname, dirname);
   TList *files = dir.GetListOfFiles();
   if (files) {
      TSystemFile *file;
      TCanvas *c1 = new TCanvas();
      cout << "Processing files..." << endl;
      TIter next(files);
      while ((file=(TSystemFile*)next()))
      {
         TString fname = file->GetName();
         bool wasZipped = false;
         if (!file->IsDirectory() && fname.BeginsWith("2016") )
         {
            // if the file is gzip'd, gunzip it and modify the fname var
            if (fname.EndsWith(extgz))
            {
               cout << gSystem->Exec("yes n | gunzip " + fname) << endl; // pass "n" in case it asks to overwrite
               fname.ReplaceAll(".gz","");
               cout << fname << " " << fname.Length() << endl;
               wasZipped = true;
            }
            // now do the heavy lifting
            if (fname.EndsWith(ext))
            {
               TFile *f = (TFile*) TFile::Open(fname.Data());
               TTree *tree = (TTree*) f->Get("WaveformTree");
               totalFileSize += f->GetSize();
               TString hname = fname;
               hname.ReplaceAll(".root","");
               hname.Prepend("h_");
               cout << " " << fileCounter << ") " << hname << endl;
               TH1F *h = new TH1F("h","h",30000, 0, 30000);
               tree->Draw("PulseArea>>h","","");
               h->SetName(hname);
               outfile->cd();
               h->Write();
            }
            // if the original file was zipped, rezip it
            if (wasZipped) gSystem->Exec("gzip " + fname);
            ++fileCounter;
         }
      }
   }
   outfile->Close();
   cout << "File " << outfile->GetName() << " created." << endl;
   cout << "Approximately " << totalFileSize/1.0e9 << " Gbytes of list-mode data reduced to "
        << outfile->GetSize()/1.0e3 << " kbytes of histogram-mode data." << endl;
   cout << "Real time spent: " << sw.RealTime()/60.0 << " min." << endl;
}
```
After processing about nine files (some of which are gzipped and some of which are not), my script fails with this error:
```
Error in <TFile::TFile>: file 2016_08_02_14_23_13.root does not exist
```
This happens right after I gunzip a file and then call TFile::Open() on its name with the “.gz” extension removed.
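To narrow down this step, it may help to check, at the point of failure, whether gunzip itself reported an error and whether the uncompressed file is actually on disk before anything tries to open it. Below is a minimal standalone C++17 sketch of that check, outside ROOT. `uncompressed_name` and `gunzip_checked` are hypothetical helper names, and the sketch assumes a POSIX shell with gunzip on the PATH. Inside the macro itself, the same checks could use the status returned by `gSystem->Exec()` (which the script already prints) and `gSystem->AccessPathName()`, which returns kTRUE when the file is *not* accessible.

```cpp
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <string>

// Hypothetical helper: strip a trailing ".gz" only; any ".gz" elsewhere in the name is kept.
std::string uncompressed_name(const std::string& gzName) {
    const std::string suffix = ".gz";
    if (gzName.size() >= suffix.size() &&
        gzName.compare(gzName.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return gzName.substr(0, gzName.size() - suffix.size());
    }
    return gzName;
}

// Hypothetical helper: run gunzip on gzName and return the uncompressed path,
// or an empty string if the command failed or the expected output is missing.
// Assumes the file name contains no single quotes. If the uncompressed file
// already exists, gunzip typically refuses to overwrite it and exits with a
// nonzero status, which this sketch reports as a failure.
std::string gunzip_checked(const std::string& gzName) {
    const std::string cmd = "gunzip '" + gzName + "'";
    const int status = std::system(cmd.c_str());
    if (status != 0) {
        std::cerr << "gunzip returned status " << status << " for " << gzName << "\n";
        return "";
    }
    const std::string out = uncompressed_name(gzName);
    if (!std::filesystem::exists(out)) {
        std::cerr << out << " does not exist after gunzip\n";
        return "";
    }
    return out;
}
```

For these file names `uncompressed_name` gives the same result as `TString::ReplaceAll(".gz","")`; it is just stricter about only removing the suffix.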
The weird thing is that running the gunzip (and the other lines) manually in the shell or in the interpreter works fine. It also works fine if I only have a few files to loop over. It’s only when I have 50 or so files in the directory to process that it fails, on the ninth file. Moreover, if I comment out the tree->Draw() call, it churns through all the files with no problem.
I should also mention that it initially fails on the file that was created last. I’m not sure why, but the files in the TList are in no discernible order, and the chronologically last file happens to end up ninth in the TList. Because of this seemingly random order, the script processes a few gzipped and a few non-gzipped files before failing on the ninth one, so the problem is not simply its first encounter with a gzipped file. If I move this last file out of the directory, the script then fails on the 10th-last file chronologically (11th from the start of the TList), not on the second-last.
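The order of entries from `TSystemDirectory::GetListOfFiles()` generally follows whatever order the file system returns, so it need not be chronological. Since these files are named by zero-padded timestamps (YYYY_MM_DD_hh_mm_ss), plain name order is also chronological order, which makes the processing order deterministic and easier to reason about. In the macro, one option is to call `files->Sort()` before iterating: `TList::Sort()` orders by `TObject::Compare()`, which for TNamed-derived objects such as TSystemFile compares names. The standalone C++ sketch below shows the same idea with a hypothetical helper `lyso_files_in_order`, whose filter (prefix "2016", suffix ".root" or ".root.gz") only roughly mirrors the macro's selection.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if s ends with suffix.
bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: keep only names that look like LYSO data files and return
// them in name order, which for zero-padded YYYY_MM_DD_hh_mm_ss names is chronological.
std::vector<std::string> lyso_files_in_order(const std::vector<std::string>& names) {
    std::vector<std::string> out;
    for (const auto& n : names) {
        const bool isData = n.rfind("2016", 0) == 0 &&  // starts with "2016"
                            (ends_with(n, ".root") || ends_with(n, ".root.gz"));
        if (isData) {
            out.push_back(n);
        }
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

Processing in a fixed order does not by itself explain the failure, but it makes "fails on the Nth file" reproducible from run to run.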
Any ideas?