Hi All,
I have data from 16 detector channels that I have been saving as raw binary files. The data in each file/channel is time sorted as it is acquired. We would like to move to a ROOT TTree/TFile format so that all the data is contained in one format that is easy to read, but processing the tree is taking several orders of magnitude longer than simply reading the binary files directly. I suspect there is a better way to process the tree that I have created and that I have simply not found it in the documentation.
I have made a tree called DAQTree with 3 branches (Channel, Energy, Timestamp). The goal is eventually to have the DAQ save the data directly to a tree, but for now I have created it from existing data to test the analysis. There are 16 channels (0-15), and while the data for each individual channel is time ordered, the entries from all 16 channels must be put into a single time order, using the Timestamp branch, before coincidences can be processed. This is the test code that I have at the moment for a tree that contains 100 million entries (700 MB):
uint16_t Energy;
uint64_t Timestamp;
int CH;
sprintf(buf, "%s/%s.root", MAIN_FILE_PATH, ifile);
TFile *FF = TFile::Open(buf, "UPDATE");
TTree *T1 = (TTree *) FF->Get("DAQTree");
T1->SetBranchAddress("Energy", &Energy);
T1->SetBranchAddress("Channel", &CH);
T1->SetBranchAddress("Timestamp", &Timestamp);
//Sort data along Timestamp branch
auto start2 = std::chrono::high_resolution_clock::now(); // try to time how long sorting takes
T1->BuildIndex("Timestamp", "0");
TTreeIndex *I = (TTreeIndex *) T1->GetTreeIndex();
Long64_t *index = I->GetIndex();
auto finish2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed2 = finish2 - start2;
cout << "Elapsed time for sorting: " << elapsed2.count() << " s\n";
TH1F *h = new TH1F("h", "h", 16384, 0, 16384);
TH1F *h1 = new TH1F("h1", "h1", 16384, 0, 16384);
uint64_t orig_T, diff_T;
auto entries = T1->GetEntries();
T1->GetEntry(index[0]);
orig_T = Timestamp;
for (Long64_t i = 1; i < entries; i++) { // index[] has 'entries' elements, so the last valid position is entries - 1
T1->GetEntry(index[i]);
//diff_T = Timestamp - orig_T;
if (CH < 8) {
//h1->Fill(Energy); //Singles Spectrum
// if (diff_T <= 1000) {
// h->Fill(Energy); //Coincidence Singles
// } else {
// orig_T = Timestamp;
// }
}
}
h->Write();
h1->Write();
auto finish3 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed3 = finish3 - start2;
cout << "Elapsed time: " << elapsed3.count() << " s\n";
The sorting step takes ~36 seconds, which is slower than the sorting I already do on the binary files (using non-ROOT C++) but still reasonable. I have not been able to establish how long the for-loop over the entries in index (time) order takes, because I have killed the process each time after about 30 minutes.
My other code, which simply reads all 16 channels in as binary using standard C++ I/O, is able to read, sort and fill ~2 dozen TH1F histograms in about 2 minutes. I was hoping for a more ROOT-centric way to do this analysis that makes use of the Tree structure. I assumed the HEP community had much larger data sets than I do, so looping over a tree like this would be quite quick, or at least that a better method existed.
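To check the coincidence logic on its own, independent of ROOT and of how the entries are read, the commented-out block in the loop above can be sketched as a plain C++ function. This is only an illustration under stated assumptions: Counts, countCoincidences and the window parameter are hypothetical names, the inputs are assumed to be already in global time order, and the window is assumed to be in the same units as the 1000 used above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: how many entries each histogram would receive.
struct Counts {
    std::size_t singles = 0;       // entries that would go into h1
    std::size_t coincidences = 0;  // entries that would go into h
};

// Assumes timestamps are already globally time ordered, and that
// timestamps[i] and channels[i] describe the same event.
Counts countCoincidences(const std::vector<std::uint64_t>& timestamps,
                         const std::vector<int>& channels,
                         std::uint64_t window) {
    Counts counts;
    if (timestamps.empty() || timestamps.size() != channels.size()) return counts;

    std::uint64_t reference = timestamps[0];  // plays the role of orig_T
    for (std::size_t i = 1; i < timestamps.size(); ++i) {
        if (channels[i] >= 8) continue;  // only channels 0-7, as in the loop above
        ++counts.singles;
        if (timestamps[i] - reference <= window) {
            ++counts.coincidences;
        } else {
            reference = timestamps[i];  // start a new coincidence window
        }
    }
    return counts;
}
```

As in the original loop, the reference time is only updated by events from channels 0-7, and the very first event only sets the reference.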
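Because each channel is already time ordered on its own, a full sort is not strictly required to obtain a global time order: the 16 channel streams can be merged instead. The following is a minimal sketch of such a merge in plain C++, not the code described above. Event and mergeChannels are hypothetical names, and the sketch assumes all events fit in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical event record mirroring the three branches of DAQTree.
struct Event {
    std::uint64_t timestamp;
    std::uint16_t energy;
    int channel;
};

// Merge per-channel event lists (each already time ordered) into one
// globally time-ordered list, using a min-heap keyed on timestamp.
std::vector<Event> mergeChannels(const std::vector<std::vector<Event>>& channels) {
    struct Cursor {
        std::uint64_t timestamp;
        std::size_t channel;  // which input list this cursor belongs to
        std::size_t pos;      // position within that list
    };
    // "Greater than" comparison turns std::priority_queue into a min-heap.
    auto later = [](const Cursor& a, const Cursor& b) { return a.timestamp > b.timestamp; };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);

    std::size_t total = 0;
    for (std::size_t c = 0; c < channels.size(); ++c) {
        total += channels[c].size();
        if (!channels[c].empty()) heap.push(Cursor{channels[c][0].timestamp, c, 0});
    }

    std::vector<Event> merged;
    merged.reserve(total);
    while (!heap.empty()) {
        Cursor cur = heap.top();
        heap.pop();
        merged.push_back(channels[cur.channel][cur.pos]);
        std::size_t next = cur.pos + 1;
        if (next < channels[cur.channel].size())
            heap.push(Cursor{channels[cur.channel][next].timestamp, cur.channel, next});
    }
    return merged;
}
```

With 16 inputs the heap never holds more than 16 cursors, so the merge costs O(N log 16) for N events in total, compared with O(N log N) for a general sort.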
If there is something I could do better, something I have done wrong, or if I should simply stick to what I have, I would very much appreciate help.
Thank you,
Matthew
ROOT Version: 6.14.06
Platform: Mac OSX Mojave
Compiler: gcc (GCC) 7.1.0