So I have a TTree, which we’ll call Data_A.
It has a branch named “Time”, and it holds millions, sometimes billions, of entries.
My data analysis is based on the difference in the “Time” values between each entry.
I want to histogram two different differences:
the difference between consecutive time entries (entry i+1 minus entry i)
the difference between time entries with one entry in between (entry i+2 minus entry i)
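To make the two quantities concrete, here is a minimal sketch on a plain std::vector instead of the real TTree. The function name lagged_diffs is just something I made up for illustration; lag = 1 corresponds to consecutive entries and lag = 2 to entries with one in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Differences between entries that are `lag` apart: t[i + lag] - t[i].
// lag = 1: consecutive entries; lag = 2: one entry in between.
std::vector<double> lagged_diffs(const std::vector<std::uint64_t>& t, std::size_t lag)
{
    assert(lag > 0);
    std::vector<double> out;
    // Stop as soon as the partner entry i + lag would be past the end.
    for (std::size_t i = 0; i + lag < t.size(); ++i) {
        out.push_back(static_cast<double>(t[i + lag]) - static_cast<double>(t[i]));
    }
    return out;
}
```

For N entries this gives N-1 values for lag 1 and N-2 values for lag 2.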
The code below is what it currently looks like as a single-threaded process.
The problem I’m having is that some Data_A files are simply massive and can take hours to process.
So I want to use multiple cores to cut this time down significantly.
I tried looking into multithreading tutorials, but none of them really explain the proper steps for a task like this, because they work with different variables within the same TTree entry rather than with values from different entries. ROOT::EnableImplicitMT() does not seem to improve my performance at all, so I want to do explicit multicore/multithreaded processing.
There are two ways I am currently thinking about approaching this, and pointers on how I can achieve them would be greatly appreciated.
The first is to cut the TTree into segments of 50e6 entries, then process each of the resulting pieces on its own core.
The second is to just divide the Data_A file into “n_workers” segments, then do the same.
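A rough sketch of both splitting schemes, assuming the times are already in memory as a std::vector (a stand-in for the Time branch) and using a toy Hist struct in place of TH1D. Range, Hist, split_by_size, split_by_workers, process_range and process_parallel are all hypothetical names, not ROOT API. The point I am trying to capture: each worker owns a range of start entries but has to read up to two entries past the end of its range, otherwise the pairs that straddle a segment boundary are lost. Each worker fills its own histograms, which are summed at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Half-open range [begin, end) of start entries i owned by one worker.
struct Range { std::size_t begin; std::size_t end; };

// Option 1: fixed-size segments (e.g. 50e6 entries each).
std::vector<Range> split_by_size(std::size_t n_entries, std::size_t chunk)
{
    assert(chunk > 0);
    std::vector<Range> out;
    for (std::size_t b = 0; b < n_entries; b += chunk)
        out.push_back({b, std::min(b + chunk, n_entries)});
    return out;
}

// Option 2: at most n_workers segments of (almost) equal size.
std::vector<Range> split_by_workers(std::size_t n_entries, std::size_t n_workers)
{
    assert(n_workers > 0);
    const std::size_t chunk = (n_entries + n_workers - 1) / n_workers; // ceiling
    return split_by_size(n_entries, std::max<std::size_t>(chunk, 1));
}

// Toy stand-in for TH1D: equal-width bins over [lo, hi), other values dropped.
struct Hist {
    std::vector<std::uint64_t> bins;
    double lo, hi;
    Hist(std::size_t n, double lo_, double hi_) : bins(n, 0), lo(lo_), hi(hi_) {}
    void fill(double x) {
        if (x < lo || x >= hi) return;
        auto idx = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        bins[std::min(idx, bins.size() - 1)]++;
    }
    void add(const Hist& o) {
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += o.bins[i];
    }
};

// Fills the lag-1 and lag-2 differences for every start entry in r.
// Reads t[i + 1] and t[i + 2] even when they lie in the next segment.
void process_range(const std::vector<std::uint64_t>& t, Range r, Hist& h0, Hist& h1)
{
    for (std::size_t i = r.begin; i < r.end; ++i) {
        const double t0 = static_cast<double>(t[i]);
        if (i + 1 < t.size()) h0.fill(static_cast<double>(t[i + 1]) - t0);
        if (i + 2 < t.size()) h1.fill(static_cast<double>(t[i + 2]) - t0);
    }
}

// One thread per range, each with private histograms; merged after join.
void process_parallel(const std::vector<std::uint64_t>& t,
                      const std::vector<Range>& ranges, Hist& h0, Hist& h1)
{
    std::vector<Hist> part0(ranges.size(), Hist(h0.bins.size(), h0.lo, h0.hi));
    std::vector<Hist> part1(ranges.size(), Hist(h1.bins.size(), h1.lo, h1.hi));
    std::vector<std::thread> pool;
    for (std::size_t k = 0; k < ranges.size(); ++k)
        pool.emplace_back(process_range, std::cref(t), ranges[k],
                          std::ref(part0[k]), std::ref(part1[k]));
    for (auto& th : pool) th.join();
    for (std::size_t k = 0; k < ranges.size(); ++k) {
        h0.add(part0[k]);
        h1.add(part1[k]);
    }
}
```

With a real TTree I assume each worker would need to open its own TFile and TTree rather than share one handle, and the per-worker TH1D objects could then be summed with TH1::Add. With fixed 50e6-entry segments the number of ranges can exceed the number of cores, so the one-thread-per-range loop above would probably need to be replaced by a fixed pool.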
Tips, tricks or suggestions please?
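#include "TFile.h"
#include "TH1D.h"
#include "TString.h"
#include "TTree.h"

// Macro entry point; ROOT expects the function name to match the macro file name.
void time_diffs(TString fname)
{
    TFile *file_0 = new TFile(fname.Data());
    // Retrieve the tree from the file before using it.
    TTree *Data_A = nullptr;
    file_0->GetObject("Data_A", Data_A);
    ULong64_t time{0};
    Data_A->SetBranchAddress("Time", &time);
    TH1D *hdiff0 = new TH1D("diff0", "", 100, 0, 1e6);
    TH1D *hdiff1 = new TH1D("diff1", "", 100, 0, 1e6);
    const Long64_t nentries = Data_A->GetEntries();
    for (Long64_t i = 0; i < nentries - 1; i++)
    {
        Data_A->GetEntry(i);
        double time0 = time;
        double diff = 0;
        // Difference between consecutive entries.
        Data_A->GetEntry(i + 1);
        diff = time - time0;
        hdiff0->Fill(diff);
        // Difference with one entry in between; skipped when i + 2 is past the end.
        if (i + 2 < nentries)
        {
            Data_A->GetEntry(i + 2);
            diff = time - time0;
            hdiff1->Fill(diff);
        }
    }
    hdiff0->Draw();
    hdiff1->Draw("same");
}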
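One thing I could probably change independently of threading: the loop above calls GetEntry up to three times per iteration, so most entries are read three times. A minimal sketch of a single-pass version that remembers the two previous times instead is below. It is written against plain callbacks so it can be tried without ROOT; sliding_diffs, read, fill0 and fill1 are hypothetical names, where read(i) stands in for Data_A->GetEntry(i) followed by reading time, and fill0/fill1 stand in for hdiff0->Fill and hdiff1->Fill. I have not measured how much this helps on a real file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Visits every entry exactly once and reports the lag-1 and lag-2 differences,
// keeping the two previous times instead of re-reading entries i+1 and i+2.
void sliding_diffs(std::size_t n_entries,
                   const std::function<std::uint64_t(std::size_t)>& read,
                   const std::function<void(double)>& fill0,
                   const std::function<void(double)>& fill1)
{
    assert(read && fill0 && fill1);
    std::uint64_t prev1 = 0; // time of entry i - 1
    std::uint64_t prev2 = 0; // time of entry i - 2
    for (std::size_t i = 0; i < n_entries; ++i) {
        const std::uint64_t now = read(i);
        // Consecutive entries: t[i] - t[i - 1].
        if (i >= 1) fill0(static_cast<double>(now) - static_cast<double>(prev1));
        // One entry in between: t[i] - t[i - 2].
        if (i >= 2) fill1(static_cast<double>(now) - static_cast<double>(prev2));
        prev2 = prev1;
        prev1 = now;
    }
}
```

This fills the same pairs as the original loop, only indexed from the later entry of each pair rather than the earlier one.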
ROOT Version: 6.24
Platform: Ubuntu
Compiler: Not Provided