I am currently playing around with writing code to parallelize my analysis of TTrees. Generally I start with a single TTree in a single file, and my goal is to do the following:
- Loop over all events in the tree and extract data from the branches
- Do some (perhaps complex and time-consuming) processing of the data
- Write the results into a new TTree
I have been playing around with some of the very nice newer parallelization features in ROOT6 and would like to make use of these if possible. I have created some scripts that use TTreeProcessorMT and TBufferMerger to do this sort of thing in parallel. Here is a simplified example:
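#include <cmath>
#include "ROOT/TBufferMerger.hxx"
#include "ROOT/TTreeProcessorMT.hxx"
#include "TROOT.h"
#include "TTree.h"
#include "TTreeReader.h"
#include "TTreeReaderValue.h"

void ParallelProcess(TTree* t){
    int nthreads = 8;
    ROOT::EnableImplicitMT(nthreads);
    ROOT::Experimental::TBufferMerger merger("output.root");
    ROOT::TTreeProcessorMT ttp(*t);
    // Called once per task, each time with a reader restricted to a range of entries
    auto myFunction = [&](TTreeReader& reader) {
        TTreeReaderValue<double> rx(reader, "x"); // assume "x" is a branch in t
        auto f = merger.GetFile(); // buffer file attached to the merger
        TTree tout("tout","");
        tout.ResetBit(kMustCleanup);
        double x2;
        tout.Branch("x2",&x2,"x2/D");
        while(reader.Next()){
            x2 = std::pow(*rx,2);
            tout.Fill();
        }
        f->Write(); // send this task's entries to the merger
    };
    ttp.Process(myFunction);
}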
This all works fine. However, using TBufferMerger causes the order of events in the output TTree to differ from the input. This prevents me from easily correlating input and output parameters in later analysis, for example using TTree::AddFriend() and TTree::Draw(). I have also tried avoiding TBufferMerger and instead creating one thread-local TFile + TTree per thread, then chaining them together afterwards. However, the event order is again not preserved in the TChain.
My main question is: is there any way to use TTreeProcessorMT (or a similar “new” ROOT6 parallelization feature) in such a way that event ordering is preserved in the output TTree?
Of course, I could accomplish this by creating my own threads explicitly and manually divvying up the entries in the input tree between the threads. But if there’s a way to do this using the newer, implicit features I would be interested to hear about it.
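For reference, the manual approach I have in mind would look roughly like the sketch below. It is plain C++ with no ROOT calls, so it only illustrates the bookkeeping: `SplitEntries` and `ProcessInOrder` are hypothetical names, a `std::vector` stands in for the input branch, and another one stands in for the output TTree. Each thread gets one contiguous range of entries, and the per-chunk results are put together in chunk order, which is what keeps the output aligned with the input.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, nEntries) into at most nChunks contiguous
// ranges [begin, end), as evenly as possible, in ascending order.
std::vector<std::pair<std::int64_t, std::int64_t>>
SplitEntries(std::int64_t nEntries, int nChunks)
{
    std::vector<std::pair<std::int64_t, std::int64_t>> ranges;
    if (nEntries <= 0 || nChunks <= 0) return ranges;
    const std::int64_t n = std::min<std::int64_t>(nChunks, nEntries);
    const std::int64_t base = nEntries / n;
    const std::int64_t extra = nEntries % n; // first `extra` chunks get one more entry
    std::int64_t begin = 0;
    for (std::int64_t i = 0; i < n; ++i) {
        const std::int64_t end = begin + base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}

// Hypothetical stand-in for the per-event work: the input is a plain vector
// instead of a TTree branch, and the output a vector instead of a new TTree.
std::vector<double> ProcessInOrder(const std::vector<double>& x, int nThreads)
{
    const auto ranges = SplitEntries(static_cast<std::int64_t>(x.size()), nThreads);
    std::vector<std::vector<double>> partial(ranges.size()); // one buffer per chunk
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < ranges.size(); ++c) {
        // Each worker touches only its own buffer, so no locking is needed.
        workers.emplace_back([&, c] {
            for (std::int64_t i = ranges[c].first; i < ranges[c].second; ++i)
                partial[c].push_back(std::pow(x[i], 2));
        });
    }
    for (auto& w : workers) w.join();

    // Concatenate in chunk order, which restores the input order.
    std::vector<double> out;
    out.reserve(x.size());
    for (const auto& p : partial) out.insert(out.end(), p.begin(), p.end());
    return out;
}
```

In the real case I assume each thread would open its own TFile, restrict a TTreeReader to its range (TTreeReader::SetEntriesRange looks like the relevant call), and write a per-chunk output file, with the files then added to a TChain in chunk order. I have not tested that part.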
As an aside, I have noticed that TTreeProcessorMT sometimes seems to process serially, while at other times it processes in parallel. This seems to correlate with the size of the entries in the input TTree. Is this a “feature”, i.e. is TTreeProcessorMT smart enough to figure out whether parallel processing is beneficial or not? Or am I missing something?
ROOT Version: 6.12
Platform: Ubuntu 16.04 linux
Compiler: g++ 5.4.0