Write to a TTree in parallel

Leonid · February 21, 2017, 3:49pm

Hello,

I currently have classes that perform (a lot of) simulations and then write the results to a TTree. Since it can take a lot of time, I would like to be able to use multiple threads. As each simulation is independent, I am in the “Embarrassingly parallel” situation, where I can make each thread run a certain number of simulations.

My problem is with the data storage. I am a bit lost as to what the best solution is. So far, I see two possibilities:

1. Use std::thread facilities from C++11 to run the threads, have each thread store its results in a separate TTree, and then use a TChain to analyse the results. Potential problems: if I want to use a particular branch, does the TChain manage that, or do I need “versions” of branches for each thread? Is there a tutorial on how to use a TChain “as a” TTree?
2. Use PROOF. Potential problems: I have never used it, so I would need to spend a lot of time learning how to adapt my code to PROOF. (The codebase is actually quite big and complicated.) Also, at some point, I would like to become independent of the ROOT shell and restrict myself to using the ROOT libraries.

What do you think is the best direction to explore, knowing that time is an important factor?

pcanal · February 21, 2017, 7:54pm

Hi,

[quote]use std::thread facilities from c++11 to run the threads and store each thread in a separate TTree and then use a TChain to analyse the results.[/quote]
At the moment this is the best solution. (Soon-ish we will also have a facility to fast-merge the TTrees produced by each thread into a single output file; there is a process-based version of this in the tutorials.)

[quote]Potential problems: if I want to use a particular branch, does the TChain manage that, or do I need “versions” of branches for each thread? Is there a tutorial on how to use a TChain “as a” TTree?[/quote]
To a first approximation, apart from the initialization, you would use a TChain exactly as you would a TTree.

Now … I am not sure what you mean by “for each thread”. To be clear, at write time, each of your threads would have its own TTree object and its own TFile object, each creating and pointing to its own physical file. At read time, you can use a TChain to make all those files look like a single TTree.
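A minimal sketch of the write-side layout described above, covering only the thread handling. The function names, the `Worker` signature and the `prefix_N.root` naming scheme are assumptions made for illustration and were not given in the thread. The ROOT calls themselves are left to the caller-supplied worker, and the comment only outlines what its body would typically contain.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical naming scheme: one output file per thread, e.g. "sim_0.root".
inline std::string output_name(const std::string& prefix, std::size_t thread_index) {
    return prefix + "_" + std::to_string(thread_index) + ".root";
}

// Worker signature (an assumption of this sketch): the callable receives its
// thread index, the name of the file it alone owns, and the number of
// simulations to run. In a ROOT program its body would create its own TFile
// and TTree, roughly:
//   TFile file(name.c_str(), "RECREATE");
//   TTree tree("sim", "simulation results");
//   ... tree.Branch(...); loop { simulate; tree.Fill(); } ... file.Write();
using Worker = std::function<void(std::size_t thread_index,
                                  const std::string& name,
                                  std::size_t n_simulations)>;

// Start one thread per output file, wait for all of them to finish, and
// return the file names so that they can later be added to a TChain.
inline std::vector<std::string> run_one_file_per_thread(const std::string& prefix,
                                                        std::size_t n_threads,
                                                        std::size_t n_per_thread,
                                                        const Worker& worker) {
    std::vector<std::string> names;
    names.reserve(n_threads);
    for (std::size_t i = 0; i < n_threads; ++i) {
        names.push_back(output_name(prefix, i));
    }

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (std::size_t i = 0; i < n_threads; ++i) {
        // Each thread gets its own copy of the worker and of its file name.
        threads.emplace_back(worker, i, names[i], n_per_thread);
    }
    for (auto& t : threads) {
        t.join();
    }
    return names;
}
```

At read time, the returned names would be passed one by one to `TChain::Add` on a chain constructed with the same tree name the workers used (`"sim"` in the comment above, itself a placeholder). Because no ROOT object is shared between threads in this layout, no locking is needed around `Fill()`.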

Since you mention ‘branch’ and ‘thread’ in the same sentence, I wonder whether you actually meant that each thread would be working on the content of one of the branches of the (conceptual) TTree, or whether (as I assumed above) each thread would be processing all the branches for different entries.

Cheers,
Philippe.

Leonid · February 24, 2017, 8:13am

Thanks, I managed to do it! I ended up using std::thread and made every thread write to the same TTree. To avoid race conditions, I used a mutex lock.
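
No code was posted for this approach, so the following is only a sketch of how a mutex-protected shared tree might look. It is written as a template so that it compiles without ROOT: `Tree` is assumed to be any type with a `Fill()` member that reads the current contents of `branch_buffer`, which is how a TTree treats a variable registered with `TTree::Branch`. The function name and parameters are hypothetical.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Run n_threads threads, each performing n_per_thread simulations, and store
// every result in one shared tree. `simulate(thread_index, i)` must itself be
// safe to call from several threads at once (an assumption of this sketch).
template <class Tree, class Row, class Simulate>
void fill_shared_tree(Tree& tree, Row& branch_buffer, Simulate simulate,
                      std::size_t n_threads, std::size_t n_per_thread) {
    std::mutex tree_mutex;

    auto work = [&](std::size_t thread_index) {
        for (std::size_t i = 0; i < n_per_thread; ++i) {
            // The expensive part runs without the lock, so threads overlap here.
            const Row result = simulate(thread_index, i);

            // Only the copy into the shared buffer and the Fill() call are
            // serialized; the buffer must not change between the two.
            std::lock_guard<std::mutex> lock(tree_mutex);
            branch_buffer = result;
            tree.Fill();
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (std::size_t t = 0; t < n_threads; ++t) {
        threads.emplace_back(work, t);
    }
    for (auto& th : threads) {
        th.join();
    }
}
```

With this layout the order of entries in the tree depends on thread scheduling, and the lock only pays off when a simulation takes much longer than a `Fill()`. In a real ROOT program one would presumably also call `ROOT::EnableThreadSafety()` before starting the threads.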