What is the best framework for multi-threading in ROOT?

LastStarDust · September 26, 2020, 3:17pm

ROOT Version: latest
Platform: linux
Compiler: whatever

Hello!
I am about to start yet another project with ROOT, and as usual, I would like to use multiple threads to read from and write to different TTrees.

It is not my first attempt at multithreading with ROOT and I have been burnt before. In the past, I developed a very nice Python program that dynamically spawned many threads and even chained them one to another. I spent considerable time and effort unit testing, documenting, and debugging the thing, just to see it fail miserably because of ROOT’s wacky support for multi-threading. You never know which parts of ROOT are thread-safe and which are not, leading to random crashes, deadlocks, and so on.

So this time I do not want to rush things. I am here to get advice about the best way to proceed.

These are my requirements:

Working in Linux
Working on single local machine
C++11 (No Python, no bash scripts, just plain old C++11)
Can use boost
Latest version of ROOT (let’s say from 6.18+)
Each thread reads from and writes to a different TTree (so TTrees are not shared among threads)
The threads are joined at some point because I need to “merge” all the TTrees together. This merging can be done in a single thread, so nothing to worry about.
No histograms, no plotting, no fitting is involved (I have been burnt before by that)

Here are some options off the top of my head:

C++ std::thread
ROOT TThread
ROOT PROOF
pthread
boost.asio

Which one is the most stable? I am not interested in ease of use. I just want my program not to crash. That’s all.

Thank you
Giorgio

joa · September 26, 2020, 8:42pm

Hi,

What about 1 thread, X processes and hadd after, leaving it to the good old kernel… As the threads are not sharing information from the trees why the headache of threads?

Cheers

Joa

LastStarDust · September 27, 2020, 6:08am

The “merging” is a little more complex than hadd (I have to merge information within a single event, not append multiple events). However, your idea is good and to be taken into account.

Nevertheless, it is not without overhead. I would have to develop a separate executable for each step of the analysis, plus a Python script that runs and synchronizes them. This would make the program more brittle and more difficult to understand and maintain. Alternatively, I would have to use the fork and wait system calls, with shared memory for inter-process communication.

All in all, threads exist for a valid reason: joining and communication among threads are far easier than among processes.
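To make the difference from hadd concrete, here is a minimal sketch of merging information within a single event, as opposed to appending events. Everything in it is an assumption made for illustration: the names `DetectorData`, `MergedEvent` and `MergeByEvent` are hypothetical, each detector is reduced to a single value per event, events are matched by an integer event ID, and events missing from any detector are dropped. The real formats and matching rule are not described in this thread.

```cpp
#include <map>
#include <vector>

// Hypothetical per-detector data: one value per event, keyed by event ID.
using DetectorData = std::map<long, double>;

// Hypothetical merged event holding one value from each detector.
struct MergedEvent {
   long id;
   double a, b, c;
};

// Combine the three detectors event by event.
// Assumption: an event is kept only if all three detectors recorded it.
std::vector<MergedEvent> MergeByEvent(const DetectorData &A, const DetectorData &B,
                                      const DetectorData &C) {
   std::vector<MergedEvent> merged;
   for (const auto &entry : A) {
      const auto itB = B.find(entry.first);
      const auto itC = C.find(entry.first);
      if (itB != B.end() && itC != C.end())
         merged.push_back({entry.first, entry.second, itB->second, itC->second});
   }
   return merged;
}
```

hadd, by contrast, concatenates the entries of its inputs, so it would produce one entry per detector per event rather than one combined entry per event.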

LastStarDust · September 27, 2020, 6:22am

@joa After some consideration, I really think that the fork/wait system calls are the least evil, as you suggest. Thank you. I am not accepting your solution because I would really like to know, in general, which is the best approach to multithreading in ROOT for the future.
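A minimal sketch of the fork/wait pattern mentioned above, using only POSIX calls. `RunInChildren` and `tasks` are hypothetical names chosen for this example; each task stands in for one step such as converting one detector's file, and the only thing reported back to the parent is the exit status. Passing richer results back (via files, pipes or shared memory) is left out.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <functional>
#include <vector>

// Run each task in its own child process and wait for all of them.
// Returns the number of children that exited normally with status 0.
int RunInChildren(const std::vector<std::function<int()>> &tasks) {
   std::vector<pid_t> children;
   for (const auto &task : tasks) {
      const pid_t pid = fork();
      if (pid == 0) {
         // Child: do the work, then leave immediately with _exit so that
         // the parent's atexit handlers and buffered output are not replayed.
         _exit(task());
      }
      if (pid > 0)
         children.push_back(pid);
      // pid < 0 means fork failed; that task is not counted as a success.
   }

   int nSucceeded = 0;
   for (const pid_t pid : children) {
      int status = 0;
      if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
         ++nSucceeded;
   }
   return nSucceeded;
}
```

Because every child has its own address space, no global state is shared between the workers; the price is that results have to travel back to the parent through the file system or some other form of inter-process communication.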

eguiraud · September 27, 2020, 11:29am

Hi Giorgio,
avoiding crashes when using ROOT objects from multiple threads does not depend on the framework you use (e.g. std::thread vs boost.asio) as much as on how you use the ROOT objects involved. With the exception of ROOT TThread, which should be considered deprecated, and PROOF, which afaik is multi-process, not multi-thread, the other options are fairly equivalent. There is another option besides those you list, which is taking advantage of ROOT’s TThreadExecutor and TTreeProcessorMT to schedule tasks directly on ROOT’s internal thread pool. A large multi-thread framework that uses task-based parallelism with ROOT is for example CMSSW.

ROOT was designed in 1995 without multi-thread usage in mind; in particular, ROOT keeps track of a lot of global state, think gROOT and its GetListOf* methods: those are all global lists. However, there are common usage patterns that have been made thread-safe specifically to aid the development of frameworks like yours. ROOT itself offers multi-thread TTree processing tools such as RDataFrame and TTreeProcessorMT that take advantage of such guarantees.
To make such common usage patterns safe, you have to call ROOT::EnableThreadSafety: its docs list what exactly is made thread-safe when calling it.

In particular, regarding TFile: you can open separate TFiles in separate threads concurrently, and each TFile should only be used and closed from the thread that opened it. Constructing, using and destructing separate TTree and TChain objects in each thread is also safe if ROOT::EnableThreadSafety has been called.
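The ownership rule above (one thread opens, uses and closes its own files) can be sketched with plain std::thread. This is only the threading skeleton and deliberately contains no ROOT calls: `ConvertOne` and `ConvertAll` are hypothetical names, and the comments mark where the ROOT work would go in a real program.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-file workload. In a real program this is where the thread
// would open its own input and output TFile, fill its own TTree, and close
// both files before returning.
std::string ConvertOne(const std::string &inputName) {
   return "out_" + inputName;
}

// Start one thread per input, join them all, and return the output names
// in the same order as the inputs.
std::vector<std::string> ConvertAll(const std::vector<std::string> &inputNames) {
   // In a ROOT program, ROOT::EnableThreadSafety() would be called here,
   // before any thread is started.
   std::vector<std::string> outputNames(inputNames.size());
   std::vector<std::thread> workers;
   workers.reserve(inputNames.size());
   for (std::size_t i = 0; i < inputNames.size(); ++i) {
      // Each thread writes only to its own slot, so no locking is needed.
      workers.emplace_back(
         [&inputNames, &outputNames, i] { outputNames[i] = ConvertOne(inputNames[i]); });
   }
   for (auto &w : workers)
      w.join();
   return outputNames;
}
```

Nothing created inside `ConvertOne` escapes the thread that created it except the returned file name, which is the property that matters for the TFile rule.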

So all requirements look satisfied or satisfiable with ROOT 6.18+. The tricky part will probably be the merging step. Depending on what exactly you mean by “merging”, and how the TTrees are handled, it might or might not be safe. For example, you can use Experimental::TBufferMerger to write to a single output TFile from multiple threads: this is how multi-thread writing of TTrees is implemented in RDataFrame. A small example snippet that shows the usage you have in mind might help us help you further.

Cheers,
Enrico

EDIT:
as @joa says, spawning multiple single-thread processes and then merging their outputs in a final single-thread merging step is a valid strategy to avoid all shared state issues. Depending on your use case, it might just be the best solution, or it might come with extra runtime, memory usage and output-handling/merging overhead that is undesirable. TProcessExecutor is a facility that lets you easily spawn worker processes via fork.

LastStarDust · September 27, 2020, 2:47pm

Dear Enrico,
thank you very much for the very informative post. It was really helpful and I will surely come back to it in the future when looking for reference.

I knew about EnableThreadSafety and I use it whenever needed. However, as you noted, even doing everything in my power to prevent crashes and deadlocks, well, I do get occasional crashes and deadlocks.

Since you seem very knowledgeable about parallelism in ROOT, let me borrow your knowledge for a while. I am describing here in a little more detail what I am trying to achieve, and I would like you to kindly tell me which ROOT framework is best (as I see there are many).

Let us assume that an experiment is made up of many detectors A, B, C. Unfortunately, the data from each one is stored in a different format: fA, fB, fC. Now I would like to convert the formats fA, fB, fC into a new common format called fD. So I would like to spawn 3 threads (or processes) to convert the data from A, B and C in parallel. Once the conversion of every single detector has finished, I need to merge the temporary files into a global output file with format fD. As I said, the merging can be done in a single thread, so let us not worry about it.

As you can see the input and output of the 3 threads are different, so in principle, multithreading should be easy to implement. However in the past, for a similar situation, I noticed that ROOT would crash or deadlock especially when closing the files. If you were in my shoes how would you design this analysis software?

Thank you

eguiraud · September 27, 2020, 3:29pm

ROOT will not crash or deadlock if you open, process and close each file fA, fB and fC in a different thread.

Here’s one way to do it with TThreadExecutor + TFile/TTree:

#include <ROOT/TThreadExecutor.hxx>
#include <TFile.h>
#include <TTree.h>

#include <iostream>
#include <string>
#include <vector>

// Create ROOT files with names `fnames`, each with a TTree called "t"
// with one integer branch called "x" and 3 entries.
void CreateInputFiles(const std::vector<std::string> &fnames) {
   for (const auto &name : fnames) {
      TFile f(name.c_str(), "recreate");
      TTree t("t", "t");
      int x = 42;
      t.Branch("x", &x);
      t.Fill();
      t.Fill();
      t.Fill();
      t.Write();
      f.Close();
   }
}

// Open TTree "t" in file fname, write out its "x" branch as branch "out_x"
// of a new TTree "out_t" in file "out_" + fname, and return the name of
// that output file. This will be each thread's workload.
std::string ProcessOneFile(const std::string &fname) {
   std::cout << fname << " start" << std::endl;
   TFile in_f(fname.c_str());
   auto in_t = in_f.Get<TTree>("t");
   int x = 0;
   in_t->SetBranchAddress("x", &x);
   TFile out_f(("out_" + fname).c_str(), "recreate");
   TTree out_t("out_t", "out_t");
   out_t.Branch("out_x", &x);
   const auto nEntries = in_t->GetEntries();
   for (auto e = 0ll; e < nEntries; ++e) {
      in_t->GetEntry(e);
      out_t.Fill();
   }
   out_t.Write();
   out_f.Close();
   std::cout << fname << " end" << std::endl;

   return "out_" + fname;
}

int main() {
   std::vector<std::string> inputFiles = {"fA.root", "fB.root", "fC.root"};
   CreateInputFiles(inputFiles);

   // Redundant in this case: TThreadExecutor's ctor calls this for you
   //ROOT::EnableThreadSafety();

   ROOT::TThreadExecutor pool;
   auto outputFiles = pool.Map(ProcessOneFile, inputFiles);

   for (const auto &name : outputFiles) {
      std::cout << '\n' << name << " scan" << std::endl;
      TFile f(name.c_str());
      auto t = f.Get<TTree>("out_t");
      t->Scan();
   }

   return 0;
}

Here’s the same but using the ROOT::RDF::RunGraphs feature that was added to ROOT’s master branch last week (i.e. it’s currently unreleased). It lets you run multiple RDataFrame event loops concurrently, but requires a ROOT nightly build:

#include <ROOT/RDataFrame.hxx>
#include <ROOT/RDFHelpers.hxx>
#include <TFile.h>
#include <TTree.h>

#include <iostream>
#include <string>
#include <vector>

void CreateInputFiles(const std::vector<std::string> &fnames) {
   for (const auto &name : fnames) {
      TFile f(name.c_str(), "recreate");
      TTree t("t", "t");
      int x = 42;
      t.Branch("x", &x);
      t.Fill();
      t.Fill();
      t.Fill();
      t.Write();
      f.Close();
   }
}

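// Book (but do not run yet, thanks to fLazy) a Snapshot that writes branch "x"
// of TTree "t" in file fname as branch "out_x" of TTree "out_t" in "out_" + fname.
auto BookProcessingFor(const std::string &fname) {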
   ROOT::RDataFrame df("t", fname);
   ROOT::RDF::RSnapshotOptions opts;
   opts.fLazy = true;
   auto resPtr = df.Alias("out_x", "x").Snapshot<int>("out_t", "out_" + fname, {"out_x"}, opts);
   return resPtr;
}

int main() {
   std::vector<std::string> inputFiles = {"fA.root", "fB.root", "fC.root"};
   CreateInputFiles(inputFiles);

   ROOT::EnableImplicitMT();
   std::vector<ROOT::RDF::RResultHandle> results;
   for (const auto &fname : inputFiles)
      results.emplace_back(BookProcessingFor(fname));
   ROOT::RDF::RunGraphs(results);

   for (const auto &name : inputFiles) {
      std::cout << '\n' << name << " scan" << std::endl;
      TFile f(("out_" + name).c_str());
      auto t = f.Get<TTree>("out_t");
      t->Scan();
   }

   return 0;
}

Cheers,
Enrico

LastStarDust · September 27, 2020, 8:26pm

I did not expect such a detailed explanation. So many thanks.
