Hi,
we are moving away from the original topic of this thread, but just to be 100% clear: calling TTree::Fill concurrently on the same TTree object from multiple threads is not safe, and neither is calling TTree::Fill in one thread while you are calling TTree::AutoSave() from another. Both operations write to the TTree's internal state. Pretty much no object in ROOT 6 can be used like that.
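If you really had to share one object between threads, every access would need to be serialized. Here is a minimal sketch in plain C++, assuming a hypothetical FakeTree stand-in (it is not a ROOT class, and its Fill and AutoSave only mimic the idea of mutating shared internal state); all names here are made up for illustration:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an object with unsynchronized internal state.
// It plays the role of the TTree here; it is NOT a ROOT class.
struct FakeTree {
   long entries = 0; // number of entries filled so far
   long flushed = 0; // number of entries already "saved"
   void Fill() { ++entries; }
   void AutoSave() { flushed = entries; }
};

// Several threads fill while one more thread saves periodically.
// Every access to `tree` happens with `treeMutex` held, so there is no data race.
long FillAndSaveWithLock(int nThreads, int fillsPerThread)
{
   FakeTree tree;
   std::mutex treeMutex;
   std::vector<std::thread> threads;
   for (int t = 0; t < nThreads; ++t) {
      threads.emplace_back([&] {
         for (int i = 0; i < fillsPerThread; ++i) {
            std::lock_guard<std::mutex> lock(treeMutex);
            tree.Fill();
         }
      });
   }
   threads.emplace_back([&] {
      for (int i = 0; i < 100; ++i) {
         std::lock_guard<std::mutex> lock(treeMutex);
         tree.AutoSave();
      }
   });
   for (auto &th : threads)
      th.join();
   return tree.entries; // all threads have joined, safe to read without the lock
}
```

Calling FillAndSaveWithLock(4, 10000) returns 40000. Note that the lock makes the fills run one at a time, so this buys correctness but no speed-up.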
There are a few more things in your snippet that do not look right at first sight (I might be missing something): for example, you might never WriteToDisk the last bunch of entries (in case the previous AsyncFlush has not terminated yet), and loopCounter is not atomic even though you write to it from one thread while you read it from another.
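For the counter, std::atomic is the usual fix. A minimal sketch, assuming a reconstruction of the pattern (one thread increments, another polls); the function name and the polling loop are hypothetical and not taken from your snippet:

```cpp
#include <atomic>
#include <thread>

// One thread increments a counter while another one reads it.
// With std::atomic<long> both sides are well-defined; with a plain long
// this would be a data race.
long CountWithAtomic(long nIterations)
{
   std::atomic<long> loopCounter{0};
   long lastSeen = 0; // written only by the reader thread

   std::thread writer([&] {
      for (long i = 0; i < nIterations; ++i)
         ++loopCounter; // atomic increment
   });
   std::thread reader([&] {
      // poll until the writer has finished all its iterations
      while ((lastSeen = loopCounter.load()) < nIterations)
         std::this_thread::yield();
   });
   writer.join();
   reader.join();
   return lastSeen; // read after join, so no race on lastSeen either
}
```

The reader only leaves its loop once it has observed the final value, so the function returns nIterations.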
But actually, if you have access to ROOT v6.12, you don't have to write multi-threaded code yourself for this task: TDataFrame::Snapshot is the high-level interface to write columns of a TDataFrame to a ROOT file in the form of a TTree, and it can do so by creating the TTree entries and flushing them to disk from multiple threads. It does not fill the tree in one thread and flush it from another (that is not thread-safe). Instead, each thread creates its own chunks of TTree entries and adds these chunks (i.e. buffers) to a thread-safe writing queue, and the queue is consumed by a single "writer" worker. The end result is a significant speed-up in the creation of the dataset, provided the actual creation and serialization+compression of the entries is a heavy enough workload (it should be, for all but trivial cases).
The low-level details of concurrent writing of a ROOT file are actually taken care of by TBufferMerger.
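To make the "chunks plus a single writer" idea concrete, here is a minimal sketch in plain C++. It is not how TBufferMerger is implemented; ChunkQueue, Chunk and ProduceAndWrite are hypothetical names, a chunk is just a vector of ints, and "writing" is reduced to counting entries:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Chunk = std::vector<int>; // stand-in for a buffer of serialized entries

// Hypothetical thread-safe queue: many producers push, one writer pops.
class ChunkQueue {
public:
   void Push(Chunk chunk)
   {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fChunks.push(std::move(chunk));
      }
      fCond.notify_one();
   }
   void Close() // signal that no more chunks will arrive
   {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fClosed = true;
      }
      fCond.notify_all();
   }
   // Blocks until a chunk is available; returns false once closed and drained.
   bool Pop(Chunk &out)
   {
      std::unique_lock<std::mutex> lock(fMutex);
      fCond.wait(lock, [this] { return !fChunks.empty() || fClosed; });
      if (fChunks.empty())
         return false;
      out = std::move(fChunks.front());
      fChunks.pop();
      return true;
   }

private:
   std::mutex fMutex;
   std::condition_variable fCond;
   std::queue<Chunk> fChunks;
   bool fClosed = false;
};

// Each producer builds its chunks privately, then hands them to the queue.
// Only the writer thread touches the "output" (here: a running entry count).
std::size_t ProduceAndWrite(int nProducers, int chunksPerProducer, int entriesPerChunk)
{
   ChunkQueue queue;
   std::size_t nWritten = 0; // touched only by the writer thread
   std::thread writer([&] {
      Chunk chunk;
      while (queue.Pop(chunk))
         nWritten += chunk.size();
   });
   std::vector<std::thread> producers;
   for (int p = 0; p < nProducers; ++p) {
      producers.emplace_back([&, p] {
         for (int c = 0; c < chunksPerProducer; ++c) {
            Chunk chunk(static_cast<std::size_t>(entriesPerChunk), p);
            queue.Push(std::move(chunk));
         }
      });
   }
   for (auto &t : producers)
      t.join();
   queue.Close();
   writer.join();
   return nWritten;
}
```

ProduceAndWrite(4, 10, 100) returns 4000. The expensive part (building a chunk) runs in parallel, while the part that touches shared output is confined to one thread.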
Your use-case would look similar to this (modulo typos and, possibly, optimizations):
#include <ROOT/TDataFrame.hxx>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>
using namespace ROOT::Experimental; // this is where TDF lives
struct DataStructure {
   double px;
   double py;
   double pz;
   double random;
   int i;
};
void GenerateData() {
   ROOT::EnableImplicitMT(); // tell TDF to use multiple threads
   TDataFrame d(1000*500); // create a TDF with 1000*500 entries (and no columns yet)
   // now we add a column that contains your data structure
   // we need a random engine per worker thread
   // (note: the engines are all default-seeded, so every slot produces the same sequence;
   // seed them differently if that matters for you)
   const auto nWorkers = ROOT::IsImplicitMTEnabled() ? ROOT::GetImplicitMTPoolSize() : 1;
   std::vector<std::default_random_engine> generators(nWorkers);
   auto makeDataStructure = [&generators](unsigned int slot, ULong64_t entry) {
      auto &gen = generators[slot]; // each worker thread is guaranteed to receive a different slot number
      // a distribution object has internal state, so use a local one instead of sharing it across threads
      std::normal_distribution<double> distribution(0., 2.);
      auto px = distribution(gen);
      auto py = distribution(gen);
      auto random = distribution(gen);
      auto pz = px * px + py * py;
      auto i = int(entry);
      return DataStructure{px, py, pz, random, i};
   };
   auto d2 = d.DefineSlotEntry("data", makeDataStructure, {});
   // nothing has been executed so far
   // this Snapshot call actually triggers the event loop
   const auto startTime = std::chrono::high_resolution_clock::now();
   d2.Snapshot<DataStructure>("tree", "test.root", {"data"});
   const auto endTime = std::chrono::high_resolution_clock::now();
   const auto msElapsed = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
   std::cout << "event loop duration: " << msElapsed << " ms" << std::endl;
}
On my laptop this event loop takes about 4.5s without ROOT::EnableImplicitMT() and 2.9s with.
EDIT: the first timings were with a debug version of ROOT.
Switching to a release version and rearranging the program so that it's compiled rather than interpreted, the event loop runs in about 0.6 seconds with multi-threading and 1 second without.
Increasing the number of entries by a factor of 10, we are at 4.2 seconds with multi-threading vs 11 seconds without. Not bad, considering my laptop has 2 physical cores (with 2 threads per core).
Hope this helps,
Enrico