Dear experts,
I am interested in parallelizing my analysis workflow. My analysis code uses a custom class whose variables, cuts & weights, and histograms can be configured in a similar way to RDataFrame, so that running it inside an event loop can be done by a trivial method call, something like:
class Analysis
{
public:
  template <typename T>
  Variable<T>* read( /* ... */ ); // read a branch of type T from the TTree
  template <typename Tcustom>
  Variable<Tcustom>* define( /* ... */ ); // define a new variable from existing branches
  template <typename F>
  Analysis& cut(F callable, /* ... */); // cuts can be applied in sequence
  // ....
};
int main() {
  auto tree = /* ... */;
  // in a PyROOT interface, this can be configured on-the-fly
  auto ana = Analysis();
  ana.input(&tree);
  ana.read<float>("branch1");
  ana.define<MyVariable>("branch1");
  ana./*....*/;
  // running over a TTree is "trivial" at this stage, with all actions booked
  ana.initialize();
  for (Long64_t ientry = 0; ientry < tree.GetEntries(); ++ientry) {
    tree.GetEntry(ientry);
    ana.execute();
  }
  ana.finalize();
  return 0;
}
Since I can create as many independent Analysis instances as I want, I can currently simply call ROOT::EnableThreadSafety() and use regular std::thread to implement multithreaded analysis runs, without needing TThreadedObject for e.g. the histograms inside my analysis class. In pure C++, parallelizing my code using TTreeProcessorMT would look like:
int main() {
  auto treeProcessor = /* ... */;
  treeProcessor.Process([](TTreeReader& subRange) {
    // instantiate an Analysis() for each subrange
    auto ana = Analysis();
    ana.read<float>(/*...*/);
    ana.define<MyVariable>(/*...*/);
    ana.input(&subRange); // <- currently using TTree, will have to use TTreeReader like here instead
    // ...
    while (subRange.Next()) {
      ana.execute();
    }
  });
  return 0;
}
However, since I configure an Analysis instance via external config files and a PyROOT interface, I cannot use TTreeProcessorMT directly (as that would mean I need to statically compile the said instance). So instead, my concurrency is currently bounded by the number of input files/trees that my ntuples are stored over. I was wondering how easy it is to access the clusters of a TTree (i.e. subranges of its entries) in a thread-safe way, as implemented in TTreeProcessorMT?
Edit: I just wanted to add that for the ntuples I am actually using, each tree has only one cluster, so I am (in theory) already getting all the parallel performance I can get by simply running over N files in parallel. However, I am still interested in this for potentially having a more robust code setup, where I would always merge the input files in advance (with control over the granularity of the clusters) and run over the clusters of the merged tree in parallel.
ROOT Version: 6.20.04