Getting TTree clusters as used by TTreeProcessorMT

taehyounpark · November 24, 2021, 8:07pm

Dear experts,

I am interested in parallelizing my analysis workflow. My analysis code uses a custom class whose variables, cuts & weights, and histograms can be configured in a way similar to RDataFrame, so that running it inside an event loop reduces to a trivial method call, something like:

```cpp
#include "TTree.h"

class Analysis
{
public:
  template <typename T>
  Variable<T>* read( /* ... */ );            // read a branch of type T from the TTree
  template <typename Tcustom>
  Variable<Tcustom>* define( /* ... */ );    // define a new variable from existing branches
  template <typename F>
  Analysis& cut(F callable, /* ... */);       // cuts can be applied in sequence
  // ...
};

int main() {
  TTree* tree = /* ... */;
  // in a PyROOT interface, this can be configured on the fly
  auto ana = Analysis();
  ana.input(tree);
  ana.read<float>("branch1");
  ana.define<MyVariable>("branch1");
  // ... further configuration
  // with all actions booked, running over a TTree is "trivial" at this stage
  ana.initialize();
  for (Long64_t ientry = 0; ientry < tree->GetEntries(); ++ientry) {
    tree->GetEntry(ientry);
    ana.execute();
  }
  ana.finalize();
  return 0;
}
```

Since I can create as many independent Analysis instances as I want, I currently just call ROOT::EnableThreadSafety() and use plain std::thread to run the analysis multithreaded, without needing TThreadedObject for e.g. the histograms inside my analysis class. In pure C++, parallelizing my code with TTreeProcessorMT would look like:

```cpp
#include "ROOT/TTreeProcessorMT.hxx"
#include "TTreeReader.h"

int main() {
  ROOT::TTreeProcessorMT treeProcessor(/* ... */);
  treeProcessor.Process([](TTreeReader& subRange) {
    // instantiate an Analysis for each sub-range
    auto ana = Analysis();
    ana.read<float>(/*...*/);
    ana.define<MyVariable>(/*...*/);
    ana.input(&subRange); // <- currently takes a TTree; would have to take a TTreeReader like here instead
    // ...
    while (subRange.Next()) {
      ana.execute();
    }
  });
  return 0;
}
```

However, since I configure each Analysis instance through external config files and the PyROOT interface, I cannot use TTreeProcessorMT directly (that would require the configuration to be compiled in statically). As a result, my concurrency is currently bounded by the number of input files/trees my ntuples are stored in. How easy is it to access the clusters of a TTree, i.e. its entry sub-ranges, in a thread-safe way, as TTreeProcessorMT does?

Edit: I just wanted to add that in the ntuples I actually use, every tree has only one cluster, so in theory I already get all the parallel performance available by running over N files in parallel. However, I am still interested in this for a more robust setup, in which I would always merge the input files in advance (controlling the cluster granularity) and then run over the resulting clusters in parallel.
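A minimal, ROOT-free sketch of the per-input std::thread pattern described in the question, assuming every input is independent: one self-contained instance per thread, with results merged only after all threads have joined. `MiniAnalysis`, `Input`, `runPerInput`, and `mergeAll` are hypothetical stand-ins; in the real setup each thread would open its own TFile/TTree after ROOT::EnableThreadSafety() has been called, and histograms would be merged with TH1::Add.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one input file/tree: a list of per-event values.
using Input = std::vector<float>;

// Hypothetical stand-in for Analysis: it owns all of its state, so separate
// instances share nothing and need no TThreadedObject-style protection.
struct MiniAnalysis {
  explicit MiniAnalysis(float c) : cut(c) {}

  void execute(float value) {  // called once per event
    if (value > cut) {
      ++passed;
      sum += value;
    }
  }

  void merge(const MiniAnalysis& other) {
    assert(other.cut == cut && "only merge identically configured instances");
    passed += other.passed;
    sum += other.sum;
  }

  float cut;
  std::size_t passed = 0;
  double sum = 0.0;
};

// One thread per input; thread i only ever touches results[i].
std::vector<MiniAnalysis> runPerInput(const std::vector<Input>& inputs, float cut) {
  std::vector<MiniAnalysis> results(inputs.size(), MiniAnalysis(cut));
  std::vector<std::thread> workers;
  workers.reserve(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    workers.emplace_back([&inputs, &results, i] {
      for (float v : inputs[i]) results[i].execute(v);
    });
  }
  for (auto& w : workers) w.join();
  return results;
}

// Sequential merge after all threads have finished.
MiniAnalysis mergeAll(const std::vector<MiniAnalysis>& parts, float cut) {
  MiniAnalysis total(cut);
  for (const auto& p : parts) total.merge(p);
  return total;
}
```

Because each thread only writes to its own element of `results`, no locking is needed; the only synchronization point is `join()`.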


ROOT Version: 6.20.04

couet · November 25, 2021, 7:40am

I guess @pcanal can help.

eguiraud · November 26, 2021, 8:02am

Hi @taehyounpark,
you can access the cluster boundaries via TTree::GetClusterIterator() by iterating over the returned iterator. TTreeProcessorMT has extra logic to decide how many clusters to process in the same multi-thread task, and to deal with friend trees or the presence of a TEntryList.
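The cluster-based approach can be sketched without ROOT so the logic stands alone: cluster start entries (which in ROOT would come from TTree::GetClusterIterator, as shown in a comment) become half-open ranges, consecutive clusters are grouped into a bounded number of tasks so that no cluster is ever split, and each task runs on its own std::thread. `EntryRange`, `makeClusterRanges`, `groupClusters`, and `processInParallel` are hypothetical names, and the even split by cluster count is a simple assumed policy, not TTreeProcessorMT's actual heuristic; friend trees and TEntryList are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Half-open entry range [begin, end), like one TTree cluster or one task.
struct EntryRange {
  long long begin;
  long long end;
};

// Turn cluster start entries into ranges. In ROOT the starts would be
// collected with the cluster iterator, roughly (not compiled here):
//   auto it = tree->GetClusterIterator(0);
//   Long64_t start;
//   while ((start = it()) < tree->GetEntries()) starts.push_back(start);
std::vector<EntryRange> makeClusterRanges(const std::vector<long long>& starts,
                                          long long nEntries) {
  std::vector<EntryRange> clusters;
  for (std::size_t i = 0; i < starts.size(); ++i) {
    const long long end = (i + 1 < starts.size()) ? starts[i + 1] : nEntries;
    clusters.push_back({starts[i], end});
  }
  return clusters;
}

// Merge consecutive clusters into at most nTasks contiguous task ranges;
// task sizes (in clusters) differ by at most one.
std::vector<EntryRange> groupClusters(const std::vector<EntryRange>& clusters,
                                      std::size_t nTasks) {
  assert(nTasks > 0);
  std::vector<EntryRange> tasks;
  if (clusters.empty()) return tasks;
  const std::size_t n = std::min(nTasks, clusters.size());
  for (std::size_t t = 0; t < n; ++t) {
    const std::size_t first = t * clusters.size() / n;
    const std::size_t last = (t + 1) * clusters.size() / n;  // exclusive
    tasks.push_back({clusters[first].begin, clusters[last - 1].end});
  }
  return tasks;
}

// Run one independent worker per task range on its own std::thread.
// Here the per-entry work only counts entries; a real worker would open its
// own TFile/TTree and run its own Analysis instance over the range.
std::vector<long long> processInParallel(const std::vector<EntryRange>& tasks) {
  std::vector<long long> counts(tasks.size(), 0LL);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < tasks.size(); ++t) {
    workers.emplace_back([&tasks, &counts, t] {
      for (long long e = tasks[t].begin; e < tasks[t].end; ++e) ++counts[t];
    });
  }
  for (auto& w : workers) w.join();
  return counts;
}
```

With a single cluster, as in the ntuples mentioned in the question, `groupClusters` returns one task, so this only pays off once files are written (or merged) with several clusters. In a real worker, a TTreeReader can be restricted to its range with `TTreeReader::SetEntriesRange(begin, end)`.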

If you add a void operator()(TTreeReader &r) to Analysis you can also directly pass an Analysis object to TTreeProcessorMT::Process.
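The functor approach has one thread-safety catch worth illustrating: Process takes a std::function<void(TTreeReader&)>, and, as far as I understand, the single stored copy of the callable is then invoked from several worker threads at once. The ROOT-free sketch below assumes exactly that; `MockReader`, `mockProcess`, and `CountingAnalysis` are hypothetical stand-ins for TTreeReader, TTreeProcessorMT::Process, and Analysis. Each call to operator() keeps its own per-task state and merges it into shared results under a mutex once, at the end of its range.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TTreeReader restricted to entries [begin, end).
class MockReader {
public:
  MockReader(long long begin, long long end) : current_(begin - 1), end_(end) {}
  bool Next() { return ++current_ < end_; }
  long long GetCurrentEntry() const { return current_; }
private:
  long long current_;
  long long end_;
};

// Hypothetical stand-in for TTreeProcessorMT::Process: the same stored
// callable is invoked concurrently, once per entry range.
void mockProcess(std::function<void(MockReader&)> func,
                 const std::vector<std::pair<long long, long long>>& ranges) {
  std::vector<std::thread> workers;
  for (const auto& r : ranges) {
    workers.emplace_back([&func, r] {
      MockReader reader(r.first, r.second);
      func(reader);
    });
  }
  for (auto& w : workers) w.join();
}

// Hypothetical Analysis-like functor with a void operator()(Reader&).
class CountingAnalysis {
public:
  struct Results {
    std::mutex mutex;
    long long nEntries = 0;
    long long sumOfEntries = 0;
  };
  explicit CountingAnalysis(std::shared_ptr<Results> results)
      : results_(std::move(results)) {
    assert(results_ != nullptr);
  }
  void operator()(MockReader& reader) const {
    long long n = 0, sum = 0;  // per-task state, never shared between threads
    while (reader.Next()) {
      ++n;
      sum += reader.GetCurrentEntry();
    }
    std::lock_guard<std::mutex> lock(results_->mutex);  // merge once per task
    results_->nEntries += n;
    results_->sumOfEntries += sum;
  }
private:
  std::shared_ptr<Results> results_;
};
```

The results live behind a shared_ptr because std::function stores its own copy of the functor, so the caller would otherwise have no handle on the object that actually ran.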

RDataFrame already solves a lot of these problems for you; maybe you can refactor things so that Analysis builds on top of RDF.

Cheers,
Enrico

system · December 10, 2021, 8:02am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.