TTreeProcessorMP: Getting data out of func

Hi experts,

I am trying to use the parallel TTree reader/processor.
This is the code I am using:

class MergeVec
{
 private:
  std::vector<o2::trd::TrackQC> tracks{};
  std::mutex mutex{};
  unsigned int id{0};
  static MergeVec* instance;

  MergeVec() {}

 public:
  void Merge(const std::vector<o2::trd::TrackQC> vec)
  {
    const std::lock_guard<std::mutex> lock(mutex);
    auto before = tracks.size();
    tracks.reserve(tracks.size() + std::distance(vec.begin(), vec.end()));
    tracks.insert(tracks.end(), vec.begin(), vec.end());
    auto after = tracks.size();
    std::cout << "ID " << id << " from " << before << " to " << after << std::endl;
    ++id;
  }

  static MergeVec* getInstance()
  {
    if (!instance)
      instance = new MergeVec;
    return instance;
  }

  std::vector<o2::trd::TrackQC> getData()
  {
    return this->tracks;
  }
};

MergeVec* MergeVec::instance = 0;

void dEdxTPCMP()
{
  gROOT->SetBatch();
  ROOT::EnableImplicitMT();
  TChain chain("qc");
  chain.Add("trdQC*.root");
  const auto nFiles = chain.GetListOfFiles()->GetEntries();
  auto gVec = MergeVec::getInstance();

  auto workItem = [&gVec](TTreeReader& reader) {
    auto hist = new TH2D("hist", "dEdx", 500, 0, 5, 500, 0, 500);
    TTreeReaderValue<std::vector<o2::trd::TrackQC>> qc(reader, "trackQC");
    std::vector<o2::trd::TrackQC> goodTracks;

    while (reader.Next()) {
      for (const auto& q : *qc.Get()) {
        goodTracks.push_back(q);
      }
    }
    gVec->Merge(goodTracks);
    return hist;
  };

  ROOT::TTreeProcessorMP workers(nFiles);
  auto hist = workers.Process(chain, workItem, "qc");
  std::cout << "Got " << gVec->getData().size() << std::endl;
}

The workItem I implement captures a reference to a variable (a probably over-engineered singleton class) that merges the vector produced in the lambda into a global vector.
However, running this macro gives me:

...
ID 0 from 0 to 3247923
ID 0 from 0 to 1283188
Got 0

Basically, the data never makes it out of the lambda.
Maybe I overlooked an implementation detail, but I do not understand why this would not work, even if side effects are discouraged.

Any help is appreciated.

Hi @f3sch,

Before I/we take a closer look… I see that you are using several thread-safety mechanisms. Do you have a single-threaded version of this analysis using RDF? In principle, ROOT::EnableImplicitMT() should suffice to take care of the parallelization. Or, if you want to have several concurrent event loops, you can use RunGraphs.

Hi @ikabadzhov,

I have asked another question using RDF (#52019), where I describe the input a bit more in detail.
However, I decided against using RDF (maybe out of ignorance) since I would have to define several columns (e.g. index masks), which did not seem that intuitive, and more importantly, I never got it to work properly.

Hi @f3sch ,

TTreeProcessorMP is a multi-process executor, but you are using it as if it were a multi-thread executor. Different processes do not share memory, so each worker gets its own separate copy of gVec, and whatever a worker merges into its copy never reaches the parent process. You can probably verify this by printing the address of gVec, although with memory virtualization I guess the address could even be the same in every process.
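
To make the point about separate memory concrete, here is a minimal sketch using plain POSIX fork() instead of ROOT. It is only an analogy and not how TTreeProcessorMP is implemented; the function names sizeSeenByParentAfterChildWrites and sizeReportedThroughPipe are hypothetical and exist only for this illustration. The first function shows that a vector filled in a child process stays empty in the parent; the second shows that the result has to be sent back explicitly (here through a pipe).

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstddef>
#include <vector>

// Hypothetical helper: a forked child fills a vector; the function returns
// the size that the parent observes afterwards.
std::size_t sizeSeenByParentAfterChildWrites()
{
  std::vector<int> data; // lives in the parent's address space
  const pid_t pid = fork();
  if (pid < 0) {
    return static_cast<std::size_t>(-1); // fork failed
  }
  if (pid == 0) {
    // Child process: this modifies the child's private copy only.
    data.assign(1000, 42);
    _exit(0); // leave immediately, without running the parent's cleanup
  }
  waitpid(pid, nullptr, 0); // wait until the child has finished
  return data.size();       // the parent's vector was never touched
}

// Hypothetical helper: the child reports its element count through a pipe,
// i.e. the data is handed back explicitly instead of via shared state.
std::size_t sizeReportedThroughPipe()
{
  int fds[2];
  if (pipe(fds) != 0) {
    return static_cast<std::size_t>(-1);
  }
  const pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return static_cast<std::size_t>(-1);
  }
  if (pid == 0) {
    close(fds[0]); // the child only writes
    const std::vector<int> local(1000, 42);
    const std::size_t n = local.size();
    const ssize_t written = write(fds[1], &n, sizeof(n));
    close(fds[1]);
    _exit(written == static_cast<ssize_t>(sizeof(n)) ? 0 : 1);
  }
  close(fds[1]); // the parent only reads
  std::size_t n = 0;
  const ssize_t got = read(fds[0], &n, sizeof(n));
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return got == static_cast<ssize_t>(sizeof(n)) ? n : static_cast<std::size_t>(-1);
}
```

Calling sizeSeenByParentAfterChildWrites() gives 0 in the parent, which matches the "Got 0" in the output above, while sizeReportedThroughPipe() gives the 1000 elements counted by the child. In the macro above, the equivalent of the pipe is presumably the object returned from the lambda, which Process hands back to the caller.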

Your code would probably work with TTreeProcessorMT, which runs your lambda in different threads in the same process.
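
As a rough illustration of why the mutex-protected merge pattern works once everything runs in one process, here is a minimal sketch with plain std::thread. It does not use ROOT or the TTreeProcessorMT API; Collector and collectFromThreads are hypothetical names standing in for MergeVec and the event loop, and the int elements stand in for the tracks.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for MergeVec: a mutex-protected vector.
class Collector
{
 public:
  // Append one worker's partial result under the lock.
  void merge(const std::vector<int>& part)
  {
    const std::lock_guard<std::mutex> lock(mMutex);
    mData.insert(mData.end(), part.begin(), part.end());
  }

  std::size_t size()
  {
    const std::lock_guard<std::mutex> lock(mMutex);
    return mData.size();
  }

 private:
  std::vector<int> mData;
  std::mutex mMutex;
};

// Hypothetical driver: every thread builds a local vector and merges it
// into the same Collector, which all threads can see.
std::size_t collectFromThreads(unsigned nThreads, std::size_t perThread)
{
  Collector collector;
  std::vector<std::thread> pool;
  for (unsigned i = 0; i < nThreads; ++i) {
    pool.emplace_back([&collector, perThread] {
      const std::vector<int> local(perThread, 1); // per-thread partial result
      collector.merge(local);                     // shared object, same process
    });
  }
  for (auto& t : pool) {
    t.join();
  }
  return collector.size();
}
```

With four threads of 250 elements each, collectFromThreads(4, 250) ends up with all 1000 elements, because the threads share one address space and the lock serializes the inserts.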

Cheers,
Enrico
