Idioms for multithreaded I/O with TFile+TTree?

I’m in the process of developing an event processing application, and just wanted to check that my understanding of multithreaded I/O and TTrees is correct and that I’m not doing anything dumb…

The use case is simple:

[ol]
[li] Read a TTree from a TFile, the TTree having a branch holding “Event” objects.[/li]
[li] Read each entry from the TTree, passing the “Event” to a processing step.[/li]
[li] Process each “Event” in a separate thread (Pipeline/Tasks)[/li]
[li] Write processed “Event” and other data to an output TTree/TFile which are different to the input ones.[/li][/ol]

The first, second and last step would always be serial, and as I understand it, the read/access to the File and Tree are thread safe in this case?

The bit I’m a little unsure of is the parallel processing of the “Event” objects and hence how best to read/write them from the input/output Trees. Based on the User Guide for TTrees and the behaviour of SetBranchAddress and GetEntry regarding ownership, my current implementation for reading is:

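TFile f("myfile.root", "READ");
TTree* t = dynamic_cast<TTree*>(f.Get("MyEventTree"));
MyEvent* eventPtr = 0;
t->SetBranchAddress("MyEventBranch", &eventPtr);

// GetEntries() returns Long64_t, so use the same type for the loop index
const Long64_t nEntries = t->GetEntries();
for (Long64_t i = 0; i < nEntries; ++i) {
  t->GetEntry(i);
  processEvent(eventPtr); // Takes ownership of the instance
  eventPtr = 0;           // Null the pointer so the next GetEntry creates a new instance
}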

Is that reasonable? My own testing indicates that it is - in the sense that a new Event instance is created for each entry, and that I can clean these myself without issue.

On the writing side, I have

void processEvent(MyEvent* eventPtr) {
  // TFile/Tree created globally for now
  outputTree->SetBranchAddress("MyEventOut", &eventPtr);
  outputTree->Fill();
  delete eventPtr;
}

Is this also OK? It does seem OK in terms of memory management and validity of data, and the two TFiles/Trees don’t appear to clash over gDirectory.

Apologies if this is obvious or stupid, but I want to make sure I’m on the right track :)
If there are parts of ROOT I’m missing that would help with this pattern, or if the code is reasonable but could be improved in any way, I’d welcome any advice!

Thanks,

Ben.

You will most likely run into problems if the output tree is shared among your threads. If two threads call Fill at the same time, I’m not sure what the result would be.

I suggest either writing multiple output trees and merging them afterward, or using a producer/consumer scheme for both the input and output.

Also, you don’t indicate where outputTree lives or how you are handling the threads, but I assume you were just trying to provide a compact example.
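To illustrate the first option, here is a minimal sketch using only the standard library. The Event struct, processEvent and processWithPrivateOutputs are hypothetical stand-ins, and each worker's std::vector plays the role of a per-thread output tree. In ROOT the final merge would presumably be done with something like hadd or a TChain rather than by concatenating vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the poster's MyEvent class.
struct Event {
    long id;
    double value;
};

// Hypothetical processing step: here it just doubles the value.
Event processEvent(Event e) {
    e.value *= 2.0;
    return e;
}

// Each worker owns its own output buffer (standing in for a per-thread
// output tree/file), so no two threads ever write to the same sink.
std::vector<Event> processWithPrivateOutputs(const std::vector<Event>& input,
                                             std::size_t nWorkers) {
    assert(nWorkers > 0);
    std::vector<std::vector<Event>> perWorker(nWorkers);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        workers.emplace_back([&input, &perWorker, w, nWorkers] {
            // Static round-robin split of the input entries.
            for (std::size_t i = w; i < input.size(); i += nWorkers) {
                perWorker[w].push_back(processEvent(input[i]));
            }
        });
    }
    for (auto& t : workers) t.join();

    // Serial merge step, run only after every worker has finished.
    std::vector<Event> merged;
    for (const auto& part : perWorker) {
        merged.insert(merged.end(), part.begin(), part.end());
    }
    return merged;
}
```

Note that the merged output is grouped by worker, so the original entry order is not preserved unless you sort afterward.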

[quote=“ksmith”]You will most likely run into problems if the output tree is shared among your threads. If two threads call Fill at the same time, I’m not sure what the result would be.

I suggest either writing multiple output trees and merging them afterward, or using a producer/consumer scheme for both the input and output.

Also, you don’t indicate where outputTree lives or how you are handling the threads, but I assume you were just trying to provide a compact example.[/quote]

Yes, the example assumes a producer/consumer scheme. The input and output steps would be (single threaded) producer/consumer respectively, the processing step between these being (multithreaded) consumer/producer, so

ReadFromTreeA(Sequential) -> InQueue -> ProcessEvent(Concurrent) -> OutQueue -> WriteToTreeB(Sequential)

That ignores factors like ordering of the output events or the internal architecture of the ProcessEvent step (other than multiple independent events in parallel).
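For concreteness, the pipeline above could be sketched roughly as follows with plain standard-library threads. This is only an illustration: Event, BlockingQueue and runPipeline are hypothetical names, the reader loop stands in for the GetEntry loop, and the writer's push_back stands in for the Fill on the output tree.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for MyEvent.
struct Event { long id; double value; };

// Minimal blocking queue; close() signals that no more items will arrive.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(item)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once the queue is
    // closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

std::vector<Event> runPipeline(const std::vector<Event>& input, std::size_t nWorkers) {
    assert(nWorkers > 0);
    BlockingQueue<Event> inQueue, outQueue;

    // Sequential reader: the only thread touching the input.
    std::thread reader([&] {
        for (const Event& e : input) inQueue.push(e);
        inQueue.close();
    });

    // Concurrent processing: each worker handles independent events.
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        workers.emplace_back([&] {
            Event e;
            while (inQueue.pop(e)) {
                e.value *= 2.0;  // placeholder for the real processing
                outQueue.push(e);
            }
        });
    }

    // Sequential writer: the only thread touching the output.
    std::vector<Event> written;
    std::thread writer([&] {
        Event e;
        while (outQueue.pop(e)) written.push_back(e);
    });

    reader.join();
    for (auto& w : workers) w.join();
    outQueue.close();  // all producers are done, let the writer drain and exit
    writer.join();
    return written;
}
```

As noted, the output order here depends on thread scheduling, so events will generally not come out in input order.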

Hi Ben,

Yes, as long as only one thread/task executes your processEvent function you will be fine. (Also, the code seems to imply that there is only one top-level branch. If there is more than one, you would need to set the address for all of them.)

Note that since processEvent sets the branch address to a local variable (the parameter), you need to reset the branch address at the end of the function to avoid leaving the branch with a dangling pointer:

outputTree->ResetBranchAddresses(); // resets the addresses of all branches, which is fine since there is only one branch anyway
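One possible way to make sure the reset is never forgotten is a small scope guard. This is just a sketch: BranchAddressGuard, FakeTree and writeEvent are hypothetical names, and FakeTree is a stand-in that mimics only the three calls used here so the example compiles without ROOT. The guard itself is a template, so it should work with any type exposing ResetBranchAddresses().

```cpp
#include <cassert>

// Calls ResetBranchAddresses() on the tree when the guard leaves scope,
// including on early return or exception.
template <typename Tree>
class BranchAddressGuard {
public:
    explicit BranchAddressGuard(Tree& tree) : tree_(tree) {}
    ~BranchAddressGuard() { tree_.ResetBranchAddresses(); }
    BranchAddressGuard(const BranchAddressGuard&) = delete;
    BranchAddressGuard& operator=(const BranchAddressGuard&) = delete;
private:
    Tree& tree_;
};

// Hypothetical stand-in for the output tree (not a ROOT class).
struct FakeTree {
    void* address = nullptr;
    int fills = 0;
    void SetBranchAddress(const char* /*name*/, void* addr) { address = addr; }
    void Fill() { assert(address != nullptr); ++fills; }
    void ResetBranchAddresses() { address = nullptr; }
};

// Hypothetical stand-in for MyEvent.
struct Event { long id; };

// Mirrors the processEvent from the question, with the reset handled by the guard.
void writeEvent(FakeTree& tree, Event* eventPtr) {
    BranchAddressGuard<FakeTree> guard(tree);
    tree.SetBranchAddress("MyEventOut", &eventPtr);  // address of a local variable
    tree.Fill();
    delete eventPtr;
}   // guard destructor resets the address here, so nothing dangles
```

After writeEvent returns, the tree no longer holds the address of the (now destroyed) parameter.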

Cheers,
Philippe.