Atomic Fill&GetEntries for TBranch

Nicola_Mori · May 13, 2024, 11:16am

I am writing a multithreaded application where I manage the output on a Root file with a TBufferMerger. Several threads call TBranch::Fill concurrently, and after each fill each thread must retrieve the number of the entry it just filled for housekeeping purposes. Fill does not return the number of filled entry, and I think that simply calling GetEntries before or after Fill won’t work since another thread might fill in the meantime. Is there a way to solve this problem in a more efficient way than a mutex?

Danilo · May 13, 2024, 12:08pm

Hi Nicola,

Thanks for the interesting post.
A mutex would indeed spoil the thread friendliness of ROOT.
Do you absolutely need the number of entries from the TTree? It really depends on the overall design of the software you are working with, but maybe the “entry number” can be taken from somewhere else, perhaps from an atomic integer (just guessing here).
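
A minimal sketch of the atomic-integer idea, using only the C++ standard library. The names gNextIndex, ReserveIndex and ReserveFromThreads are hypothetical and not part of ROOT. The value returned is a unique logical index per reservation; whether it coincides with the entry number in the output file is a separate question that this sketch does not answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <atomic>
#include <vector>

// Hypothetical shared counter: an illustration only, not a ROOT facility.
std::atomic<std::int64_t> gNextIndex{0};

// Reserve a unique index. fetch_add is one atomic read-modify-write and
// returns the value held before the increment, so two threads can never
// receive the same index.
std::int64_t ReserveIndex() {
    return gNextIndex.fetch_add(1, std::memory_order_relaxed);
}

// Start nThreads threads, let each reserve perThread indices, and return
// the indices handed out (one vector per thread).
std::vector<std::vector<std::int64_t>> ReserveFromThreads(int nThreads, int perThread) {
    assert(nThreads > 0 && perThread >= 0);
    std::vector<std::vector<std::int64_t>> result(static_cast<std::size_t>(nThreads));
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&result, t, perThread] {
            // Each thread writes only to its own slot of result.
            for (int i = 0; i < perThread; ++i)
                result[static_cast<std::size_t>(t)].push_back(ReserveIndex());
        });
    }
    for (auto &w : workers) w.join();
    return result;
}
```

Calling ReserveFromThreads(4, 1000) hands out 4000 distinct indices with no mutex involved. The index could then be stored alongside the data it labels.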

best,
D

Nicola_Mori · May 13, 2024, 12:40pm

Hi Danilo,

I need to retrieve the entry number to synchronize two branches with different numbers of entries, i.e. one branch has one entry per event and the other has N per event, with variable N, possibly non-contiguous due to event-based multithreading. So every time I Fill the second branch during an event, I need to take note of the entry number in order to fill a lookup table for offline event readout.
I’m not a multithreading expert, but from my limited knowledge I’d say that an atomic integer won’t help, since it can only ensure that no thread reads it while another one is writing: it cannot tell me whether the number of entries in the branch has changed since the last Fill operation of the current thread.

pcanal · May 13, 2024, 8:42pm

Hi Nicola,

I am confused :).

“where I manage the output on a Root file with a TBufferMerger.”

When using the TBufferMerger, the intent is that each ‘stream’ has exclusive access to its own copy of the TTree, so the operations are completely independent and there is no need to lock access to them.

“Several threads call TBranch::Fill concurrently,”

The only ways this works (properly) are that each branch belongs to a separate TTree (i.e. the usual TBufferMerger case) or that the call to Fill is surrounded by a mutex.
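
A rough sketch of the mutex option, written against a stand-in container rather than a real TBranch so that it stays self-contained. SharedColumn, FillAndGetEntry and At are hypothetical names. The point is only that reading the current size and appending happen under the same lock, so the returned entry number is the one just filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Stand-in for a branch shared by several threads (assumption: one value
// per entry). This is not ROOT code.
struct SharedColumn {
    std::vector<double> data;  // stored values, one per entry
    std::mutex lock;           // serializes fill and index retrieval

    // Append one value and return the entry number it received.
    // No other thread can fill between the size read and the append.
    std::int64_t FillAndGetEntry(double value) {
        std::lock_guard<std::mutex> guard(lock);
        const auto entry = static_cast<std::int64_t>(data.size());
        data.push_back(value);
        return entry;
    }

    // Read a value back by entry number.
    double At(std::int64_t entry) {
        std::lock_guard<std::mutex> guard(lock);
        assert(entry >= 0 && static_cast<std::size_t>(entry) < data.size());
        return data[static_cast<std::size_t>(entry)];
    }
};
```

With a real shared TTree the same pattern would presumably wrap the Fill call and the entry-count query in one critical section, at the cost of serializing all fills.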

“Fill does not return the number of filled entry,”

It does not have to :). That number is always 1 (except in case of failure, when it is zero).

“and I think that simply calling GetEntries before or after Fill won’t work since another thread might fill in the meantime.”

(Besides not being needed, see the previous answer) the entry count would not change in the way you describe unless the setup is ‘broken’, with several threads accessing the same TBranch object at the same time. This is not supported and will not work; accessing independent TBranch objects is (of course) supported.

“i.e. one branch has one entry per event and the other has N per event, with variable N,”

Unless you are using a different definition than the one we usually use for TTree/TBranch, this is not a recommended setup. Most automatic tools (TTree::Draw, RDataFrame, etc.) assume that for a given TTree all the branches have exactly the same number of entries. (However, some branches might contain collections and thus have a different number of elements in each entry.)

So in order to better help you solve the underlying problem, it would greatly help us to have a vision of the bigger picture of the problem you are trying to solve. (This may or may not better be discussed in person)

Nicola_Mori · May 14, 2024, 7:27am

Hi Philippe,

Sorry for the confusion. I am reviving some old code that I haven’t touched in a long time, so I messed up the details. Starting with the big picture, what I am trying to solve is described in this thread: in a few words, I need to stream data to the Root file during event processing, since single-event data is too big to fit in memory. In this particular case I would need a method that can fill a branch containing a collection incrementally, i.e. by calling, many times per event, a hypothetical FillWithAppend method that appends elements to the collection contained in the current entry of the branch. After each call I could reuse the in-memory buffers for the subsequent data instead of having to allocate more memory.

Lacking such a facility, I guessed that I could call Fill multiple times per event on the collection branch, and then take note of the branch entries that correspond to the current event, for offline event rebuilding during readout. But in an event-level multithreaded application (e.g. Geant4), several threads would need to do the same concurrently if operating on the same branch, or else each one would have its own copy of the branch. In the latter case, however, I guess that trouble could arise if the branches are recombined into a single one at the end, since the entry numbers in the merged branch will differ from those saved for event rebuilding. If TBufferMerger actually does something similar, then it could break all of my code.
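
To make the renumbering concern concrete, here is a sketch of one possible workaround, under the assumption (mine, not confirmed anywhere in this thread) that every filled entry could also carry the ID of the event it belongs to. The lookup table is then rebuilt at readout by scanning that ID column, so it no longer depends on entry numbers recorded at fill time. BuildLookup and EntriesOf are hypothetical helpers in plain C++, not ROOT calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using EventId = std::int64_t;
using EntryList = std::vector<std::int64_t>;

// Scan the event-ID column of the (possibly merged) branch and collect,
// for each event, the entry numbers that belong to it. The entries of one
// event do not need to be contiguous.
std::map<EventId, EntryList> BuildLookup(const std::vector<EventId> &eventIdPerEntry) {
    std::map<EventId, EntryList> lookup;
    for (std::size_t entry = 0; entry < eventIdPerEntry.size(); ++entry)
        lookup[eventIdPerEntry[entry]].push_back(static_cast<std::int64_t>(entry));
    return lookup;
}

// Return the entry numbers recorded for one event.
const EntryList &EntriesOf(const std::map<EventId, EntryList> &lookup, EventId id) {
    const auto it = lookup.find(id);
    assert(it != lookup.end() && "unknown event ID");
    return it->second;
}
```

For an ID column reading 7, 3, 7, 7, 3, 9, the entries of event 7 come out as 0, 2, 3 whatever order the per-thread pieces were merged in. The price is one extra integer per entry and a full scan of that column before readout.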

About “Fill does not return the number of filled entry” it should have been written as “Fill does not return the number of the filled entry”, my bad. But even if it did then it would probably be useless in the above described branch-merging scenario.

Sorry for the long post, I hope the situation is more clear now. Any help or advice from your side will be greatly appreciated, so thanks in advance.

Nicola_Mori · May 15, 2024, 9:06am

@pcanal Hi, do you have any suggestion for solving my problem in a different way, given the shortcomings of my current solution that you pointed out? I’m afraid of possible future problems that might stem from them, and I’d be really happy to come up with an alternative solution blessed by the Root developers.
