RDataFrame custom ActionHelper check if initialized when using MT

Hi,
@eguiraud ( I think you are the right one to ask :slight_smile: )
Is there a way for a custom ActionHelper to tell which of its slots have been initialized? My custom action helper uses the InitTask function to read elements in the input tree to configure the binning of custom histograms it keeps track of. During the Finalize() function I merge said histograms. This fails because RDataFrame can have more workers than are actually being used (?), so some of the slots are never initialized. Right now I have a custom implementation in my helper that keeps track of whether a slot has been initialized. But I think this problem may be more general, and all helpers that require initialization before processing events are affected by this issue. In case an example is needed, I have attached a small file with pseudocode showing where the problem appears:
helper.hxx (1.6 KB)
Do you have any tips?
Best regards,
Lukas
ROOT Version: v6.19/01
Platform: Not Provided
Compiler: Not Provided


Hi Lukas,
welcome to the ROOT forum – and with a tough question, too! :smile:

We should really document InitTask better (PRs are welcome, I’m away from ROOT work at the moment). Not only can it happen that it’s not called for some slots (imagine only one task is created because the file is very small, but you are running with 64 slots: then InitTask is called for just one slot), but it can also be called more than once per slot (e.g. if your file is very large and RDF splits it into 20 processing tasks, while you only have 4 slots/threads in the thread pool).
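
To make the two cases concrete, here is a toy count of how often an InitTask-like hook would fire per slot. It is only a sketch: `CountInitTaskCalls` is a made-up function, and it assumes tasks are handed out round-robin, which is not how the real scheduler assigns them. The real numbers will differ from run to run, but the two effects (slots with zero calls, slots with several calls) are the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ROOT: counts how many times an
// InitTask-like hook would be called for each slot, ASSUMING tasks are
// assigned round-robin (the real scheduler makes no such promise).
std::vector<unsigned> CountInitTaskCalls(unsigned nSlots, unsigned nTasks)
{
   std::vector<unsigned> calls(nSlots, 0);
   if (nSlots == 0)
      return calls; // nothing to assign tasks to
   for (unsigned task = 0; task < nTasks; ++task)
      ++calls[task % nSlots]; // one hook call per task, on the slot running it
   return calls;
}
```

With 64 slots and a single task, only one entry is non-zero. With 4 slots and 20 tasks, every slot is hit several times.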

> i think this problem may be more general and all Helpers which require Initialization before processing events are affected by this issue.

In general you should use Initialize, not InitTask, to perform initialization that you need to be done for every slot. I understand that in your case you need information from the TTree itself to initialize the slots.
In that case you can use empty histograms or nullptrs to indicate slots that have not been initialized, and deal with them appropriately in Finalize.
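
A minimal sketch of the nullptr approach, using only the standard library so it stands on its own. `SimpleHist` and `SlotHistHelper` are hypothetical stand-ins, not ROOT classes, and the `InitTask` signature here is simplified: a real helper receives a `TTreeReader*` and the slot number and would read the binning from the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a histogram; not a ROOT class.
struct SimpleHist {
   double lo, hi;
   std::vector<long> counts;
   SimpleHist(std::size_t nBins, double lo_, double hi_) : lo(lo_), hi(hi_), counts(nBins, 0) {}
   void Fill(double x)
   {
      if (counts.empty() || x < lo || x >= hi)
         return; // ignore out-of-range values
      auto bin = static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
      if (bin >= counts.size())
         bin = counts.size() - 1; // guard against floating-point rounding
      ++counts[bin];
   }
   void Add(const SimpleHist &other) // assumes identical binning
   {
      for (std::size_t i = 0; i < counts.size(); ++i)
         counts[i] += other.counts[i];
   }
   long Entries() const
   {
      long n = 0;
      for (long c : counts)
         n += c;
      return n;
   }
};

// Hypothetical helper that mimics the shape of an RDF action helper.
class SlotHistHelper {
   std::vector<std::unique_ptr<SimpleHist>> fPerSlot; // nullptr: slot never initialized
public:
   explicit SlotHistHelper(unsigned nSlots) : fPerSlot(nSlots) {}

   // May run zero, one or many times per slot: only create the histogram once.
   void InitTask(unsigned slot, std::size_t nBins, double lo, double hi)
   {
      if (!fPerSlot[slot])
         fPerSlot[slot] = std::make_unique<SimpleHist>(nBins, lo, hi);
   }

   // Only ever called for a slot after InitTask has run on that slot.
   void Exec(unsigned slot, double x) { fPerSlot[slot]->Fill(x); }

   // Merge only the slots that were initialized; nullptr if none were.
   std::unique_ptr<SimpleHist> Finalize() const
   {
      std::unique_ptr<SimpleHist> merged;
      for (const auto &h : fPerSlot) {
         if (!h)
            continue; // skip unused slots
         if (!merged)
            merged = std::make_unique<SimpleHist>(*h);
         else
            merged->Add(*h);
      }
      return merged;
   }
};
```

The check in `InitTask` covers the "called more than once per slot" case, and the check in `Finalize` covers the "never called" case.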

I hope this helps.
Cheers,
Enrico

Hi Enrico,

Thanks a lot for your explanations. They have been very helpful. Implementing an appropriate check in Finalize and InitTask, so that every slot is initialized at most once and only initialized slots are passed on to the merging function, now enables my helper to work with multithreading enabled.
If I understand correctly, at the time Initialize is called there is no way of knowing how many slots will actually be used by the data source, which does the looping. So the only information that one could retrieve is the maximum number of slots, given by ROOT::GetImplicitMTPoolSize()?

This solved my problem.
Thanks :slight_smile:,
Lukas

Due to how the task scheduling works (thread pool with per-thread task queues with work stealing), it’s never possible to know in advance with certainty how many of the available slots will actually be used.
