I’m using TTreeProcessorMP (I also tried TTreeProcessorMT) to parallelize my TTree analysis. Because my analysis function fills a large number of histograms, I’m running into memory problems. As far as I understand, all histograms created in all calls of the process function need to be kept in memory so they can be merged at the end.
The problem is that there is no control over how many “clusters” of events are created by either of the aforementioned classes, and therefore over how many times the process function is called. I think it would be helpful to have a mode where the N workers split the tree entries equally among themselves, so that the process function is called only N times.
I saw that in the development version 6.17 TTreeProcessorMT::GetMaxTasksPerFilePerWorker was added, which might allow something like this? It would be great if @eguiraud or @dpiparo could comment on this.
With TTreeProcessorMT, the usual parallelization scheme is to keep only one copy of each histogram per thread, which is reused by subsequent calls to the process function in the same thread. That way you have n_thread times the memory usage of a single-threaded program, and you can control n_thread if need be.
With TTreeProcessorMP, if the number of files to process is larger than the number of worker processes, the process lambda is invoked once per file. Otherwise entries are indeed divided equally between workers as you suggest (although this is not the absolute best way to divide work on TTree entries, as it leads to redundant decompression of some entry clusters).
Thanks for your clarifications. The situation is a bit more complicated for me, since I’m using my own histogram class to fill different histograms as a function of two variables. I guess I would need to implement the Merge() method in order to use TTreeProcessorMT.
Regarding TTreeProcessorMP, would it be possible to add an option to have the lambda called once per worker even if the number of files is larger than the number of workers? Basically, this would force the same mode as when the number of files is smaller. That was actually the behavior I was naively expecting when providing a TChain (rather than a vector of files). This could be useful when results more complex (and larger in size) than histograms need to be merged “manually” using the return object of TTreeProcessorMP::Process(). Just an idea…
I guess I would need to implement the Merge() method in order to use TTreeProcessorMT.
Yes, not necessarily the Merge method, but definitely some mechanism to aggregate each thread’s partial result. That’s the parallelization scheme that RDataFrame uses, and it has worked quite well for us (and our users) so far.
But you would need some way to aggregate your histograms also if you did multi-processing with TTreeProcessorMP, wouldn’t you?
would it be possible to add an option to have the lambdas called once per worker even if the number of files is larger than the number of workers?
It would definitely be possible (please open a jira ticket with the feature request if you need it), but since it’s a new feature and not a bugfix I can’t promise short timescales – TTreeProcessorMP has been mostly in maintenance mode since 2017.
Yes, I need merging with TTreeProcessorMP as well, but there I don’t need to make my manager thread-safe, because of the fork() parallelization scheme used there. On the other hand, I would like to have only N workers going through my chain, as in TTreeProcessorMT (and as in PROOF-Lite), because otherwise the memory usage explodes.
I guess I could also write to ROOT files at the end of each lambda and then merge them. Not so elegant though.
I’ll try to make a feature request on jira.