Hi everyone,
I’m working with TTreeProcessorMT and have come across some behavior that I’m having trouble understanding. I’ve provided a simplified version of my implementation below:
void treeProcessor::filter_events_v2(likelihoodNet &lnet)
{
    int numberOfThreads = 7;
    ROOT::EnableImplicitMT(numberOfThreads);

    // Create a TThreadedObject to hold a TGraph for each thread
    ROOT::TThreadedObject<TGraph> threadedScatter;

    // Create a TTreeProcessorMT: specify the file and the tree in it
    ROOT::TTreeProcessorMT processor(fileName, "myTree");

    std::atomic<int> taskCounter{0};

    // Define the function that will process a subrange of the tree.
    // The function must receive only one parameter, a TTreeReader,
    // and it must be thread safe. To enforce the latter requirement,
    // a TThreadedObject<TGraph> is used, so each thread fills its own graph.
    auto processFunction = [&](TTreeReader &reader)
    {
        // Access the event branch using TTreeReaderValue
        TTreeReaderValue<ProcessedEvent> processed_event(reader, "event");
        int taskNumber = taskCounter.fetch_add(1);

        // For performance reasons, a copy of the pointer associated to this thread on the
        // stack is used
        auto localThreadedScatter = threadedScatter.Get();
        int localGraphPointCount = 0;

        // Process each entry in the current task's range
        while (reader.Next())
        {
            if (/*some filtering logic*/)
            {
                localThreadedScatter->SetPoint(localThreadedScatter->GetN(), xval, yval);
                localGraphPointCount++;
            }
        }
        std::cout << "Task " << taskNumber << " added " << localGraphPointCount << " points." << std::endl;
    };

    // Launch the parallel processing of the tree
    processor.Process(processFunction);

    // Use the TThreadedObject::Merge method to merge the thread private scatter plots
    // into the final result
    auto scatterMerged = threadedScatter.Merge();

    // Copy the merged graph into the scatter TGraph
    *scatter = *scatterMerged;
}
In this function, I noticed that regardless of the value assigned to numberOfThreads, there are always 7 clusters. Execution time decreases as I increase the number of threads from 1 to 7 and reaches its minimum at numberOfThreads = 7. Going beyond 7 threads has no effect other than random fluctuations in execution time.
My computer has 30 available threads, and I’d like to make full use of them. Is there a way to manually set the number of clusters or configure the function to use more than 7 threads?
Any help or suggestions would be greatly appreciated!
Thanks in advance!
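
A possible explanation for the plateau, sketched as a toy model rather than as ROOT's actual scheduler: TTreeProcessorMT builds its tasks from the tree's entry clusters and does not split a cluster between threads, so a file with 7 clusters can keep at most 7 threads busy. The sketch below uses only the standard library. The helper names roundsNeeded and usefulThreads are hypothetical, and the assumption that every cluster costs the same one unit of time is a simplification.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (assumption): every cluster is one indivisible task that
// costs one unit of time, and idle threads pick up the next free task.
// Returns how many time units ("rounds") the whole tree then needs.
std::size_t roundsNeeded(std::size_t nClusters, std::size_t nThreads)
{
    assert(nThreads > 0);
    // Integer ceiling of nClusters / nThreads.
    return (nClusters + nThreads - 1) / nThreads;
}

// Number of threads that can actually receive work: threads beyond the
// number of clusters have no task to run in this model.
std::size_t usefulThreads(std::size_t nClusters, std::size_t nThreads)
{
    return nThreads < nClusters ? nThreads : nClusters;
}
```

In this model, roundsNeeded(7, t) goes 7, 4, 3, 2, 2, 2, 1 for t = 1 to 7 and stays at 1 for any larger t, which has the same shape as the timings described above. If the cluster count really is the limit, the thread setting cannot raise it; the file would have to be rewritten with smaller clusters, which is controlled at write time by TTree::SetAutoFlush.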