Need Help with TTreeProcessorMT: Increasing Number of Threads Beyond 7

Hi everyone,

I’m working with TTreeProcessorMT and have come across some behavior that I’m having trouble understanding. I’ve provided a simplified version of my implementation below:

void treeProcessor::filter_events_v2(likelihoodNet &lnet)
{
    int numberOfThreads = 7;
    ROOT::EnableImplicitMT(numberOfThreads);

    // Create a TThreadedObject to hold a TGraph for each thread
    ROOT::TThreadedObject<TGraph> threadedScatter;

    // Create a TTreeProcessorMT: specify the file and the tree in it
    ROOT::TTreeProcessorMT processor(fileName, "myTree");

    // Counts how many tasks (calls to processFunction) have been started
    std::atomic<int> taskCounter{0};

    // Define the function that will process a subrange of the tree.
    // The function must receive only one parameter, a TTreeReader,
    // and it must be thread safe. To enforce the latter requirement,
    // a TThreadedObject<TGraph> is used, so each thread fills its own graph.
    auto processFunction = [&](TTreeReader &reader)
    {
        // Access the event branch using TTreeReaderValue
        TTreeReaderValue<ProcessedEvent> processed_event(reader, "event");

        int taskNumber = taskCounter.fetch_add(1);

        // For performance reasons, a copy of the pointer associated to this thread on the
        // stack is used
        auto localThreadedScatter = threadedScatter.Get();

        int localGraphPointCount = 0;

        // Process each entry in the current task's range
        while (reader.Next())
        {
            if (/*some filtering logic*/)
            {
                localThreadedScatter->SetPoint(localThreadedScatter->GetN(), xval, yval);
                localGraphPointCount++;
            }
        }
        std::cout << "Task " << taskNumber << " added " << localGraphPointCount << " points." << std::endl;
    };

    // Launch the parallel processing of the tree
    processor.Process(processFunction);

    // Use the TThreadedObject::Merge method to merge the thread private scatter plots
    // into the final result
    auto scatterMerged = threadedScatter.Merge();

    // Copy the merged graph (scatterMerged) into the scatter TGraph
    *scatter = *scatterMerged;
}

In this function, I noticed that regardless of the value assigned to numberOfThreads, there are always 7 clusters. Interestingly, the minimum execution time is achieved when numberOfThreads = 7. As I increase the number of threads from 1 to 7, the execution time decreases. However, going beyond 7 threads has no effect other than random fluctuations in execution time.

My computer has 30 available threads, and I’d like to make full use of them. Is there a way to manually set the number of clusters or configure the function to use more than 7 threads?

Any help or suggestions would be greatly appreciated!

Thanks in advance!

In this particular case my guess would be that your input tree has 7 TTree entry clusters 🙂

EDIT:
The reason a TTree entry cluster is the smallest unit of parallelism is that if two threads processed different parts of the same cluster, each thread would have to read and decompress the same data, which is redundant work.
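
To make the granularity argument concrete, here is a minimal sketch in plain C++. It is a toy model and not ROOT's actual scheduler: the helper names `makeClusterTasks` and `busyThreads` are made up for illustration, and the cluster boundaries are assumed to be known already (in ROOT they can be read from the tree with `TTree::GetClusterIterator`). The only thing it shows is that when a cluster is never split, the number of threads doing useful work cannot exceed the number of clusters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open entry range [first, second) handed to one task.
using EntryRange = std::pair<std::int64_t, std::int64_t>;

// Toy model (hypothetical helper, not a ROOT function): build one task per
// entry cluster. clusterStarts holds the first entry of each cluster in
// ascending order; nEntries is the total number of entries in the tree.
std::vector<EntryRange> makeClusterTasks(const std::vector<std::int64_t> &clusterStarts,
                                         std::int64_t nEntries)
{
    std::vector<EntryRange> tasks;
    for (std::size_t i = 0; i < clusterStarts.size(); ++i) {
        // A cluster ends where the next one starts; the last one ends at nEntries.
        const std::int64_t end =
            (i + 1 < clusterStarts.size()) ? clusterStarts[i + 1] : nEntries;
        tasks.emplace_back(clusterStarts[i], end);
    }
    return tasks;
}

// Number of threads that can be busy at the same time (hypothetical helper).
// Since a cluster is never split, threads beyond the task count stay idle.
std::size_t busyThreads(std::size_t requestedThreads, const std::vector<EntryRange> &tasks)
{
    return std::min(requestedThreads, tasks.size());
}
```

With seven cluster start entries, `makeClusterTasks` yields seven tasks, so `busyThreads(30, tasks)` is still 7 under this model. That matches the behaviour described in the question: the run time improves up to 7 threads and then stays flat.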
