Seeking Advice on Efficiently Handling Large Data Sets with ROOT TTree and Parallel Processing

Hey guys… :smiling_face_with_three_hearts:

I am currently working on a project that requires analyzing a large volume of data, and I am looking for advice on optimizing my workflow with ROOT. My data is stored in TTrees, each containing several million entries, and the analysis involves complex calculations on various branches. So far my single-threaded approach works, but it is becoming increasingly slow as the data grows.

I’ve explored some of ROOT’s multi-threading options, including ROOT::TThreadedObject, and I’m aware of ROOT::TTreeProcessorMT as well. However, I am not sure which approach is best suited for balancing memory efficiency and processing speed. I’d also appreciate tips on minimizing the memory footprint when processing in parallel, since my system has limited RAM.
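
To make it concrete, here is roughly what I am considering for the multi-threaded version. The file name (`data.root`), tree name (`events`) and branch name (`energy`) below are placeholders for my actual setup:

```cpp
#include <TROOT.h>
#include <TH1F.h>
#include <TTreeReader.h>
#include <TTreeReaderValue.h>
#include <ROOT/TThreadedObject.hxx>
#include <ROOT/TTreeProcessorMT.hxx>

void processMT()
{
   ROOT::EnableImplicitMT(); // start ROOT's thread pool

   // One histogram copy per thread, merged at the end (no locking needed).
   ROOT::TThreadedObject<TH1F> hist("h_energy", "energy", 100, 0., 500.);

   ROOT::TTreeProcessorMT processor("data.root", "events");
   processor.Process([&](TTreeReader &reader) {
      TTreeReaderValue<float> energy(reader, "energy");
      auto localHist = hist.Get(); // thread-local copy
      while (reader.Next()) {
         // placeholder for my actual per-entry calculation
         localHist->Fill(*energy);
      }
   });

   auto merged = hist.Merge(); // combine the per-thread histograms
   merged->Print();
}
```

Is the per-thread TThreadedObject copy plus a final Merge() the recommended pattern here, or is there something better?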

My specific questions are:

  1. Are there any best practices or less-documented features in ROOT for handling multi-threaded data processing efficiently?
  2. How can I best manage memory usage when using TTreeProcessorMT? (I’ve put a rough sketch of my current idea after this list.)
  3. Has anyone tried integrating ROOT’s parallel processing with other libraries, like TBB or even Python-based approaches (e.g., using PyROOT with multiprocessing)?
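
On question 2, my working assumption is that memory grows with the number of concurrent workers (each worker holds the decompressed baskets of the cluster it is processing), so capping the implicit-MT pool should bound it. Something like this, with the same placeholder file/tree names as above and 2 threads as an arbitrary example for a small-RAM machine:

```cpp
#include <TROOT.h>
#include <TTreeReader.h>
#include <TTreeReaderValue.h>
#include <ROOT/TTreeProcessorMT.hxx>

void processLowMem()
{
   ROOT::EnableImplicitMT(2); // cap the pool at 2 worker threads

   ROOT::TTreeProcessorMT processor("data.root", "events");
   processor.Process([](TTreeReader &reader) {
      TTreeReaderValue<float> energy(reader, "energy");
      while (reader.Next()) {
         // keep per-entry work light and avoid large per-thread buffers
         (void)*energy;
      }
   });
}
```

Is that the right knob to turn, or is there a better way to control memory with TTreeProcessorMT?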

Thanks in advance!

Hi! :blush:

You can take a look at RDataFrame: see Dataframes - ROOT and ROOT: Dataframe tutorials.
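
A minimal sketch of how that could look for your case (assuming a tree `events` in `data.root` with float branches `px` and `py`; the Define'd quantity just stands in for your complex calculation):

```cpp
#include <TROOT.h>
#include <ROOT/RDataFrame.hxx>
#include <cmath>

void analyseRDF()
{
   ROOT::EnableImplicitMT(); // parallel event loop, no explicit thread handling

   ROOT::RDataFrame df("events", "data.root");

   // Define a derived column and book a histogram; the event loop runs
   // lazily, only once, when the result is first accessed.
   auto withPt = df.Define("pt",
                           [](float px, float py) { return std::hypot(px, py); },
                           {"px", "py"});
   auto hPt = withPt.Histo1D({"h_pt", "p_{T};p_{T};entries", 100, 0., 200.}, "pt");

   hPt->Draw();
}
```

ROOT's implicit multi-threading is built on TBB, so you get that integration for free, and RDataFrame only reads the branches your computations actually use, which helps keep the memory footprint down.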
