Filling ROOT histogram in parallel in Python

destrada · August 27, 2025, 6:13pm

I’m trying to fill a ROOT histogram using multiprocessing in Python, but the resulting ROOT file contains an empty histogram. The code runs without errors, but no data is actually filled into the histogram.

I wrote this small piece of code to illustrate the problem:

import ROOT as r
from numpy.random import normal, seed
from multiprocessing import Pool, cpu_count

seed(42)

def fill_histogram(histogram, numbers):
    for number in numbers:
        histogram.Fill(number)

if __name__ == "__main__":
    r.EnableThreadSafety()
    Histo = r.TH1F("h1", "h1", 100, 0, 100)
    numbers = [normal(50, 20) for _ in range(1000)]
    number_batches = [numbers[i:i + 100] for i in range(0, len(numbers), 100)]

    pool = Pool(processes=int(cpu_count()))
    for i, batch in enumerate(number_batches):
        pool.apply_async(fill_histogram, args=(Histo, batch,))
    pool.close()
    pool.join()

    output_file = r.TFile.Open("histo_test.root", "RECREATE")
    Histo.Write()
    output_file.Close()

What’s the correct approach? I suspect the issue might be related to how ROOT objects are passed between processes or thread safety, but I’m not sure how to resolve it.
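The empty histogram fits how multiprocessing passes arguments. Each argument given to apply_async is pickled in the parent and unpickled in the worker, so the worker fills its own copy and the parent's histogram never changes. The sketch below reproduces this without ROOT. It uses a hypothetical stand-in class `StandInHist`, and an explicit pickle round trip takes the place of the transfer between processes:

```python
import pickle


class StandInHist:
    """Hypothetical stand-in for a ROOT histogram: it only counts entries."""

    def __init__(self):
        self.entries = 0

    def Fill(self, x):  # named like TH1::Fill so fill_histogram stays unchanged
        self.entries += 1


def fill_histogram(histogram, numbers):
    for number in numbers:
        histogram.Fill(number)


histo = StandInHist()
# multiprocessing pickles each argument before handing it to a worker;
# an explicit pickle round trip reproduces the object the worker receives.
worker_copy = pickle.loads(pickle.dumps(histo))
fill_histogram(worker_copy, [10.5, 20.5, 30.5])

print(worker_copy.entries)  # 3: the worker's copy was filled
print(histo.entries)        # 0: the parent's object is untouched
```

The original loop also never calls `.get()` on the `AsyncResult` objects that `apply_async` returns. Any exception raised inside a worker would therefore be discarded silently, which is one reason the script can "run without errors".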

Any guidance on the proper way to parallelize histogram filling with ROOT would be greatly appreciated.

ROOT Version: 6.32.08
Python Version: 3.12.3

silverweed · August 28, 2025, 9:41am

Hello @destrada, welcome to the ROOT Forum!

First off, do you specifically need multiprocessing, or would multithreading also be fine? (Judging from the code you posted, it seems so.)

The easiest way to do what you want is probably to use RDataFrame together with EnableImplicitMT.

With it, you can write something like this:

import ROOT

ROOT.EnableImplicitMT()
df = ROOT.RDataFrame(1000)
with ROOT.TFile.Open("histo_test.root", "RECREATE") as output_file:
    h = df.Define("numbers", "gRandom->Gaus(50, 20)").Histo1D(("h1", "h1", 100, 0, 100), "numbers").GetValue()
    output_file.WriteObject(h, h.GetName())

Note that if your real input comes from somewhere else (e.g. a TTree), you would need to make a minor adjustment to how the RDataFrame is created (see the tutorials).
Also note that RDF currently doesn’t support calling Python functions directly, so I replaced numpy's normal with gRandom->Gaus (the ROOT C++ function TRandom::Gaus). It samples the same Gaussian distribution (mean 50, sigma 20), although the individual numbers will differ.

Let me know if this works for you.

destrada · August 29, 2025, 8:28am

Hi @silverweed, thanks a lot for your answer.

I’m using Python’s multiprocessing simply because it’s the way I know how to do it. It’s not strictly necessary, but it allows me to parallelize the task in a straightforward way.

I can see that using RDataFrame with EnableImplicitMT would be much more efficient. However, I’m currently working on top of a small, old framework in which the analysis is applied iteratively over a TTree, entry by entry. Switching to RDataFrame would require significant changes to the codebase, which isn’t feasible at the moment.
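Even without RDataFrame, an entry-by-entry TTree loop can be parallelized with multiprocessing by giving each worker a contiguous range of entries. ROOT objects are copied between processes, not shared, so each worker would open the file itself, fill its own histogram over its range, and return it. Below is a minimal sketch of the range splitting. It assumes the total number of entries is known in advance (e.g. from `TTree::GetEntries()`), and `entry_ranges` is a hypothetical helper, not a ROOT function:

```python
def entry_ranges(n_entries, n_workers):
    """Split [0, n_entries) into at most n_workers contiguous (start, stop) ranges."""
    n_workers = max(1, min(n_workers, n_entries))
    base, extra = divmod(n_entries, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        # The first `extra` ranges get one additional entry each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(entry_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Inside a worker, the existing per-entry code would then run as `for i in range(start, stop): tree.GetEntry(i)` on a TTree that the worker opened itself.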

That said, I found a workaround that resolves the issue: I split the work across clones of the histogram, pass them to separate multiprocessing workers, and then merge them into a single .root file.
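The clone-and-merge workaround works because each worker sends its filled histogram back to the parent, which then adds the partial results together. The sketch below shows that flow with multiprocessing. It uses a hypothetical stand-in class `ToyHist` so it runs without ROOT. With real ROOT histograms, the merge step would correspond to calling `TH1::Add` on the returned clones, assuming they survive the pickle transfer back to the parent:

```python
import random
from multiprocessing import Pool


class ToyHist:
    """Hypothetical stand-in for a fixed-binning 1D histogram (not ROOT)."""

    def __init__(self, nbins, xmin, xmax):
        self.nbins, self.xmin, self.xmax = nbins, xmin, xmax
        self.counts = [0] * nbins

    def fill(self, x):
        # Values outside [xmin, xmax) are ignored (no under/overflow bins).
        if self.xmin <= x < self.xmax:
            width = (self.xmax - self.xmin) / self.nbins
            self.counts[min(int((x - self.xmin) / width), self.nbins - 1)] += 1

    def add(self, other):
        # Bin-by-bin sum, similar in spirit to TH1::Add; assumes equal binning.
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]


def fill_partial(batch):
    # Each worker builds and fills its OWN histogram, then returns it.
    hist = ToyHist(100, 0, 100)
    for number in batch:
        hist.fill(number)
    return hist


def merge(parts):
    total = ToyHist(100, 0, 100)
    for part in parts:
        total.add(part)
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    numbers = [rng.gauss(50, 20) for _ in range(1000)]
    batches = [numbers[i:i + 100] for i in range(0, len(numbers), 100)]

    with Pool() as pool:
        # pool.map pickles each returned histogram back to the parent process.
        parts = pool.map(fill_partial, batches)
    total = merge(parts)
    print(sum(total.counts))  # number of values that fell inside [0, 100)
```

Returning results through `pool.map` also surfaces any exception raised in a worker, unlike an `apply_async` call whose result is never retrieved.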

This solution works for now, but I still wonder: is it possible to avoid creating clones of the histogram altogether?

ferhue · August 29, 2025, 4:21pm

See “Thread safe fill of same tree and histograms from different synchronized threads” (#3 by pcanal).

https://root-forum.cern.ch/search?q=tbuffermerger
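If the goal is to avoid clones entirely, one option is to switch from processes to threads, which share memory, and to serialize every fill of a single shared histogram with a lock, because filling the same TH1 from several threads at once is not safe. The sketch below shows only the locking pattern. It uses a hypothetical stand-in class `ToyHist` and Python's `threading` instead of ROOT. In standard CPython builds, pure-Python filling gains little from threads because of the GIL, and lock contention can make this slower than filling per-thread copies and merging them:

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyHist:
    """Hypothetical stand-in for a 1D histogram; NOT thread-safe by itself."""

    def __init__(self, nbins, xmin, xmax):
        self.nbins, self.xmin, self.xmax = nbins, xmin, xmax
        self.counts = [0] * nbins

    def fill(self, x):
        # Values outside [xmin, xmax) are ignored (no under/overflow bins).
        if self.xmin <= x < self.xmax:
            width = (self.xmax - self.xmin) / self.nbins
            self.counts[min(int((x - self.xmin) / width), self.nbins - 1)] += 1


def fill_shared(batches, n_threads=4):
    hist = ToyHist(100, 0, 100)  # one histogram shared by all threads
    lock = threading.Lock()

    def fill_batch(batch):
        for x in batch:
            with lock:  # only one thread modifies the shared histogram at a time
                hist.fill(x)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() consumes the results so an exception in any thread is re-raised here.
        list(pool.map(fill_batch, batches))
    return hist


rng = random.Random(42)
numbers = [rng.gauss(50, 20) for _ in range(1000)]
batches = [numbers[i:i + 100] for i in range(0, len(numbers), 100)]
hist = fill_shared(batches)
inside = sum(1 for x in numbers if 0 <= x < 100)
print(sum(hist.counts) == inside)  # True: no fill was lost
```

Taking the lock once per batch instead of once per value would reduce locking overhead, but it would also serialize whole batches. At that point, the threads are effectively filling one after another.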