TGraph.Fit() usage with EnableImplicitMT(). Is TGraph.Fit() thread-safe?


Describe the bug

Running TGraph.Fit() with EnableImplicitMT() produces results that are not reproducible run-to-run.

Expected behavior

I expect that running TGraph.Fit() on identical data produces identical results, regardless of whether multithreading is enabled (provided it is used correctly).

To Reproduce

  1. Create a dummy data sample with root -l create.cpp:
// create.cpp
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <TRandom.h>
#include <algorithm>

using ROOT::RDataFrame;
using namespace ROOT::VecOps;
// modified df016_vecOps
int create(){
   auto unifGen = [](double) { return gRandom->Uniform(-1.0, 1.0); };

   auto vGen = [&](int len) {
      RVec<double> v(len);
      std::transform(v.begin(), v.end(), v.begin(), unifGen);
      return v;
   };

   // Note: the effect is very small, so about 200k events may be needed to reproduce it.
   // I could not reproduce it with only 1024 events, so a large number of events is the key parameter.
   RDataFrame d(200000);
   auto d0 = d.Define("len", []() { return (int)gRandom->Uniform(5, 32); })
      .Define("x", vGen, {"len"})
      .Define("y", vGen, {"len"})
      .Redefine("y", "x+y")
      .Snapshot("events", "example.root");
      
   return 0;
}
  2. Run root -l read.cpp several times with multithreading disabled:
// read.cpp
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <TF1.h>
#include <TGraph.h>
#include <TROOT.h>
#include <iomanip>
#include <iostream>

using ROOT::RDataFrame;
using namespace ROOT::VecOps;
// Uncomment to run with multithreading enabled:
// ROOT::EnableImplicitMT(4);

int read(){
   // Fit a straight line to the points (x, y) and return its slope.
   auto fitLine = [](const RVec<double> &x, const RVec<double> &y) {
      TGraph gr;

      for (std::size_t i = 0; i < x.size(); i++) gr.SetPoint(i, x[i], y[i]);

      // "Q" = quiet mode (suppress the fit printout)
      gr.Fit("pol1", "Q");
      auto reco_slope = gr.GetFunction("pol1")->GetParameter(1);
      return reco_slope;
   };

   auto d = RDataFrame("events", "example.root");
   auto d0 = d.Define("slope", fitLine, {"x", "y"});

   auto h_x = d0.Histo1D({"hx", "hx", 200, -5, 5}, "x");
   auto h_y = d0.Histo1D({"hy", "hy", 200, -5, 5}, "y");
   auto h_slope = d0.Histo1D({"hslope", "hslope", 200, -5, 5}, "slope");

   std::cout<<std::fixed<<std::setprecision(3)<<"X: "<<h_x->GetMean()<<" +- "<<h_x->GetStdDev()<<std::endl;
   std::cout<<std::fixed<<std::setprecision(3)<<"Y: "<<h_y->GetMean()<<" +- "<<h_y->GetStdDev()<<std::endl;
   std::cout<<std::fixed<<std::setprecision(3)<<"Slope: "<<h_slope->GetMean()<<" +- "<<h_slope->GetStdDev()<<std::endl;
   return 0;
}

Output on my machine:

X: 0.000 +- 0.577
Y: 0.000 +- 0.817
Slope: 1.000 +- 0.309

No matter how many times I run it, the numbers remain identical (reproducible) from run to run.

  3. Run root -l read.cpp several times with multithreading enabled (ROOT::EnableImplicitMT(N) uncommented):

Output on my machine:

# X and Y always stay the same
# Run with ROOT::EnableImplicitMT(2);
Slope: 1.000 +- 0.311
# Run with ROOT::EnableImplicitMT(4);
Slope: 1.001 +- 0.311
# Run with ROOT::EnableImplicitMT(6);
Slope: 1.001 +- 0.312
# Run with ROOT::EnableImplicitMT(8);
Slope: 1.001 +- 0.312
# Run with ROOT::EnableImplicitMT(16);
Slope: 1.000 +- 0.313

The standard deviation of the fitted slope differs from the run without multithreading, and it also changes with the number of threads I request.
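One way to cross-check the slope without going through TGraph.Fit() at all is the closed-form solution for an unweighted straight-line least-squares fit. The sketch below is only an illustration: `lineSlope` is a hypothetical helper, not a ROOT function, and it assumes all points carry equal weight, in which case it should agree with the "pol1" slope up to floating-point rounding. It touches only local variables, so concurrent calls share no state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ROOT): slope b of the unweighted
// least-squares line y = a + b*x, computed in closed form as
//   b = sum((x_i - <x>) * (y_i - <y>)) / sum((x_i - <x>)^2).
// Only local variables are used, so calls from several threads are independent.
double lineSlope(const std::vector<double> &x, const std::vector<double> &y) {
    assert(x.size() == y.size() && x.size() >= 2);

    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        sxy += dx * (y[i] - meanY);
        sxx += dx * dx;
    }
    // The result is undefined if all x values are equal (sxx == 0).
    return sxy / sxx;
}
```

std::vector is used here only to keep the sketch free of ROOT dependencies; the same loops work on RVec<double>. Filling a second histogram with this value next to the fitted slope would show whether the single-threaded or the multithreaded numbers are the ones that move.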

Setup

ROOT v6.36.04
Built for linuxx8664gcc on Sep 19 2025, 13:34:39
From tags/6-36-04@6-36-04
With g++ (Spack GCC) 14.2.0
Binary directory: /cvmfs/sw-nightlies.hsf.org/key4hep/releases/2025-09-21/x86_64-almalinux9-gcc14.2.0-opt/root/6.36.04-6qfi76/bin

ROOT is obtained from:

source /cvmfs/sw-nightlies.hsf.org/key4hep/setup.sh -r 2025-12-15

Additional context

If this is expected behavior because my code is poorly written and allows hidden race conditions, I would like to know how I can avoid that.

I have already observed very similar behavior before: generating random variables from a single engine caused race conditions, and the solution was to create N random engines, one per slot.
However, in this example the data is static and there are no randomly generated variables (except maybe the initial fit parameters), so I would expect the fit results to be reproducible.
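For reference, the one-engine-per-slot workaround mentioned above can be sketched roughly as follows. This is a minimal illustration using the standard library rather than the code from that earlier case: `PerSlotUniform`, the seed scheme (base seed plus slot index), and the range [-1, 1) are all assumptions made for the example.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: one random engine per processing slot, so that
// threads working on different slots never touch the same engine state.
class PerSlotUniform {
public:
    // Assumed seeding scheme: slot k gets the seed baseSeed + k.
    PerSlotUniform(unsigned nSlots, unsigned baseSeed) {
        engines_.reserve(nSlots);
        for (unsigned slot = 0; slot < nSlots; ++slot)
            engines_.emplace_back(baseSeed + slot);
    }

    // Draw one number using only the engine that belongs to `slot`.
    double operator()(unsigned slot) {
        std::uniform_real_distribution<double> unif(-1.0, 1.0);
        return unif(engines_[slot]);
    }

private:
    std::vector<std::mt19937> engines_;
};
```

In an RDataFrame this kind of object would typically be called from DefineSlot, whose callable receives the slot index as its first argument. That removes the race on the engine, although which entries end up in which slot can still vary between runs.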

There is also an old forum post from 2012 discussing that TH1::Fit() is not thread-safe, but the bug report mentioned there has long been closed.

What is the current up-to-date information on this?

Hi @FoxWise

Thanks for reporting this issue on the ROOT forum! Adding @Danilo and @jonas to the loop, who can help with this.

Cheers,
Aaron


See also this GitHub issue for the follow-up discussion:
