Multi-thread: Random in AvalancheMC and AvalancheMicroscopic

Hi,

I’m currently rewriting my code to enable multi-threading with OpenMP, following the examples provided. However, I’ve noticed that the results vary depending on the number of threads used to run the code.

I suspect that the random number generation methods, such as Random::RndmUniform, which are used in both AvalancheMC and AvalancheMicroscopic, might not be thread-safe.

Could this be the cause of the issue?

Thanks for your help,

Pierre

Hello, adding @hschindl to the loop

Hello,

Is your Garfield++ installation at least as recent as commit 101dfdae? I just ran the pixel_mt.C example and obtained the same results for both sequential and parallel execution.

Sincerely,
Gabriel

Hi,

Indeed, after fixing the issue reported by @gabrielribcesario, AvalancheMC should be thread-safe now. However, even if you fix the seed of the random number generator, you will not get identical results every time you run the parallelized version of the code.
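
As a general illustration (this is not Garfield++ code, and it says nothing about how Garfield++ handles its generator internally): results of a parallel Monte Carlo loop are reproducible only if the random numbers each work item consumes do not depend on thread scheduling. One common way to get that is to give every work item its own generator, seeded from a base seed and the item index. The sketch below assumes `std::mt19937_64` and the hypothetical helpers `SimulateItem` and `RunAll`.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for one drift calculation: it draws random numbers
// from a generator that belongs to this work item only.
double SimulateItem(std::uint32_t baseSeed, int item) {
  // The stream depends only on (baseSeed, item), not on the executing thread.
  std::seed_seq seq{baseSeed, static_cast<std::uint32_t>(item)};
  std::mt19937_64 rng(seq);
  std::uniform_real_distribution<double> uniform(0., 1.);
  double sum = 0.;
  for (int k = 0; k < 100; ++k) sum += uniform(rng);
  return sum;
}

// Runs all items with the requested number of threads.
// Each slot of `results` is written by exactly one loop iteration.
std::vector<double> RunAll(std::uint32_t baseSeed, int nItems, int nThreads) {
  assert(nItems >= 0 && nThreads > 0);
  std::vector<double> results(nItems, 0.);
  #pragma omp parallel for num_threads(nThreads)
  for (int i = 0; i < nItems; ++i) {
    results[i] = SimulateItem(baseSeed, i);
  }
  return results;
}
```

With this layout, `RunAll(42, 50, 1)` and `RunAll(42, 50, 4)` return the same vector. With a single generator shared by all threads, the order in which iterations draw numbers would vary from run to run.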

Hi @gabrielribcesario and @hschindl, thank you for this detail!

Nevertheless, even after pulling the latest commit, I still have issues when launching my code with OMP_NUM_THREADS > 1. Errors like these occur:

AskedDensity_mt(87454,0x1f74960c0) malloc: *** error for object 0x600001c37ca0: pointer being freed was not allocated
AskedDensity_mt(87454,0x1f74960c0) malloc: *** set a breakpoint in malloc_error_break to debug

 *** Break *** segmentation violation

 *** Break *** segmentation violation

 *** Break *** segmentation violation
[1]    87454 abort      OMP_NUM_THREADS=7 ./AskedDensity_mt --density 9e10

Here is the OpenMP-parallelized part of my code; the full code is in the attached file. I am trying to drift e-, i+, and i- in parallel, but unlike the example, I want to drift them step by step in time in order to update some density maps.

Thanks for your help.

  // -- Simulation -- //
  // Create buffers
  const int T = omp_get_max_threads();
  std::vector<std::vector<P>> electrons_b(T);
  std::vector<std::vector<P>> ions_b(T);
  std::vector<std::vector<P>> negions_b(T);

  double t = tStartSimu;
  while (t < tEndSimu) {
    for (int k=0;k<T;++k) {
      electrons_b[k].clear();
      ions_b[k].clear();
      negions_b[k].clear();
    }
      
    // --- Drift e- ------------------------------------------------------
    const int nelectrons = electrons.size();
    #pragma omp parallel
    {
      const int tid = omp_get_thread_num();
      #pragma omp for reduction(+: nAttaElectrons)
      for (int i = 0; i < nelectrons; ++i) {
        AvalancheMicroscopic aval;
        aval.SetSensor(&sensor);
        aval.SetTimeWindow(t, t+dt);
        aval.DriftElectron(electrons[i].x, electrons[i].y, electrons[i].z,
                           electrons[i].t, 0.1, 0., 0., 0., electrons[i].w);
        for (const auto& p : aval.GetElectrons()) {
          const auto& p1 = p.path.back();
          if (p.status == -7) {
            nAttaElectrons++;
            negions_b[tid].push_back(P{p1.x, p1.y, p1.z, p1.t, p.weight});
          }
          else electrons_b[tid].push_back(P{p1.x, p1.y, p1.z, p1.t, p.weight});
        }
      }
    }

    for (auto& b : negions_b) {
      negions.insert(negions.end(), b.begin(), b.end());
    }
    for (auto& k : negions_b) k.clear();

    // --- Drift i- ------------------------------------------------------
    const int nnegions = negions.size();
    #pragma omp parallel
    {
      const int tid = omp_get_thread_num();
      #pragma omp for reduction(+: nRecombNegions)
      for (int i = 0; i < nnegions; ++i) {
        AvalancheMC driftnegion;
        driftnegion.SetSensor(&sensor);
        driftnegion.EnableRecombination(withIonRecombination, alpha);
        driftnegion.EnableDiffusion(withIonDiffusion);
        driftnegion.EnableDensityMap(withDensityMap);
        driftnegion.SetTimeSteps(.2); // [ns] default 0.02
        driftnegion.SetTimeWindow(t, t+dt);
        driftnegion.DriftNegativeIon(negions[i].x, negions[i].y, negions[i].z,
                                     negions[i].t, negions[i].w);
        for (const auto& p : driftnegion.GetNegativeIons()) {
          const auto& p1 = p.path.back();
          if (p.status == -9) nRecombNegions++; 
          else negions_b[tid].push_back(P{p1.x, p1.y, p1.z, p1.t, p.weight});
        }
      }
    }

    // --- Drift i+ ------------------------------------------------------
    const int nions = ions.size();
    #pragma omp parallel
    {
      const int tid = omp_get_thread_num();
      #pragma omp for reduction(+: nRecombIons)
      for (int i = 0; i < nions; ++i) {
        AvalancheMC driftion;
        driftion.SetSensor(&sensor);
        driftion.EnableRecombination(withIonRecombination, alpha);
        driftion.EnableDiffusion(withIonDiffusion);
        driftion.EnableDensityMap(withDensityMap);
        driftion.SetTimeSteps(.2); // [ns] default 0.02
        driftion.SetTimeWindow(t, t+dt);
        driftion.DriftIon(ions[i].x, ions[i].y, ions[i].z,
                          ions[i].t, ions[i].w);
        for (const auto& p : driftion.GetIons()) {
          const auto& p1 = p.path.back();
          if (p.status == -9) nRecombIons++;
          else ions_b[tid].push_back(P{p1.x, p1.y, p1.z, p1.t, p.weight});
        }
      }
    }
    
    // --- Update global variable and clear buffers -----------------------
    electrons.clear();
    for (auto& b : electrons_b) {
      electrons.insert(electrons.end(), b.begin(), b.end());
    }
    negions.clear();
    for (auto& b : negions_b) {
      negions.insert(negions.end(), b.begin(), b.end());
    }
    ions.clear();
    for (auto& b : ions_b) {
      ions.insert(ions.end(), b.begin(), b.end());
    }

    grid.ClearFields();
    grid.SetUniformElectricField(0., 0., 0.);
    for (auto& ion : ions) grid.AddIon(ion.x, ion.y, ion.z, ion.w);
    for (auto& nion : negions) grid.AddNegativeIon(nion.x, nion.y, nion.z, nion.w);

    t += dt;
  }

AskedDensity_mt.cc (10.9 KB)

Edit: I just ran the code with OMP_NUM_THREADS=4 and it suddenly worked without changing anything. Then I ran it again and got this error. It seems very unstable — the more I increase the number of threads or the number of simulated pairs, the more likely it fails.

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<double, std::__1::allocator<double>>::vector(std::__1::vector<double, std::__1::allocator<double>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<double, std::__1::allocator<double>>::vector(std::__1::vector<double, std::__1::allocator<double>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] void std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>::construct[abi:ne190102]<std::__1::vector<double, std::__1::allocator<double>>, std::__1::vector<double, std::__1::allocator<double>>&>(std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] void std::__1::allocator_traits<std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::construct[abi:ne190102]<std::__1::vector<double, std::__1::allocator<double>>, std::__1::vector<double, std::__1::allocator<double>>&, 0>(std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>&, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<double, std::__1::allocator<double>>* std::__1::__uninitialized_allocator_copy_impl[abi:ne190102]<std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*>(std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>&, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<double, std::__1::allocator<double>>* std::__1::__uninitialized_allocator_copy[abi:ne190102]<std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*>(std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>&, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] void std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::__construct_at_end<std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*>(std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*, unsigned long) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] void std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::__init_with_size[abi:ne190102]<std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*>(std::__1::vector<double, std::__1::allocator<double>>*, std::__1::vector<double, std::__1::allocator<double>>*, unsigned long) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::vector(std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::vector(std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] void std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>::construct[abi:ne190102]<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&>(std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>*, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] void std::__1::allocator_traits<std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::construct[abi:ne190102]<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&, 0>(std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>&, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>*, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::__construct_at_end(unsigned long, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::assign(unsigned long, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>> const&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::Medium::Init(unsigned long, unsigned long, unsigned long, std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>&, double) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::Medium::VelocityFromMobility(std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>> const&, std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::Medium::IonVelocity(double, double, double, double, double, double, double&, double&, double&) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::AvalancheMC::GetVelocity(Garfield::Particle, Garfield::Medium*, std::__1::array<double, 3ul> const&, std::__1::array<double, 3ul> const&, std::__1::array<double, 3ul> const&, std::__1::array<double, 3ul>&) const (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::AvalancheMC::DriftLine(Garfield::AvalancheMC::Seed const&, std::__1::vector<Garfield::AvalancheMC::Point, std::__1::allocator<Garfield::AvalancheMC::Point>>&, std::__1::vector<Garfield::AvalancheMC::Seed, std::__1::allocator<Garfield::AvalancheMC::Seed>>&, bool, bool) const (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::AvalancheMC::TransportParticles(std::__1::vector<Garfield::AvalancheMC::Seed, std::__1::allocator<Garfield::AvalancheMC::Seed>>&, bool, bool, bool) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::AvalancheMC::DriftIon(double, double, double, double, unsigned long) (no debug info)
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/AskedDensity_mt] main.omp_outlined.22 (no debug info)
[/opt/homebrew/Cellar/llvm/21.1.0/lib/libomp.dylib] __kmp_invoke_microtask (no debug info)
[<unknown binary>] (no debug info)
AskedDensity_mt(89270,0x38380b000) malloc: double free for ptr 0x14e019200
AskedDensity_mt(89270,0x38380b000) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    89270 abort      OMP_NUM_THREADS=4 ./AskedDensity_mt --density 9e10 --pairs 1e3

Hello,

Can you replace your Medium.cc and Medium.hh with the following files and see if the error persists?

Medium.cc (53.2 KB)
Medium.hh (37.5 KB)

I simply added a mutex right before the tab.assign() command in both overloads of Medium::Init. I’m not sure if this is the most elegant solution, but it did work for me.
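
For readers without the attached files, the idea can be sketched on a small stand-in class. This is not the actual Medium code: `SharedTable`, `Init`, `Sum` and `InitConcurrently` are hypothetical names. It only shows the pattern of taking a lock before `assign()` on a table that several threads may reach at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a class holding a table that is (re)initialised
// on demand and may be reached by several threads at once.
class SharedTable {
 public:
  void Init(std::size_t n, double value) {
    // Only one thread at a time may reallocate the table.
    std::lock_guard<std::mutex> lock(m_mutex);
    m_tab.assign(n, std::vector<double>(n, value));
  }
  double Sum() const {
    // Readers take the same lock, so they never see a half-built table.
    std::lock_guard<std::mutex> lock(m_mutex);
    double sum = 0.;
    for (const auto& row : m_tab) {
      for (double v : row) sum += v;
    }
    return sum;
  }

 private:
  mutable std::mutex m_mutex;
  std::vector<std::vector<double>> m_tab;
};

// Calls Init from a parallel loop and returns the sum of the final table.
double InitConcurrently(int nCalls, std::size_t n) {
  assert(nCalls > 0);
  SharedTable table;
  #pragma omp parallel for
  for (int i = 0; i < nCalls; ++i) table.Init(n, 1.);
  return table.Sum();
}
```

Without the lock, two threads running `assign()` on the same vector can both free the old storage, which would be consistent with the "double free" and "pointer being freed was not allocated" messages above. The lock serialises the initialisation, so it costs some parallel performance if the table is rebuilt often.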

Also, for some reason, the crash only occurs if you run both AvalancheMicroscopic and AvalancheMC, or at least it does so for me in the program you’ve attached.

Hi @gabrielribcesario,

Thank you for this, it’s working for me as well now! I also found out while debugging that the issue comes from the combination of AvalancheMC and AvalancheMicroscopic, strange…

Also, in my multi-threaded program, I need to end it with app.Run() in order to prevent the following error. However, when I do this, the program keeps running as if a canvas were open, which is normal. For my sequential code, I don’t need app.Run(), and the program terminates correctly. Could this be related? The error still persists even with the new Medium class.


 *** Break *** bus error
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::default_delete<TStyle>::operator()[abi:ne190102](TStyle*) const (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::unique_ptr<TStyle, std::__1::default_delete<TStyle>>::reset[abi:ne190102](TStyle*) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::unique_ptr<TStyle, std::__1::default_delete<TStyle>>::~unique_ptr[abi:ne190102]() (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::unique_ptr<TStyle, std::__1::default_delete<TStyle>>::~unique_ptr[abi:ne190102]() (no debug info)
[/usr/lib/system/libsystem_c.dylib] __cxa_finalize_ranges (no debug info)
[/usr/lib/system/libsystem_c.dylib] exit (no debug info)
[/usr/lib/system/libdyld.dylib] dyld4::LibSystemHelpers::getenv(char const*) const (no debug info)
[/usr/lib/dyld] dyld4::LibSystemHelpersWrapper::exit(int) const (no debug info)
[/usr/lib/dyld] start (no debug info)

@hschindl, do you think this could be a good bug fix?

That is likely related to the behavior I’m describing here. Basically, the PlottingEngine class sets a global plotting style with m_style, potentially before main() starts running, since m_style is a static member. In doing so, the ROOT environment gains ownership of the object that m_style points to. During program shutdown, the ROOT environment frees that memory before the unique_ptr destructor is called, leading to a double-free error.
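
The ownership problem can be sketched without ROOT. In the example below, `Registry` is a hypothetical stand-in for a framework that deletes every object handed to it, and `Style` stands in for the style object. None of this is ROOT or Garfield++ code. The point is only that an object adopted by such a registry must have no second owner.

```cpp
#include <cassert>
#include <vector>

// Hypothetical style object that counts how often it is destroyed.
struct Style {
  explicit Style(int* counter) : m_counter(counter) {}
  ~Style() { ++(*m_counter); }
  int* m_counter;
};

// Hypothetical stand-in for a framework registry that takes ownership of
// every object passed to Adopt() and deletes it on shutdown.
class Registry {
 public:
  Registry() = default;
  Registry(const Registry&) = delete;
  Registry& operator=(const Registry&) = delete;
  ~Registry() {
    for (Style* s : m_styles) delete s;
  }
  void Adopt(Style* s) {
    assert(s != nullptr);
    m_styles.push_back(s);
  }

 private:
  std::vector<Style*> m_styles;
};

// Returns how many times the style was destroyed once the registry is gone.
int DestroyedAfterScope() {
  int nDestroyed = 0;
  {
    Registry registry;
    Style* style = new Style(&nDestroyed);  // non-owning handle on our side
    registry.Adopt(style);                  // the registry is the sole owner
    // Holding `style` in a std::unique_ptr as well would delete it twice.
  }
  return nDestroyed;
}
```

Here `DestroyedAfterScope()` returns 1 because only the registry deletes the object. If the caller also kept it in a `std::unique_ptr`, both owners would call `delete` on the same address.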

Thank you for the explanation. It’s a bit too deep for me to understand everything. Is there an easy way to prevent that and shut down the program safely?

Hello,

@hschindl said he would take a look into this. While an official fix is not out yet, I guess you can try adding this to PlottingEngine.cc:

// m_style->cd() and force the style on all existing objects
void PlottingEngine::Initialise() {
  gROOT->SetStyle(m_style->GetName()); 
  gROOT->ForceStyle();
}

// Automatically released by ROOT at the end of the program
// If not, then the memory is reclaimed by the OS (I guess?)
TStyle *PlottingEngine::m_style = new TStyle("Garfield","Garfield Style"); 

// Run PlottingEngine::Initialise() once during program startup
// The constructor wrapper forces static initialization
struct PEInitHelper {
  PEInitHelper() {
    PlottingEngine::Initialise();
  }
};
static PEInitHelper init;

And in PlottingEngine.hh replace the m_style declaration with

  static TStyle *m_style;

and add the following function declaration to the public members of the class:

static void Initialise();

I don’t really like this fix since it relies on either ROOT or the OS to do memory management, plus it might trigger false positives if you’re using a debugger. Other than that, I don’t think it will have any noticeable negative impacts.

By the way, I forgot to comment on this in my previous post. I believe that the real problem lies between AvalancheMC and AvalancheMicroscopic. After all, you’ll only get the problematic behavior if you use both methods, so I’m guessing that the class Medium just so happened to be the one causing the crashes since it is thread-unsafe. @hschindl or one of the other Garfield devs might have some insight on this. Funnily enough, if you run AvalancheMC before AvalancheMicroscopic, you get choppy CPU performance during the AvalancheMicroscopic run and the program takes forever to finish.

It looks like merge request !526 has fixed both problems, in the Medium and the PlottingEngine classes. Great news!

Edit: I was a bit optimistic about the segmentation violation. I still get errors on some runs, but much less frequently.

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<double, std::__1::allocator<double>>::~vector[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:541
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<double, std::__1::allocator<double>>::~vector[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:541
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<double, std::__1::allocator<double>>::~vector[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:541
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>::destroy[abi:ne190102](std::__1::vector<double, std::__1::allocator<double>>*) /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/allocator.h:168
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] void std::__1::allocator_traits<std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::destroy[abi:ne190102]<std::__1::vector<double, std::__1::allocator<double>>, 0>(std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>&, std::__1::vector<double, std::__1::allocator<double>>*) /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/allocator_traits.h:336
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::__base_destruct_at_end[abi:ne190102](std::__1::vector<double, std::__1::allocator<double>>*) /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:985
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::__clear[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:980
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::__destroy_vector::operator()[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:530
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::~vector[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:541
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>::~vector[abi:ne190102]() /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:541
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>::destroy[abi:ne190102](std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>*) /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/allocator.h:168
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] void std::__1::allocator_traits<std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::destroy[abi:ne190102]<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, 0>(std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>&, std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>*) /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/allocator_traits.h:336
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::__base_destruct_at_end[abi:ne190102](std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>*) (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::__clear[abi:ne190102]() (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::__destroy_vector::operator()[abi:ne190102]() (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::~vector[abi:ne190102]() (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] std::__1::vector<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>, std::__1::allocator<std::__1::vector<std::__1::vector<double, std::__1::allocator<double>>, std::__1::allocator<std::__1::vector<double, std::__1::allocator<double>>>>>>::~vector[abi:ne190102]() (no debug info)
[/Users/pierregerard/Applications/garfieldpp/install/lib/libGarfield.0.3.dylib] Garfield::Medium::~Medium() (no debug info)
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] Garfield::MediumGas::~MediumGas() /Users/pierregerard/Applications/garfieldpp/install/include/Garfield/MediumGas.hh:20
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] Garfield::MediumMagboltz::~MediumMagboltz() /Users/pierregerard/Applications/garfieldpp/install/include/Garfield/MediumMagboltz.hh:30
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] Garfield::MediumMagboltz::~MediumMagboltz() /Users/pierregerard/Applications/garfieldpp/install/include/Garfield/MediumMagboltz.hh:30
[/Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/build/flic_mt] main /Users/pierregerard/Desktop/Thesis/garfieldpp_projects/TestComponentGrid/flic_mt.cc:418
[/usr/lib/dyld] start (no debug info)


It looks like a problem in MediumMagboltz’s destructor. In Garfield’s top level CMakeLists.txt add the following flags to target_compile_options:

target_compile_options(Garfield PRIVATE
  -g3
  -Og
  -fsanitize=address
...
)

And add

target_link_options(Garfield PRIVATE -fsanitize=address)

right after the compile options. Try triggering the crash and send back the stack trace.

I wasn’t able to find any edits to the Medium.cc and Medium.hh files; I believe they only fixed the plotting style problem.

I changed the Garfield CMakeLists.txt and recompiled. Now I always get this crash when I try to launch my project.

flic_mt(8960,0x1f74960c0) malloc: nano zone abandoned due to inability to reserve vm space.
MediumMagboltz::SetComposition: O2/Ar/H2O/N2/CO2 (20.9/0.94/0.03/78.1/0.03)
MediumMagboltz::LoadMobility:
    Read 22 values from file IonMobility_N2+_N2.txt
MediumMagboltz::LoadMobility:
    Read 21 values from file NegIonMobility_O2-_air.txt
MediumMagboltz::Mixer:
    4000 linear energy steps between 0 and 40 eV.
=================================================================
==8960==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016ee87159 at pc 0x00014d7d2c38 bp 0x00016ee86c90 sp 0x00016ee86450
READ of size 26 at 0x00016ee87159 thread T0
    #0 0x00014d7d2c34 in strlen+0x1b0 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x7ac34)
    #1 0x00010226bd64 in Garfield::MediumMagboltz::Mixer(bool) MediumMagboltz.cc:1190
    #2 0x000102265a60 in Garfield::MediumMagboltz::Update(bool) MediumMagboltz.cc:999
    #3 0x000100f67d84 in main+0x944 (flic_mt:arm64+0x100007d84)
    #4 0x000189162b94 in start+0x17b8 (dyld:arm64e+0xfffffffffff3ab94)

Address 0x00016ee87159 is located in stack of thread T0 at offset 1209 in frame
    #0 0x00010226aa14 in Garfield::MediumMagboltz::Mixer(bool) MediumMagboltz.cc:1008

  This frame has 11 object(s):
    [32, 56) 'ref.tmp32' (line 1070)
    [96, 120) 'ref.tmp37' (line 1072)
    [160, 184) 'ref.tmp44' (line 1075)
    [224, 248) 'ref.tmp51' (line 1077)
    [288, 312) 'ref.tmp58' (line 1080)
    [352, 376) 'ref.tmp65' (line 1082)
    [416, 440) 'gasNumber' (line 1112)
    [480, 1048) 'outfile' (line 1134)
    [1184, 1209) 'name' (line 1146) <== Memory access at offset 1209 overflows this variable
    [1248, 1256) 'ngs' (line 1180)
    [1280, 1304) 'minIonPotGas' (line 1468)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow MediumMagboltz.cc:1190 in Garfield::MediumMagboltz::Mixer(bool)
Shadow bytes around the buggy address:
  0x00016ee86e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee86f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee86f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee87000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee87080: 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 f2
=>0x00016ee87100: f2 f2 f2 f2 f2 f2 f2 f2 00 00 00[01]f2 f2 f2 f2
  0x00016ee87180: 00 f2 f2 f2 f8 f8 f8 f3 f3 f3 f3 f3 00 00 00 00
  0x00016ee87200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee87280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee87300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016ee87380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==8960==ABORTING
    [1]    8960 abort      OMP_NUM_THREADS=3 ./flic_mt --pairs 1e4 --density 6e10

Indeed, for Medium.cc and Medium.hh, the changes are mainly in commit 35639185. But looking at them, I don’t understand how they could modify the behavior.

Hi,
sorry for being late to this discussion!
@gabrielribcesario should I commit your fix to Medium.cc and Medium.hh? Or do you have a CERN gitlab account?
As for the interference between AvalancheMicroscopic and AvalancheMC I must admit I don’t have a good idea…

Hello,

I do not have a CERN gitlab account and I don’t think I can have one, as I am not affiliated with CERN, so you’ll have to commit it. While we’re here, could you take a look into this post? I managed to find a solution for a problem I had about a month ago with neBEM. I had to edit my last answer since it’s been more than 14 days since the last post, so it probably did not trigger a notification. It is not a good solution since it increases the dependencies between Garfield and neBEM by requiring neBEM.h to be installed in Garfield’s INCLUDEDIR, and the CMakeLists.txt hints at a future decoupling of both modules.

@pigerard I see what you mean now. These changes did not add a mutex to the base Medium class or a lock to the Medium::Init definition in Medium.cc; they are only a refactoring of the Medium*.hh header files.

Fortran strings are not null-terminated, and there is a call to a Fortran subroutine named Magboltz::gasmix_ in line 1567 causing the C-string char name[Magboltz::nCharName]; to not be null-terminated. This specific crash likely happens when the program tries to figure out the length of name in another portion of MediumMagboltz::Mixer. Right after the Magboltz::gasmix_ call, add:

name[Magboltz::nCharName-1] = '\0';

I don’t think this is the source of your original crash, so you should try triggering it again after the fix. @hschindl it might be a good idea to add the above modification to MediumMagboltz.cc and commit it along with Medium.cc and Medium.hh.
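To illustrate the point about Fortran strings, here is a minimal sketch, independent of Garfield++. FillLikeFortran and FromFortran are hypothetical helpers, not part of Magboltz or Garfield++; the first mimics a Fortran routine that blank-pads a fixed-width buffer without writing a terminator, and the second converts such a buffer to a std::string without ever calling strlen on it. The 25-byte size is taken from the 'name' entry in the ASan report above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a Fortran routine: fills all n characters,
// padding with blanks, and never writes a terminating '\0'.
void FillLikeFortran(char* buf, std::size_t n, const char* text) {
  assert(buf != nullptr && text != nullptr);
  const std::size_t len = std::strlen(text);
  for (std::size_t i = 0; i < n; ++i) buf[i] = i < len ? text[i] : ' ';
}

// Hypothetical helper: build a std::string from a fixed-width buffer using
// its known size, trimming trailing blanks (and any trailing '\0').
std::string FromFortran(const char* buf, std::size_t n) {
  assert(buf != nullptr);
  while (n > 0 && (buf[n - 1] == ' ' || buf[n - 1] == '\0')) --n;
  return std::string(buf, n);
}
```

With a buffer declared as char name[25], calling strlen(name) right after FillLikeFortran reads past the end of the array, which is the kind of access ASan flags. Either write name[sizeof(name) - 1] = '\0' first, as suggested above, or use a size-aware conversion such as FromFortran(name, sizeof(name)).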

I decided to trigger the crash again (running AvalancheMC after AvalancheMicroscopic) and take a closer look at the stack trace with debugging flags enabled. The sequence of function calls seems to be AvalancheMC::DriftIon -> AvalancheMC::TransportParticles -> AvalancheMC::GetVelocity -> Medium::IonVelocity. So far so good, but for some reason m_iVel is always empty in Medium::IonVelocity, probably due to a race condition. As a result, every thread makes repeated calls to Medium::VelocityFromMobility and Medium::Init. The following fix for Medium.cc is a much better solution than my previous one: only one call to Medium::VelocityFromMobility is made, and Medium::Velocity runs in parallel, which should give a substantial speed-up:

bool Medium::IonVelocity(const double ex, const double ey, const double ez,
                         const double bx, const double by, const double bz,
                         double& vx, double& vy, double& vz) {
  std::vector<std::vector<std::vector<double> > > vB;
  std::unique_lock<std::mutex> guard(m_mutex); // Add a lock here instead of in Medium::Init
  if (m_iVel.empty() && !m_iMob.empty()) {
    VelocityFromMobility(m_iMob, m_iVel);
  }
  guard.unlock(); // Allow for multithreaded execution of Medium::Velocity
  return Velocity(ex, ey, ez, bx, by, bz, m_iVel, vB, vB, +1., vx, vy, vz);
}
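The locking pattern above can be looked at in isolation. The following is only a sketch with a hypothetical LazyTable class (not Garfield++ code): the mutex protects the one-time fill, and it is released before the read, so later calls can read the table concurrently. This is safe under the assumption that nothing modifies the table after it has been filled.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical class showing "lock, fill once, unlock, then read".
class LazyTable {
 public:
  double Get(const std::size_t i) {
    {
      // Only the emptiness check and the fill are serialised.
      std::lock_guard<std::mutex> guard(m_mutex);
      if (m_values.empty()) Fill();
    }
    // The lock is released here; from now on the table is only read.
    assert(i < m_values.size());
    return m_values[i];
  }
  // Number of times the table was filled (for checking, single-threaded use).
  int FillCount() const { return m_fillCount; }

 private:
  void Fill() {
    m_values = {1.0, 2.0, 3.0};
    ++m_fillCount;
  }
  std::mutex m_mutex;
  std::vector<double> m_values;
  int m_fillCount = 0;
};
```

However many threads call Get, the first one to take the lock fills the table and all others find it non-empty, so FillCount() stays at 1.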

But perhaps an even better solution would be to add a function similar to MediumMagboltz::Initialise() for extracting the ion velocity from the ion mobility. Something as simple as

bool MediumMagboltz::Initialise(const bool verbose, const bool ion) {
  if (!m_isChanged) {
    if (m_debug) {
      std::cerr << m_className << "::Initialise: Nothing changed.\n";
    }
    return true;
  }
  if (ion) {
    VelocityFromMobility(m_iMob, m_iVel); // ion mobility initialization here
  }
  return Update(verbose);
}

fixes the problem and avoids the mutex entirely, which is ever so slightly faster as it avoids the std::unique_lock overhead.
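The idea of initialising up front can be sketched as follows. ToyMedium is a hypothetical class, not the Garfield++ Medium, and the mobility values and field factor are made up for illustration. The derived table is built once from the main thread, and the accessor is const, so calls made later from worker threads can only read it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a medium with a table derived from mobilities.
class ToyMedium {
 public:
  // Call once from the main thread, before any parallel region.
  void Initialise() {
    if (!m_velocity.empty()) return;  // Already done, nothing to do.
    for (const double mu : m_mobility) m_velocity.push_back(mu * kField);
  }
  // const member: it cannot modify the table, so concurrent calls only read.
  double IonVelocity(const std::size_t i) const {
    assert(i < m_velocity.size());  // Initialise() must have been called.
    return m_velocity[i];
  }

 private:
  static constexpr double kField = 2.0;               // Made-up factor.
  std::vector<double> m_mobility = {1.0, 1.5, 2.5};  // Made-up values.
  std::vector<double> m_velocity;
};
```

The trade-off is that the caller has to remember to call Initialise() before starting the threads; the assertion in IonVelocity catches the case where it was forgotten.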

Sincerely,
Gabriel
