I am attempting to run a fitter that uses Minuit2 for a minimizer. I have recompiled ROOT/Minuit2 with:
set, and checked that -fopenmp was among the CXX flags in the makefile, so I am fairly certain it was compiled correctly. However, when I submit jobs, only one core is used, no matter how many cores I allocate to the job.
I have also tried adding OpenMP #pragma statements in other places in my code, and confirmed that with them in place the submitted jobs run on the number of cores specified by OMP_NUM_THREADS. This means that OpenMP is working and set up correctly in my submission scripts.
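A minimal sketch of that kind of check, assuming a GCC-style compiler that defines _OPENMP when -fopenmp is passed. The function names count_omp_threads and built_with_openmp are hypothetical and are not part of ROOT, Minuit2, or my fitter:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of threads that actually enter a parallel region.
// Falls back to 1 when this file was compiled without OpenMP support.
int count_omp_threads() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Only one thread writes the team size, avoiding a data race.
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}

// Reports whether the compiler saw the OpenMP flag for this file.
bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```

Printing both values at the start of a job shows whether the user code was built with OpenMP and how many threads a parallel region really gets. It says nothing about how the Minuit2 library itself was compiled, which would have to be checked separately.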
The OpenMP statements in my code speed up the jobs at first, but as the number of Minuit2 iterations increases, the time spent in Minuit2 calculating the gradient increases and the number of cores effectively in use goes down. So, to get a better speedup and fully utilize the allocated cores, I would like to get the Minuit2 parallelization working too.
Any ideas on what I am doing wrong, or how I can check where things are failing?