Thread safety v5-34-20

Hello,

Seeing the fixes in ROOT release 5.34.20 concerning thread safety, I have tried my threaded code again, in vain: I still get segmentation violations. I am using gcc 4.9.1 in C++11 mode to compile everything.

I am using one C++ functor object per thread whose operator() is supposed to open the very same file (in read mode only) to retrieve several TH2Fs. I have tried pretty much everything, from holding the TFile as a member to reading the file in a separate function and feeding every functor object with a vector of TH2Fs. Whenever I use a TH2F with the ProjectionX method, followed by GetRandom on the projected histogram, I get various segmentation violations. I have tried putting mutex locks virtually everywhere, without any result.

The problematic code boils down to this:

  • An object implementing operator():
struct myStruct{

  const unsigned nEvents;
  const vector<TH2F> map;

  myStruct(const unsigned nEvents, const vector<TH2F>& map):nEvents(nEvents),map(map) {}
  void operator()(){

      for(unsigned i = 0; i<nEvents; ++i){

         TH1D proj = *(map.front().ProjectionX("proj",1, 1));
         proj.GetRandom();

      }

   }

};
  • Launching threads with one “myStruct” object per thread (to use the operator() of “myStruct”):
void launchThreads(vector<myStruct> toDo){

  start a thread per element in "toDo" and join them

}

int main(){

  vector<TH2F> map;
  TFile myFile("myfile.root");
  fill "map" with all the TH2F's in "myFile"

  vector<myStruct> toDo;
  fill "toDo" with contents of "map"
   launchThreads(toDo);

}

Please note that this code runs if I use only one thread at a time (instead of launching many functors in many threads). I have also tried the same code with ROOT 6.02 (thinking Cling would be enough to ensure thread safety of everything).

Thanks in advance for any help,
Valérian

Hi Valerian,

the std::vector::front() method should be thread safe (even if I am having trouble understanding the code you submitted):
cplusplus.com/reference/vector/vector/front/
What is the stack trace of the segmentation fault?
Do you resize the vector somewhere?

Cheers,
Danilo

Hello,

Thank you very much for your answer. I do not resize the vector, and the segmentation fault (which really varies from one execution to the next; I will try to quote a few) is clearly bound to the use of a TH2 with ProjectionX or GetRandom. If I replace the following lines with anything that doesn’t use the TH2 vector, the code runs fine.

TH1D proj = *(map.front().ProjectionX("proj",1, 1));
proj.GetRandom();

The code also runs fine if I use only one thread, so there is no segmentation violation in the single-threaded case.

Which part of my code should I clarify? (I have edited my first post a little.)

Hi,

could you post a complete reproducer that would allow others to inspect the same segfaults?

Cheers,
Danilo

Sure, I have written a very basic piece of code that reproduces the problem (it’s trivial and doesn’t output anything useful, but the problem is the very same as in my real code).

#include <string>//for std::stod, used in main
#include <thread>
#include <vector>
#include <TFile.h>
#include <TH1D.h>
#include <TH2F.h>
#include <TRandom3.h>

using namespace std;

struct myStruct{
  
  const unsigned nEvents;
  const vector<TH2F> map;
  
  myStruct(const unsigned nEvents, const vector<TH2F>& map):nEvents(nEvents),map(map) {}
  void operator()(){
    
    for(unsigned i = 0; i<nEvents; ++i){

      TH1D proj = *(map.front().ProjectionX("proj",1, 1));
      proj.GetRandom();

    }

  }
  
};

template <class T> void launchThreads(vector<T>& workers){

  unsigned nThreads = thread::hardware_concurrency();//get the number of working cores
//   unsigned nThreads = 1;
  if(nThreads == 0) nThreads += 1;
  vector<thread> threads(nThreads);
  
  auto itWk = workers.begin();
  while(itWk != workers.end()){
    
    auto itTh = threads.begin(); 
    while(itTh != threads.end() && itWk != workers.end()){

      *itTh = thread(*itWk);//start one thread per worker
      ++itTh;
      ++itWk;
      
    }
    
    for(thread& th : threads) if(th.joinable()) th.join();//join only the slots that actually hold a running thread
    
  }
  
}

void generateLiBranches(const unsigned nEvents = 1e4){
  
  vector<myStruct> workers;//speed up the branch generations with a thread per branch
    
  vector<TH2F> betaMapLi (5);
  TFile LiFile("spectres_beta/cartes_li9.root");
  betaMapLi[0] = *static_cast<TH2F*>(LiFile.Get("carte_0"));
  betaMapLi[1] = *static_cast<TH2F*>(LiFile.Get("carte_1"));
  betaMapLi[2] = *static_cast<TH2F*>(LiFile.Get("carte_2"));
  betaMapLi[3] = *static_cast<TH2F*>(LiFile.Get("carte_3"));
  betaMapLi[4] = *static_cast<TH2F*>(LiFile.Get("carte_4"));
  

  workers.push_back(myStruct(nEvents, betaMapLi)); 
  workers.push_back(myStruct(nEvents, betaMapLi)); 
  workers.push_back(myStruct(nEvents, betaMapLi));
  workers.push_back(myStruct(nEvents, betaMapLi)); 
  workers.push_back(myStruct(nEvents, betaMapLi)); 
  workers.push_back(myStruct(nEvents, betaMapLi)); 
  workers.push_back(myStruct(nEvents, betaMapLi)); 

  launchThreads(workers);
  
}

int main(int argc, char* argv[]){
  
  if(argc == 2) generateLiBranches(stod(argv[1]));
  else generateLiBranches();
  return 0;

}

I have also attached an archive that contains the root file to read in the code, a simple Makefile and the source.

Please note that the code runs fine if one uncomments the line containing “unsigned nThreads = 1”.
Reproducer.tar.bz2 (742 KB)

Hi Valerian,

I tried your reproducer and could reproduce the bug:

(gdb) info thr
  Id   Target Id         Frame 
  4    Thread 0x7fffedf10700 (LWP 11076) "reproducer.exe" TH2::GetBinContent (this=0x18e5e30, binx=287, biny=0)
    at root/hist/hist/inc/TH2.h:91
  3    Thread 0x7fffee711700 (LWP 11075) "reproducer.exe" 0x00007ffff7035a5a in TH1::GetBinError (this=0x7fffe8000920, bin=518)
    at root/hist/hist/src/TH1.cxx:8148
  2    Thread 0x7fffeef12700 (LWP 11074) "reproducer.exe" 0x00007ffff6f0e181 in TAxis::GetBinCenter (this=0x7fffe8000998, bin=<optimized out>)
    at root/hist/hist/src/TAxis.cxx:420
* 1    Thread 0x7ffff7fd4980 (LWP 11047) "reproducer.exe" TList::Remove (this=0x1848de0, obj=<optimized out>)
    at root/core/cont/src/TList.cxx:699
(gdb) bt
#0  TList::Remove (this=0x1848de0, obj=<optimized out>) at root/core/cont/src/TList.cxx:699
#1  0x00007ffff7a2370f in THashList::Remove (this=0x1848de0, obj=0x1d59b80) at root/core/cont/src/THashList.cxx:296
#2  0x00007ffff701e539 in TH1::~TH1 (this=0x1d59b80) at root/hist/hist/src/TH1.cxx:610
#3  0x0000000000403b49 in void launchThreads<myStruct>(std::vector<myStruct, std::allocator<myStruct> >&) ()
#4  0x0000000000402dc1 in generateLiBranches(unsigned int) ()
#5  0x00000000004023d5 in main ()

You can get rid of it by “decoupling” the histograms from the current directory with

TH1::AddDirectory(kFALSE);

See here for more details: root.cern.ch/drupal/content/hist … -directory

About your program: I find it well done, but wouldn’t a thread pool with a common thread-safe work queue be more efficient?

Cheers,
Danilo

Hi Danilo,

Thank you very much for your support.

I knew the TH1 trick, but it leads to memory leaks (which you can easily spot if you change nEvents to 1e6, for instance):

TH1::AddDirectory(kFALSE);

And I sometimes get messages such as:

terminate called after throwing an instance of 'std::system_error'

I probably have to look more into thread pools, but I would like to fix this thread-safety issue first :smiley:!
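
One possible source of that message, independent of the OS killing anything: `std::thread::join` throws `std::system_error` when called on a thread object that is not joinable (default-constructed, or already joined) — exactly the kind of slot the `threads` vector in `launchThreads` can contain in its last batch, which is why such slots must be skipped. A minimal demonstration:

```cpp
#include <system_error>
#include <thread>

// std::thread::join throws std::system_error when the thread object is not
// joinable, e.g. default-constructed or already joined.
bool joinThrows() {
  std::thread t;               // default-constructed: joinable() is false
  try {
    t.join();                  // invalid operation on a non-joinable thread
  } catch (const std::system_error&) {
    return true;               // the exception surfaces as "terminate called
  }                            // after throwing ..." if left uncaught
  return false;
}
```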

Hi,

what memory leaks?
I did not encounter this second issue: could you track down the cause with gdb? Since it’s intermittent, how often does it occur?

Danilo

Well actually, if you set nEvents to a high value and check your RAM usage, you will see it skyrocket, and the std::system_error message appears just after the RAM is full and the program is killed by the OS. So I guess this error is due to the system killing the program as it fills up the RAM.

Regards,
Valérian
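
On the leak itself: as far as I can tell, `ProjectionX` returns a heap-allocated histogram, and once `TH1::AddDirectory(kFALSE)` removes directory ownership nothing deletes it, so every loop iteration orphans one histogram. A sketch of the ownership fix, using a hypothetical `Hist` stand-in for ROOT's `TH1D` so the snippet compiles without ROOT:

```cpp
#include <memory>

// Hypothetical stand-in for ROOT's TH1D.
struct Hist {
  double getRandom() const { return 0.5; }  // placeholder for TH1::GetRandom
};

// Mimics TH2::ProjectionX: allocates a new histogram on every call and
// returns an owning raw pointer -- the source of the leak if never deleted.
Hist* projectionX() { return new Hist; }

double oneEvent() {
  // Wrapping the raw pointer frees the histogram at end of scope, so a
  // loop over events no longer accumulates orphaned histograms.
  std::unique_ptr<Hist> proj(projectionX());
  return proj->getRandom();
}
```

In the real code the equivalent change would be wrapping the call, e.g. `std::unique_ptr<TH1D> proj(map.front().ProjectionX("proj", 1, 1));` (an untested adaptation).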