ROOT 6.9.2 code crashes on one Ubuntu machine, fine on another

Hello,
Something strange happened to me: on two “identical” Ubuntu installations, with the same ROOT from git, the same compilation process, the same .rootrc, and the same source code of my program, the program crashes at the very end on one machine while it runs fine on the other.

Earlier I was looking forward to TMapFile in ROOT 6, and it apparently arrived in v6-09-02. Thanks. Now I can compile my code with threads and TMapFile on Ubuntu 16.04. I did it on a Lenovo notebook and a small-form-factor Gigabyte PC, with i5 and i7 CPUs.

Both computers have the same git-cloned ROOT tag, v6-09-02.

Both have: Linux ___ 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Lenovo says Ubuntu 16.04.1, Gigabyte says 16.04.2 (very new install from an older ISO)

At the end of the program, where the threads finish and are joined in the master and the program is supposed to exit, the Gigabyte PC crashes.

#0  0x00007f161e22751b in __GI___waitpid (pid=5424, stat_loc=stat_loc@entry=0x7ffdbb18b680, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00007f161e1a0fbb in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f161de0e09d in TUnixSystem::Exec (shellcmd=<optimized out>, this=0xfb7578) at /home/ojr/02_GIT/root/core/unix/src/TUnixSystem.cxx:2119
#3  TUnixSystem::StackTrace (this=0xfb7578) at /home/ojr/02_GIT/root/core/unix/src/TUnixSystem.cxx:2413
#4  0x00007f161de1069c in TUnixSystem::DispatchSignals (this=0xfb7578, sig=kSigSegmentationViolation) at /home/ojr/02_GIT/root/core/unix/src/TUnixSystem.cxx:3633
#5  <signal handler called>
#6  0x000000000004e166 in ?? ()
#7  0x00007f161dc78570 in (anonymous namespace)::R__ListSlowClose (files=0xfbd6d8) at /home/ojr/02_GIT/root/core/base/src/TROOT.cxx:1036
#8  0x00007f161dc791b0 in TROOT::CloseFiles (this=0x7f161e127b00 <ROOT::Internal::GetROOT1()::alloc>) at /home/ojr/02_GIT/root/core/base/src/TROOT.cxx:1123
#9  0x00007f161e19636a in __cxa_finalize (d=0x7f161e124540) at cxa_finalize.c:56
#10 0x00007f161dc77a43 in __do_global_dtors_aux () from /home/ojr/root/lib/libCore.so
#11 0x00007ffdbb18e350 in ?? ()
#12 0x00007f1621a56c17 in _dl_fini () at dl-fini.c:235
===========================================================

The Lenovo is fine and the program ends normally. I checked libdl.so on both PCs, and they look the same. My compiled program was crashing immediately on Debian Jessie, so I switched to Ubuntu.

I don’t want to waste your time, but do you see any hint? Where should I look? Should I verify the checksums of the libraries? Of the ROOT binaries?

Hi,

what is the code that crashes?

D

It is threaded code. I use this to initiate a thread:

typedef struct{    /* Used as argument to thread_start() */
     pthread_t thread_id;        /* ID returned by pthread_create() */
     int       thread_num;       /* Application-defined thread # */
 } thread_info;
thread_info *tinfo;
...
...
if (tinfo[tnum].thread_num<0){ 
	   tinfo[tnum].thread_num = tnum + 1;
	   //	   if (tnum==0){
	   pth_res = pthread_create(&tinfo[tnum].thread_id, NULL,
				    &loop_thread,  &tinfo[tnum] );
	   //	   }
	   //	   if (tnum==1){
	   //	   pth_res = pthread_create(&tinfo[tnum].thread_id, NULL,
	   //				    &loop_thread2,  &tinfo[tnum] );
	   //	   }
	   if ( pth_res != 0){printf("pthread_create failed%s\n","");return 1;}
	 }

This to join:

 for (int tnum = 0; tnum < NTHREADS; tnum++) {
   printf("[##M##]... waiting thread %d finish ...\n", tnum+1);fflush(stdout);
   int pth_res = pthread_join(tinfo[tnum].thread_id, NULL );
   if (pth_res != 0){printf("pthread_join failed%s\n","");
     //exit(1); // if no thread => quit to socat does not work
   }
   printf("[##M##] master join : thread %d (of %d) arrived\n",
	  tnum+1, NTHREADS);
   //	  tinfo[tnum].thread_num, NTHREADS);
   fflush(stdout);
   //	       printf("thread %d/%d\n", tnum+1, NTHREADS);fflush(stdout);
 }
 if (tinfo!=NULL){ free(tinfo);}
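
For reference, the create-and-join pattern from the two fragments above can be written as one self-contained sketch. This is not the poster's actual program: `ThreadInfo`, `run_workers`, and the body of `loop_thread` are hypothetical stand-ins, and the worker only stores a number instead of doing any measurement. The sketch records which threads were really created, so that `pthread_join` is never called on an ID that `pthread_create` did not fill in.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical per-thread record, modelled on the thread_info struct above.
struct ThreadInfo {
    pthread_t thread_id = pthread_t();  // ID returned by pthread_create()
    int thread_num = -1;                // application-defined number, -1 = unused
    bool started = false;               // true only if pthread_create() succeeded
    long result = 0;                    // value written by the worker
};

// Hypothetical stand-in for the measuring thread: it only stores a number.
void* loop_thread(void* arg) {
    assert(arg != nullptr);
    ThreadInfo* info = static_cast<ThreadInfo*>(arg);
    info->result = 100L * info->thread_num;
    return nullptr;
}

// Starts one worker per entry, joins every worker that was started,
// and returns the number of successful joins.
int run_workers(std::vector<ThreadInfo>& tinfo) {
    for (std::size_t i = 0; i < tinfo.size(); ++i) {
        tinfo[i].thread_num = static_cast<int>(i) + 1;
        int rc = pthread_create(&tinfo[i].thread_id, nullptr,
                                &loop_thread, &tinfo[i]);
        tinfo[i].started = (rc == 0);
        if (rc != 0) {
            std::fprintf(stderr, "pthread_create failed for thread %d (error %d)\n",
                         tinfo[i].thread_num, rc);
        }
    }

    int joined = 0;
    for (ThreadInfo& t : tinfo) {
        if (!t.started) continue;  // skip IDs that were never created
        int rc = pthread_join(t.thread_id, nullptr);
        if (rc == 0) {
            ++joined;
        } else {
            std::fprintf(stderr, "pthread_join failed for thread %d (error %d)\n",
                         t.thread_num, rc);
        }
    }
    std::fflush(stdout);  // flush the output stream, not stdin
    return joined;
}
```

Calling `run_workers` on a vector of four default-constructed `ThreadInfo` entries should start and join four workers. Using a `std::vector` instead of `malloc`/`free` also removes the manual `free(tinfo)` at the end.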

Inside the thread function (the measuring thread) I communicate using mmap. I need to use rootn.exe when I do such a thing in CINT.

I know it is not a working proof, but I was busy and did not manage to extract the code cleanly. I just reinstalled ROOT and verified that I compile with v6-09-02 on the Lenovo as well. I would need some more time…

Hi,

you may try ROOT 6.10: it’s out.
My advice would also be to use a higher-level abstraction such as TThreadExecutor: https://root.cern.ch/doc/v610/classROOT_1_1TThreadExecutor.html#ab3189ed33012ce13cc3db3f9a819a173
Here are some tutorials:
https://root.cern.ch/doc/v610/group__tutorial__multicore.html
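
To illustrate what a higher-level abstraction buys compared with manual pthread_create/pthread_join, here is a minimal map-style sketch that uses only the C++ standard library. It is an analogy and not ROOT code: `parallel_map` and `square` are hypothetical names, and the actual TThreadExecutor interface is described in the documentation linked above.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical helper: runs func on every input in its own asynchronous task
// and returns the results in input order.
std::vector<long> parallel_map(long (*func)(int), const std::vector<int>& inputs) {
    assert(func != nullptr);

    std::vector<std::future<long>> futures;
    futures.reserve(inputs.size());
    for (int x : inputs) {
        // std::launch::async requests asynchronous execution of each task.
        futures.push_back(std::async(std::launch::async, func, x));
    }

    std::vector<long> results;
    results.reserve(futures.size());
    for (std::future<long>& f : futures) {
        // get() waits for the task to finish, so no explicit join is needed.
        results.push_back(f.get());
    }
    return results;
}

// Hypothetical work function used only for illustration.
long square(int x) {
    return static_cast<long>(x) * x;
}
```

With this helper, `parallel_map(&square, {1, 2, 3})` gives the squares in input order. Thread creation, joining, and result collection are all hidden inside the helper, which is the main point of moving to an executor-style interface.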

Cheers,
D