TThreads and TMonitor::Select

Hi,
just a simple question: is there a problem with running TMonitor::Select inside a TThread? When I run TMonitor::Select in standalone code, it works (I cannot catch ctrl-d, but that is another story). As soon as I put it in a TThread, it crashes to the prompt after a little while, once it has reached Select a few times. The piece of code looks like this:

do{
    s = mon->Select(1000);   //  s=-1 if timeout
    wait = MyCond.TimedWaitRelative( 300 );
    printf("%s\n", "." );
}while( (mon->IsActive(s)==0) && (wait!=0) ); // jump out on wait==0

Is it a thread-safety problem, or what?
Jaromir

select() as used in TMonitor::Select() should be thread safe, but I have to check in detail whether the way it is used in ROOT is also thread safe. Could you provide a small test program exhibiting your problem, so that we can debug it?

Cheers, Fons.

Sorry for the long delay. I have now tried a slightly different approach, but I still have a problem:
1/ I started with the example from the tutorials and created some ‘server’ code. I run it from CINT and it behaves much like ‘nc’ (tested with an nc client), with no visible problem.
2/ I plugged this part into a TThread test frame, and here it is (as simplified as I could make it):

#include <stdio.h>
#include "TSocket.h"   //net thread
#include "TServerSocket.h"
#include "TMonitor.h"
#include "TThread.h"   // CINT? //  -lThread

#include <iostream>
#include <fstream>    
using namespace std;

const int MAXTHREADS=10;

//---------------------------------this part works quite fine when run from CINT alone--
void net_server( void *arg )  
{
    int port=9999;
  while( 1==1 ){
    printf("%s\n" ,"entered main while loop in net_server");
   TServerSocket *ss = new TServerSocket(port, kTRUE);
   TMonitor *mon = new TMonitor;
   mon->Add(ss);
   TSocket *s0 = 0, *s1 = 0, *s2 = 0;
   int wait=1; //EXTRA
   while (1) {
      TSocket  *s;
     do{ s = mon->Select( 1000 ); printf("%s",".\n" );}while(  (int)s==-1 );

      if (s->IsA() == TServerSocket::Class()) {
         if (!s0) {
           s0 = ((TServerSocket *)s)->Accept();
            mon->Add(s0);
         } else if (!s1) {
	   printf("I am in s1 :%s\n","" );
            s1 = ((TServerSocket *)s)->Accept();
            mon->Add(s1);
         }  
        continue;
      }//if  IsA

      char aaa[1000]; int get;
       char newline='\n';
     get=s->RecvRaw( aaa , 1000, kDontBlock);
      printf("Client %d: get==%d:           <%s>\n", s==s0 ? 0 : 1, get, aaa );
     aaa[get]='\0';
      printf("Client %d: get==%d:           <%s>\n", s==s0 ? 0 : 1, get, aaa );

      if (get==0){
         mon->Remove(s);
	 s0=NULL; // one client is fine for me (i can do two), here I recycle the s0 socket
      }// get==0
      if (strstr(aaa,"kill")!=0){
         printf("in a kill : active==%d\n", mon->GetActive() );
	 mon->Remove(s);// this is removed. others not
         mon->Remove(ss); ss->Close();
         printf("in a  kill : active==%d\n", mon->GetActive() );
      }
      if (mon->GetActive() == 0) {
            printf("No more active clients... stopping\n");
            break;
      }
   }//while (1)
  } // BIG  WHILE 1==1

 printf("%s","net_server stop \n");
 return ;
}// .............................................................net_server end



// ------------------------------MAIN----------------------
void testbench(){

TThread *shspe_threads[MAXTHREADS];

    shspe_threads[1] = new TThread( "my_server" , net_server, NULL );
    if (shspe_threads[1]==NULL){ printf("exiting, thread 1 not running\n%s","");return ;}
    shspe_threads[1]->Run();

}// .....................................................................testbench

3/ I do
root -n
.L testbench.C+
testbench()
and the result, even without sending any data from outside (‘nc’), is a crash within a short time when I play with enter in CINT…
I have observed that the do-while loop (the one with ->Select, where I print ‘.’) sometimes stops after enter and sometimes starts again. I see some TTermInputHandler in the backtrace:

======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b381)[0xb65dc381]
/lib/i686/cmov/libc.so.6(+0x6cbd8)[0xb65ddbd8]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb65e0cbd]
/usr/lib/libstdc++.so.6(_ZdlPv+0x21)[0xb67d1701]
/usr/lib/libstdc++.so.6(_ZNSs4_Rep10_M_destroyERKSaIcE+0x1d)[0xb67ad6bd]
/usr/lib/libstdc++.so.6(_ZNSs6assignERKSs+0xa0)[0xb67af1d0]
/home//root/lib/root/libCore.so(_ZN9textinput9TextInput9TakeInputERSs+0x38)[0xb7289b98]
/home//root/lib/root/libCore.so(Getlinem+0x50e)[0xb727d5ce]
/home//root/lib/root/libRint.so(_ZN5TRint15HandleTermInputEv+0x46)[0xb683f7e6]
/home//root/lib/root/libRint.so(_ZN17TTermInputHandler6NotifyEv+0x25)[0xb683df55]
/home//root/lib/root/libRint.so(_ZN17TTermInputHandler10ReadNotifyEv+0x14)[0xb6840de4]
/home//root/lib/root/libCore.so(_ZN11TUnixSystem16CheckDescriptorsEv+0x177)[0xb725d127]
/home//root/lib/root/libCore.so(_ZN11TUnixSystem16DispatchOneEventEb+0xf8)[0xb725d2f8]
/home//root/lib/root/libCore.so(_ZN7TSystem9InnerLoopEv+0x24)[0xb71c8474]
/home//root/lib/root/libCore.so(_ZN7TSystem3RunEv+0x89)[0xb71cb369]
/home//root/lib/root/libCore.so(_ZN12TApplication3RunEb+0x37)[0xb715a5a7]
/home//root/lib/root/libRint.so(_ZN5TRint3RunEb+0x28d)[0xb684078d]
/home//root/bin/root.exe(main+0x6f)[0x8048eaf]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb6587ca6]
/home//root/bin/root.exe[0x8048d71]

Any idea? Please…
Best regards, jaromir

Unfortunately I am not able to reproduce the crash.

root [0] .L testbench.C+
Info in <TUnixSystem::ACLiC>: creating shared library /Users/anar/./testbench_C.so
In file included from /Users/anar/testbench_C_ACLiC_dict.cxx:17:
In file included from /Users/anar/testbench_C_ACLiC_dict.h:34:
/Users/anar/./testbench.C:41:13: warning: unused variable 'newline' [-Wunused-variable]
       char newline='\n';
            ^
/Users/anar/./testbench.C:23:8: warning: unused variable 'wait' [-Wunused-variable]
   int wait=1; //EXTRA
       ^
/Users/anar/./testbench.C:22:31: warning: unused variable 's2' [-Wunused-variable]
   TSocket *s0 = 0, *s1 = 0, *s2 = 0;
                              ^
/Users/anar/./testbench.C:14:24: warning: unused parameter 'arg' [-Wunused-parameter]
void net_server( void *arg )  
                       ^
4 warnings generated.
root [1] testbench()
entered main while loop in net_server
root [2] .
.
.
.
.
.
.
.
(............)

Can you try running a DEBUG build, so that we might get a more detailed backtrace?

Additionally, I would recommend cleaning up the code. For example, you never check whether the sockets you add to the monitor are valid. Also check how you close and delete sockets. It looks to me as if you decide somewhat randomly whether to close a socket or delete the object.
For example here, I think, you leak a handle:

   mon->Remove(s);
    s0=NULL;

If you don’t block when reading, then make sure to read everything; do not assume that one call to RecvRaw will get everything from the socket. In short, the code needs a careful revision.
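To illustrate the "read everything" advice, here is a minimal sketch using plain POSIX recv() rather than ROOT's TSocket::RecvRaw. The names DrainSocket and DrainResult are made up for this example and are not part of ROOT or of the code posted above. The loop keeps reading until the socket reports that it would block, that the peer closed the connection, or that a real error occurred.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical result type: why the drain loop stopped.
enum class DrainResult { kWouldBlock, kPeerClosed, kError };

// Append everything currently readable on `fd` to `out` without blocking.
// MSG_DONTWAIT makes each recv() call non-blocking (available on Linux and BSD).
DrainResult DrainSocket(int fd, std::string &out)
{
   assert(fd >= 0);
   char buf[1000];
   for (;;) {
      ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
      if (n > 0) {
         out.append(buf, static_cast<size_t>(n));   // more data may follow: loop again
         continue;
      }
      if (n == 0) return DrainResult::kPeerClosed;   // orderly shutdown by the peer
      if (errno == EINTR) continue;                  // interrupted by a signal: retry
      if (errno == EAGAIN || errno == EWOULDBLOCK)
         return DrainResult::kWouldBlock;            // nothing left to read right now
      return DrainResult::kError;
   }
}
```

A caller would invoke DrainSocket(fd, message) each time select reports the descriptor as readable, and remove the socket from the monitor when it returns kPeerClosed or kError.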

I also fail to understand why you want to use select in multiple threads. I doubt that making your server multi-threaded in that way will bring any advantage.
I would advise implementing only ONE “select” in the main thread (or a “poll” call, if there will be more than 1000 handles to wait on). It can also be done in a monitoring thread, but then I would make select wait indefinitely to reduce CPU load.
All communication with clients I would put in separate threads (preferably a TThreadPool).
In this case your server will comfortably manage communication with tens of thousands of clients.

If you choose to go with multiple select calls, I would expect nothing but a synchronization nightmare.

[quote=“jaromrax”]Hi,
just a simple question - is there some problem with running TMonitor::Select inside TThread?[/quote]
No, there should be no problems, as long as you don’t wait on (“select”) the same descriptors in different threads.

Thanks for checking it, but from your first ‘code’ section it seems that you didn’t try pressing enter several times (5-20).

I have had it running for 2 days, but once I start hitting enter it crashes within ten seconds (and when I keep it pressed, it is immediate).

This I agree with. I admit that I do not feel very confident about sockets etc. Maybe this is a big problem too. The touch of an expert would make a big difference, no doubt… However, I don’t believe that this is the actual problem. When I put fflush(stdout); and printouts everywhere, I see that it crashes without even exiting the do-while loop:

do{ s = mon->Select( 1000 ); printf("%s",".\n" );fflush(stdout);}while( (int)s==-1 ); printf("%s","out of select\n" );fflush(stdout);

[quote=“anar”]
I also fail to understand why you want to use select in multiple threads. I doubt that making your server multi-threaded in that way will bring any advantage.
I would advise implementing only ONE “select” in the main thread (or a “poll” call, if there will be more than 1000 handles to wait on). It can also be done in a monitoring thread, but then I would make select wait indefinitely to reduce CPU load.
All communication with clients I would put in separate threads (preferably a TThreadPool).
In this case your server will comfortably manage communication with tens of thousands of clients.

If you choose to go with multiple select calls, I would then expect nothing else but a synchronization nightmare.[/quote]
Well, this is another discussion. You are probably right, but my aim is not to serve 10k clients.

Briefly: I want a server for one client at a time. But the server must be able to respond to some ‘stop’ signal, and the whole thread must then quit completely. When I used the other example from the tutorials, the server was stuck in listen and I was not able to quit it. ->Select seemed fine to me, as there I can always check for incoming signals.
And I need to have CINT available at the same time. The server just has to run in the background.

Best regards
jaromir

There are several techniques for achieving that. Really many.
For example:

  1. You can create a pipe and give the read handle of this pipe to all selectors (if you have many). As soon as you want to stop, just write something into the pipe and all selectors will wake up. In the threads you can then check whether it was the pipe’s handle that was signaled, and stop.

  2. Use thread condition variables. POSIX, Boost and ROOT all implement them.

  3. Use signals.

  4. Use a static variable.

… plus many others, depending on the design of the application.

I personally prefer the pipe approach, because my select(s) usually wait indefinitely, so I need something to wake them up. But like I said, it depends on the design. For example, when I developed a thread pool class for my application, or TThreadPool for ROOT, I made intensive use of thread condition variables, which are very nice.
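The pipe technique (option 1) can be sketched with plain POSIX calls as follows. This is only an illustration under assumptions: the names WaitForDataOrStop, RequestStop and WakeReason are invented for this example, and it uses select() directly instead of TMonitor. The stop pipe is created once with pipe(); its read end goes to every selector and its write end stays with whoever may request the stop.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical result type: why a blocked select() returned.
enum class WakeReason { kStopRequested, kDataReady, kError };

// Block (no timeout) until data_fd or stop_fd becomes readable.
// stop_fd is the read end of the stop pipe; it is checked first so that a
// stop request wins even when data is pending as well.
WakeReason WaitForDataOrStop(int data_fd, int stop_fd)
{
   assert(data_fd >= 0 && stop_fd >= 0);
   for (;;) {
      fd_set readfds;
      FD_ZERO(&readfds);
      FD_SET(data_fd, &readfds);
      FD_SET(stop_fd, &readfds);
      int rc = select(std::max(data_fd, stop_fd) + 1, &readfds, nullptr, nullptr, nullptr);
      if (rc < 0) {
         if (errno == EINTR) continue;   // interrupted by a signal: wait again
         return WakeReason::kError;
      }
      if (FD_ISSET(stop_fd, &readfds)) return WakeReason::kStopRequested;
      if (FD_ISSET(data_fd, &readfds)) return WakeReason::kDataReady;
   }
}

// Called from any other thread: one byte on the write end wakes the selector.
bool RequestStop(int stop_write_fd)
{
   const char token = 'x';
   return write(stop_write_fd, &token, 1) == 1;
}
```

The byte is deliberately left unread in this sketch, so the read end stays readable and every selector sharing the pipe sees the stop request, not just the first one to wake up.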

Thanks again, but there is still a question: did you try keeping enter pressed to reproduce the crash? I get it in 100% of cases within a few seconds, and I see _ZN8TMonitor6SelectEl+0x11f and _ZN5TRint15HandleTermInputEv+0x46, which sounds to me like some deep interaction inside ROOT and not really my fault (I am afraid).

As far as I can follow, you are talking about the possibilities offered by a Select function, but this is exactly the point where I crash…
I have already tried using this directly:

TServerSocket *ss = new TServerSocket(port, kTRUE); TSocket *s0 = 0; s0=ss->Accept();
(but it blocks the thread, i.e. I cannot interrupt it from another thread; however, there are no crashes)

I went through your suggestions. I think I cannot use a pipe, because I need the main thread really isolated. (2): condition variables are completely fine for me; they already work.

This is the shortest crash example I am able to prepare:

#include "TSocket.h"   
#include "TServerSocket.h"
#include "TMonitor.h"
#include "TThread.h"  

void* net_server( void* arg )  
{
    int port=9999;
//===================================================================
   printf("%s\n" ,"entered main while loop in net_server. Dont send anything just keep <enter>");
   TServerSocket *ss = new TServerSocket(port, kTRUE);
   TMonitor *mon = new TMonitor;
   mon->Add(ss);
   TSocket *s0 = 0;
     TSocket  *s=NULL;
     mon->ResetInterrupt();
     s=mon->Select( );
     printf("%d:%s",(int)s,"...dies before exiting Select() \n" );fflush(stdout);

//===================================================================
}



void testbench2(){
    TThread* thread;
    thread = new TThread( "my_server" , net_server, NULL );
    if (thread==NULL){ printf("exiting, thread 1 not running\n%s","");return ;}
    thread->Run();
}// .....................................................................testbench2

It crashes even completely without a timeout.

Thank you, I’ll check it out.

Hi, jaromrax

Try using

TList readClients;
mon->Select(&readClients, NULL, 10);

instead of the plain mon->Select() call.

Then you can loop over readClients and read from them. If you want to write to the client sockets, you should replace the NULL with another TList of writeClients.

If you read the TMonitor source code, you will find that mon->Select() does a lot of checking of X window events and other signals, and this interferes with the interactive ROOT (CINT) prompt. mon->Select(readreadyptrs, writereadyptr, timeout) does not do those checks, so it won’t cause the problem.

It took me a whole week to figure out this little trick with ROOT 5.28; I assume the issue still exists in 5.3*.

This is an old post, so hopefully this will be useful if you are still struggling with it, and for other people, too.