Hello,
I am running out of ideas on how to debug the following problem.
I have a TServerSocket that accepts connections from clients. The code is based on the example given here:
root.cern.ch/root/html/examples/hserv.C.html
except that I do not close the TServerSocket (I allow for many clients to connect).
The problem shows up when the client and the TServerSocket are on different machines, but not when both run on the same machine (i.e. “localhost”).
When they run on the same machine, the program works as expected. When running on different machines, I manage to exchange a few TMessages, but very soon I’m running into trouble. It looks like TMessage “corruption”, but I don’t know much more than that.
I tried looking at the message type that the client sends (“type sent”) and the one that TServerSocket receives (“type recv”), as well as the total # of bytes received by TServerSocket for that socket ("# of bytes").
This is the order of events when client and TServerSocket run on the same machine:
step  type sent      type recv      # of bytes
(a)   kMESS_STRING   kMESS_STRING   28
(b)   kMESS_STRING   kMESS_STRING   60
(c)   kMESS_STRING   kMESS_STRING   110
(d)   10012          10012          118      (Note: "homemade" message/integer)
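Rows like the ones above can be produced with a small formatting helper. The following is a minimal stand-alone sketch, not the actual code behind this log: the names `typeName` and `logRow` and the `DemoType` values are hypothetical stand-ins. In a real server the type would presumably come from TMessage::What() and the byte count from the socket's own counter.

```cpp
#include <cstdio>
#include <string>

// Stand-in type codes for illustration only; real code would use the
// constants from ROOT's MessageTypes.h rather than these values.
enum DemoType { kDemoString = 1, kDemoObject = 2 };

// Return a readable name for a type code, or the number itself for
// anything unrecognized (e.g. a homemade code such as 10012).
std::string typeName(int what) {
    switch (what) {
        case kDemoString: return "kMESS_STRING";
        case kDemoObject: return "kMESS_OBJECT";
        default:          return std::to_string(what);
    }
}

// Build one log row: step label, type sent, type received, cumulative bytes.
std::string logRow(const std::string& label, int sent, int recv,
                   unsigned long bytes) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "%-5s %-13s %-13s %lu", label.c_str(),
                  typeName(sent).c_str(), typeName(recv).c_str(), bytes);
    return buf;
}
```

Printing an unrecognized code as a plain number is deliberate: an unexpected value on the receiving side then stands out in the log immediately.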
Some more details (that may or may not be relevant):
TServerSocket connects with client #2 at this point. A number of kMESS_STRING messages are exchanged w/o problems. TServerSocket then sends two messages of type “kMESS_STRING” and one message of type “10012” to client #1 w/o problems. Client #1 receives them as expected (confirmed by printouts). At this point, client #1 resumes sending messages to TServerSocket:
step  type sent      type recv      # of bytes
(e)   kMESS_STRING   kMESS_STRING   139
(f)   kMESS_OBJECT   kMESS_OBJECT   12286    (Note: contains one 1D + one 2D histogram)
(g)   kMESS_STRING   kMESS_STRING   12310
(h)   kMESS_OBJECT   kMESS_OBJECT   15295    (Note: contains four 1D histograms)
When client #1 is moved to a different machine, everything works fine (and identically) up to and including step (f). When I try to send (g), I get the following:
(g')  kMESS_STRING   65             25602    (with TMessage::GetClass() = 0)
or occasionally
(g'') kMESS_STRING   ----           ----     (null TMessage)
Client #1 keeps sending stuff to TServerSocket w/o any complaints. In case (g’), TServerSocket does not know how to proceed (there is no message with type 65!). In case (g’’), TServerSocket thinks that the client has been disconnected (even though it has not).
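One way to tell a genuine disconnect from a receive error in case (g'') would be to log the value returned by the receive call together with whether a message object came back. Below is a minimal stand-alone sketch of that classification. It assumes the return convention I understand TSocket::Recv(TMessage *&) to have (0 when the other side has closed the connection, -1 on error, -4 when a non-blocking socket would block), which should be verified against the 3.10.02 sources. `RecvStatus` and `classifyRecv` are hypothetical names.

```cpp
// Possible outcomes of one receive call on the server side.
enum class RecvStatus { Ok, PeerClosed, WouldBlock, Error, Inconsistent };

// nbytes:      assumed to be the value returned by TSocket::Recv(TMessage*&).
// haveMessage: whether the TMessage pointer was non-null after the call.
RecvStatus classifyRecv(int nbytes, bool haveMessage) {
    if (nbytes == 0)  return RecvStatus::PeerClosed;  // assumed: peer closed
    if (nbytes == -4) return RecvStatus::WouldBlock;  // assumed: nothing to read yet
    if (nbytes < 0)   return RecvStatus::Error;       // any other negative value
    // Positive byte count: a message object is expected to exist.
    return haveMessage ? RecvStatus::Ok : RecvStatus::Inconsistent;
}
```

If the server currently treats every null TMessage as a disconnect, separating these cases in the log would show whether (g'') is really a closed connection or an error (or would-block) return being misread as one.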
Unfortunately, I have not managed to reproduce my problem with a smaller set of macros. Any ideas would be greatly appreciated. In particular, does anything in the functionality of TSocket change when I switch from
TSocket *sock = new TSocket("localhost", 9090);
to
TSocket *sock = new TSocket("mymachine.mydomain", 9090);
that my code does not take into account?
What else should I be checking?
Thanks a lot!
–Christos
PS: "which root" returns:
/afs/cern.ch/cms/external/lcg/external/root/3.10.02/slc3_ia32_gcc323/root/bin/root