TSocket/TPSocket: Send more data than send/recv buffersizes

Hi,

I’m currently trying to fix a problem we are having with a piece of code that sends a raw data stream through a TSocket from a DAQ PC to a PC where the data is analysed/displayed.

To do so, we create a buffer as a char array with its length prepended, and send the whole thing through TSocket::SendRaw(). On the receiving end, the length is read first, a buffer is allocated, and then the remaining data is read.

After some analysis, I found that the problem occurs when the buffer is larger than the send/recv buffer of the socket (hence also my other question here). In particular, data that does not fit into the send/recv buffer simply disappears… As far as I understand, this is the “correct” behaviour of TCP; however, I obviously need to work around it. Strangely enough, this does not seem to trigger the “would block” return value (-4) of SendRaw()/RecvRaw().

When I first found ROOT’s parallel sockets (TPSocket), I thought they should remedy the problem, as I suspected there would be some control mechanism in place to check whether one of the parallel buffers is filled completely. However, this seems not to be the case. I simply increased the buffer size in the first line of the pclient.C example and found the same data loss as in the non-parallel case:

Processing pserv.C... Receive 100 buffers of 50000000 bytes over 5 parallel sockets... 705032704 bytes received in 2.869882 seconds 234.16 MB/s

The second line should be reporting 5000000000 bytes …

With a ten times smaller buffer, however, it works:

Processing pserv.C... Receive 100 buffers of 5000000 bytes over 5 parallel sockets... 500000000 bytes received in 0.500118 seconds 951.78 MB/s

To cut a long story short:

1.) I’m wondering whether this is the intended behaviour of TPSocket? I had expected the data to be sliced such that each slice fits into the respective send/recv buffers.

2.) What would be an advisable/reliable way to get this working?
I have come up with a function SendChunks() that cuts the buffer into pieces small enough to fit into the socket’s buffers, and then sends them piece by piece. However, I’m not sure how often I need to check the buffer sizes in order to stay in sync with TCP’s flow control…

3.) As mentioned above, how does one know when the data loss occurs, if the return value tells that everything worked fine?

Thanks in advance!
Sven

Hi Sven,

The value printed out (705032704) is an artefact of insufficient precision: if you cast (niter*bsize) to ‘long long’ and format it with ‘%lld’ you should get the right value.

That said, with TCP you should not get data loss (this is also what your link says, if I understand correctly).
Could you please check whether using 64-bit integers solves your problem?

G Ganis

Hi!

Oh sorry, this is a stupid error on my side.

That’s what I thought, too. However, I haven’t managed to fix my code, and everything points to this being the problem. The sentence I was referring to is the following:

However, the “not acknowledge them” part should produce an error, right?

I’ll try to make a minimal example in which my problem occurs and post it ASAP.

Thank you so far, nevertheless!
Sven

Hi,

[quote=“augustin”]TCP will discard incoming datagrams and not acknowledge them, triggering congestion control (halving the window size under Reno/XP or switching to the delay-window under CompoundTCP/Vista+).

However, the “not acknowledge them” part should produce an error, right?

[/quote]
Yes, but thinking about it again, this is perhaps the behaviour under Windows. In TCP, if the sockets are in blocking mode, sending should simply block if the receiver does not read out the data. I mean, if the server-side receive buffer is full, the sender should not be allowed to send any more until the receiver has consumed what was sent before.
Attached is a modified version of pserv.C which waits before receiving: this seems to work, even when sending large amounts of data, e.g. 1000 buffers of 50000000 bytes each.

Do you have an example where you have a data loss with no error?

G Ganis
pserv.C (1.81 KB)

Hi!

This is how I understood it as well … however, not being able to fix our problem made me less sure about this.

That version sleeps between the two receives (1. the length, 2. the data). In my case, however (see below), only one big buffer is sent, which also contains the length information.

Yes, I tried to strip as much from our code as I could.
Nevertheless, I should explain what’s going on:

recv.cc and send.cc are test programs. There’s a Makefile, too.
Run ./send first, then ./recv.

send.cc uses the “MyDataSender” class, which I took from our code, to send a “MyBuffer”.

I’ve left in the “send everything at once” block and a “send chunkwise” block (starting at line 100 in MyDataSender.cpp).

In the first case:
If the buffer is too large, it cannot be sent.
This depends on the value set for the send buffer, as can be tested in line 87.
In this case I do get an error as return value… yet there seems to be no canonical way to deal with such a large buffer.

Therefore, the second case was meant to overcome this limitation.
However, here the socket buffer seems to fill up over time, and strange things happen, in particular if the chunk size (cs) is larger than the send buffer. Even if the chunk size is merely comparable to the send buffer, the “fills up over time” problem occurs. Note that I had also added a sleep() in there, but it didn’t really help much.

To see the weird behaviour one can use the hexdiff program:

These are the first 3 lines of the binary dump made at the end of the code:

       0   54 00 50 00 00 00 00 00 01 00 00 00 02 00 00 00    T P
      16   03 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00
      32   07 00 00 00 08 00 00 00 09 00 00 00 0a 00 00 00

Jumping to the first difference between the send and recv dumps shows:

 2582576   0b da 09 00 0c da 09 00 0d da 09 00 0e da 09 00
 2582592   0f da 09 00 10 da 09 00 11 da 09 54 00 50 00 00               T P 
 2582608   00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04

Here, the second line shows that the data stream simply restarts.
Finally, the point where this occurs is consistent with the size of the send buffer, as can be seen in the output of ./send.

I’d be very grateful for some insight into what we are doing wrong, here.

Best regards,
Sven
ds.tar.gz (4.29 KB)

Hi,
Sorry for replying late.
I had a closer look.
I can reproduce your problem with the code that you posted.
The problem goes away when I remove the kNoBlock setting on the socket opened on the server side (by send).

If I keep that setting and go slowly with the debugger through the send part (which means stepping inside UnixSock::SendRaw), it also works.
At normal speed I get “would block” errors (-4), but the receiver (recv) acts as if it has got it all.

I guess the problem is in the logic of the non-blocking case. One should look at some examples to get it right in all cases. Do you really need to go non-blocking?

Attached is the version I was testing. I added the possibility to pass the length on the command line to test various sizes; you could use the same to extract the length from the beginning of the buffer.

G Ganis
ds-mod.tar.gz (5.82 KB)

Hi!

First off: Thank you very much for helping with this puzzle.

This is exactly what I have found, too. I also tried simply adding a sleep() in each iteration of the chunkwise sender and found that it reduces the chance of something going wrong. However, it does not eliminate it…

I fear I need to have the socket running in non-blocking mode: if it blocked, I could miss the next trigger in the DAQ part of the code. The whole thing is not threaded, so a blocking socket would indeed block the rest of the program.
What I would want is to drop any buffer that did not arrive completely on the other side and simply go on with the next one.

An alternative would be to add threading, and terminate the sending thread when the next trigger comes in. I fear this will lead to a very similar outcome, as killing the sending thread will leave the receiver side in an unfinished recv step.

You are probably right; I also have the feeling the problem is in how I react to the -4 return value. I have tried to find some examples on this. It seems, however, that it is done exactly this way everywhere.

That’s a very good idea.

However, I seem to be running out of ideas on where to look for the problem.
It might bring some insight to start from a ROOT example and simply add the NoBlock option to the socket. I think I have tried that before and found that the example still works; I should try that again… I guess the shorter the example in which the error occurs, the easier it will be to figure out what the error actually is.

Best regards,
Sven

Hi!

I’m sadly still stuck with this problem.
I tried my best to debug the code, but failed to find anything so far.

If someone with more insight into non-blocking network IO could point me to an exemplary implementation, I would be very grateful.

Alternatively, if someone could check whether my handling of the “would block” return value is correct as it is… I still see this as the likely source of the problem.

Thanks in advance!
Sven

[quote=“augustin”]Hi!

I’m sadly still stuck with this problem.
I tried my best to debug the code, but failed to find anything so far.

If someone with more insight into non-blocking network IO could point me to an exemplary implementation, I would be very grateful.

Alternatively, if someone could check whether my handling of the “would not block” return value is correct as it is… I’m still seeing this as likely source of the problem.

Thanks in advance!
Sven[/quote]

Sorry, I did not read your code and am a bit lost in this long discussion. Sending more data than the send/recv buffer sizes is how … any network actually works. For example, as I understand it: you write chunks of data into your socket while ‘select’ with timeout 0 reports your socket as ‘write ready’; at some point the buffer is full and select fails; then you ‘block’ in the next select, waiting for ‘write ready’, wake up on ‘write ready’ and fill the buffer again, etc. (if you have more data).

Obviously, nobody tries to reset the buffer size to fit your data size (please disregard this comment if it’s not what you are trying to do).

I believe W. R. Stevens has really nice and simple examples of non-blocking client/server apps in his “Unix Network Programming”. All you have to do is map such an example to ROOT’s API, which should be quite straightforward, since ROOT is quite a thin wrapper above the actual API. Instead of ‘select’ you’ll use TMonitor, etc.

Sorry, if I got your problem wrong.

You are certainly right, this is indeed how all networks work!
I think you are mixing up blocking and non-blocking IO, though. But I might be wrong…

As I understand the blocking mode works like this:

Simplest case:
You send/recv until the requested amount of data has travelled over the line. The program waits (i.e. blocks) at this point…

More advanced case:
You use select with a timeout to find out whether the socket is ready. This way you can limit how long the waiting above lasts.

In the non-blocking case instead:

You try to send/recv; instead of waiting, you get the “would block” return value (-4), which tells you to try again later. This way you will never wait, but you have to handle the -4 correctly. You’d also never use select.

If this is right so far (I do hope so :wink: ), I think there’s a problem in my code with that last part. I guess one should react differently to the -4 … But I can’t figure out how.

On the other hand, maybe using select with a timeout is good enough for me. I’ll try that. What it would do, however, is replace the built-in non-blocking mechanism (handling -4 correctly) with a self-made one (handling the return values of select). Right?

Thanks for the recommendation. I will look it up.
I fear, however, that this is essentially what is in the code at the moment: a ROOT version of a standard non-blocking example, i.e. send/recv until -4, try again later.

Thank you very much!
Sven

[quote=“augustin”][quote=“tpochep”]
Sorry, I did not read your code and am a bit lost in this long discussion. Sending more data than the send/recv buffer sizes is how … any network actually works. For example, as I understand it: you write chunks of data into your socket while ‘select’ with timeout 0 reports your socket as ‘write ready’; at some point the buffer is full and select fails; then you ‘block’ in the next select, waiting for ‘write ready’, wake up on ‘write ready’ and fill the buffer again, etc. (if you have more data).
[/quote]

You are certainly right, this is indeed how all networks work!
I think you mix up blocking and non-blocking IO, though. But I might be wrong …
Sven[/quote]

I’m interested only in non-blocking IO, since IIRC that’s what you were asking about.
You can call select with a zero timeout (zero, but not NULL; have a look at the man page): it returns immediately, and you can check whether write-ready is set for your socket. If it is, you can try to write the next chunk.

If the buffer is full and you cannot write more (and still have some data), you call select with either NULL as the timeout or the timeout set to some value (now + some_time). select will then wait for free space in the send buffer (so essentially select blocks), and you’ll either get a timeout (and have to decide whether to continue) or write-ready (and try to write the next chunk).

I’ve mentioned the case with a zero timeout because ROOT’s sockets work like this: they do NOT return the size they managed to write, so you have to work around this.

P.S. I would also recommend having a look at ROOT’s tutorials in $ROOTSYS/tutorials/net. There is an example (hserv.C and hclient.C) which, IIRC, sends an object (I guess using ROOT’s serialization machinery). This is probably what you want instead of low-level socket/select read/write: you can wrap your data in an object of a class inheriting from TObject and send that object (this is only a guess, I never tried this myself).

[quote=“tpochep”]I’m interested only in non-blocking IO, since IIRC that’s what you were asking about.
You can call select with zero timeout (zero, but not null - have a look at man) - this returns immediately and you can check, if write ready is set for your socket, if yes - you can try to write the next chunk.[/quote]

Indeed, I understood that.
You are right, I’d like to get this working in non-blocking mode.

However, my point was, and I might be wrong here, that what is usually called “non-blocking mode” is:

Which should not be used with select; instead, the -4 return value has to be handled…

You suggest (and I think it’s a very good idea!) to set kNoBlock to kFALSE and use select with a timeout instead.

Did I understand you correctly?

PS: Yes, I have seen those examples. It might indeed be a good idea to use send/recv wrapped in ROOT objects. I have not tried that so far… However, from the manual I cannot see whether those methods do any additional error handling…

PPS: I’ll try to find where I read that NoBlock=true should be used without select.

[quote=“augustin”][quote=“tpochep”]I’m interested only in non-blocking IO, since IIRC that’s what you were asking about.
You can call select with zero timeout (zero, but not null - have a look at man) - this returns immediately and you can check, if write ready is set for your socket, if yes - you can try to write the next chunk.[/quote]

Indeed, I understood that.
You are right, I’d like to get this working in non-blocking mode.

However, my point was, and I might be wrong here, what is usually called “non-blocking mode”, is:

Which should not be used with select. Instead the return value -4 has to be handled…

You suggest (and I think it’s a very good idea!) to set kNoBlock to kFALSE and use select with a timeout instead.

Did I understand you correctly?[/quote]

No, you have to set kNoBlock, otherwise send will block.
It looks like you can use kDontBlock as the third argument for SendRaw (instead of kDefault, the default one); quite non-obvious, I’d say. In this case you’ll get back the number of bytes from the first (and only) send call.

As I said already - you’d better have a look at TSocket::SendObject and overloaded methods in TSocket - this will save you a loooot of time.

P.S. If you get -4 even with kDontBlock, this probably means you called SendRaw a second or n-th time without waiting for write-ready (thus too early).
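If I read the suggestion correctly, the loop could look roughly like this with ROOT’s API. This is an untested sketch: sock, buf, total and the chunk size are placeholders, and the claim that SendRaw(…, kDontBlock) returns the bytes written by a single send() is taken from the post above, not verified by me:

```cpp
// Untested sketch of a chunked non-blocking send with TSocket.
// Assumes: connected TSocket *sock with kNoBlock set; char *buf; Long64_t total.
Long64_t sent = 0;
while (sent < total) {
   // Block (here with a 1000 ms timeout) until the socket is reported writable
   if (sock->Select(TSocket::kWrite, 1000) <= 0)
      break;                                  // timeout or error: decide what to do
   Int_t cs = (total - sent > 65536) ? 65536 : Int_t(total - sent);
   // With kDontBlock, SendRaw should return the bytes written by one send()
   Int_t n = sock->SendRaw(buf + sent, cs, kDontBlock);
   if (n > 0)
      sent += n;
   else if (n != -4)                          // -4 = would block: just retry
      break;
}
```

The essential difference from the “retry on -4” pattern alone is the Select() call: the sender only retries once the kernel has reported free space, instead of hammering SendRaw too early.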

[quote=“tpochep”]No, you have to set kNoBlock, otherwise send will block.
It looks like you can use kDontBlock as the third argument for SendRaw (instead of kDefault - the default one) - quite non-obvious I’d say - in this case you’ll have the number of bytes from the first (and the only) send call.

P.S. if you have -4 even with kDontBlock, this probably means you call SendRaw the second or n-th time without waiting for write ready (thus too early).[/quote]

Ah, okay. Now I get it. I’ll try to modify my code accordingly.
Thanks again for the suggestions!

Understood. Still: does SendObject take care of all the buffer pitfalls? The ROOT object could still be very large and not fit into the buffer, etc.

Well, people somehow use ROOT and the GRID, reading huge TTree objects etc. Somehow this works. I do not imply that they call SendObject or send a TMessage object, but obviously nobody works directly with SendRaw when there are higher levels of abstraction. Anyway, there is no harm in trying whether sending objects works for you. And if it does not while your code is correct: file a bug report!

That is indeed a very good point :slight_smile:

Let me say the following in my defence:
I inherited the code from my predecessors and it has worked for many years as it is. When we updated the hardware it runs on, it started to cause problems. Naturally, I try to make minimal changes to get it working again, as it has been tested to be “good enough”. More changes introduce more differences from the known-to-be-working code, and thus might introduce errors.

OTOH, you are perfectly right, I’ll give SendObject a try!

I’m trying at the moment to switch my sending from SendRaw() to SendObject(); however, I seem to be failing to package my data array as a TObject.

In particular, TArray unexpectedly does not inherit from TObject. Is there a TObject-derived class that can hold a Char_t array? Otherwise, I could try to roll my own amalgamation of the two…

OTOH, I thought TSocket::Send(TMessage) might be what I want, where the TMessage would hold my data array.

On the sending side, I have:

Int_t sendMsg(TSocket * s, MyBuffer *buffer)
{
	Char_t *b = buffer->getBuffer();

	// the buffer starts with its own length, this should not be part of the TMessage
	// send the length, move buffer pointer to start of data, get the corrected length
	Int_t ret = s->SendRaw(b, sizeof(long));
	b = b + sizeof(long);
	Int_t l = buffer->length() - sizeof(long);

	//make a TMessage, fill it, send it
	TMessage * a = new TMessage(kMESS_ANY, l);
	a->WriteArray(b, l);
	ret = s->Send(*a);

	delete a;
	return ret;
}

On the receiving side, the length is read via RecvRaw(); the correct number arrives there. A Char_t buffer is created and passed into:

Int_t recvMsg(TSocket * s, Char_t * b, Int_t l)
{
	// make a TMessage, receive content, fill buffer with it
	TMessage * a = new TMessage(kMESS_ANY, l);
	Int_t ret = s->Recv(a);
	a->ReadArray(b);

	delete a;
	return ret;
}

However, here Recv() always returns 0 …
Then the subsequent ReadArray() crashes with a segfault.

What is strange is that a->BufferSize() after the Recv() also segfaults, while it returns l+8 when called directly after creating a.

As I’ve mentioned already (and maybe even twice?): have a look at how hserv.C from $ROOTSYS/tutorials/net reads. And if TArray does not inherit from TObject, you can create YourClass inheriting from TObject and including a TArray as data (or inheriting it as a second base class, for example).

Exactly! That’s what I did …

So my question is, more or less: aren’t Send() and TMessage supposed to do this already?!

Have you seen the code I’ve mentioned?

Side A:

Create a TMessage.
Write your object into this message.
Send the message over the socket.

Side B:

Receive the TMessage.
Read the object from this message.
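Putting the two sides together, the pattern from hclient.C/hserv.C looks roughly like this. An untested sketch; note that on the receiving side the TMessage must not be pre-constructed, because Recv(TMessage*&) allocates it, which is presumably why the earlier recvMsg() segfaulted:

```cpp
// Side A (sender): wrap the object and send it
TMessage msg(kMESS_OBJECT);
msg.WriteObject(myObject);        // myObject derives from TObject
sock->Send(msg);

// Side B (receiver): let Recv() allocate the message
TMessage *mess = nullptr;         // do NOT 'new TMessage' here
if (sock->Recv(mess) > 0 && mess) {
   TObject *obj = (TObject *) mess->ReadObject(mess->GetClass());
   // ... use obj ...
   delete mess;
}
```

Recv() returning 0 means the connection was closed, so checking the return value before touching the message also avoids the BufferSize() crash described above.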