Don’t worry, despite the title I haven’t reverted to math nerd mode. The topic is not number theory but socket programming and a coding mistake that I have seen way too often.
The code fragment in figure 1 demonstrates the mistake. There is a main while loop that loops forever calling select and waiting for characters to be available to be received. Once select indicates that characters are available the code goes through another loop until it has received 10 characters. After the recv function is called the code correctly checks for a 0 return which indicates that the remote peer has closed the socket. It then checks if errno is 0 and if it is it concatenates the characters just received into an application buffer and adds the number of characters just received to the total received for the current message. Finally, it checks errno for the value EWOULDBLOCK and if it is something else it exits the program with an error. At this point the inner loop completes and if the number of characters in the message is less than 10 it calls recv again.
while (1)
{
FD_ZERO (&fdsetREAD);
FD_ZERO (&fdsetNULL);
FD_SET (sockAccepted, &fdsetREAD);
iNumFDS = sockAccepted + 1;
/* wait for the start of a message to arrive */
iSelected = select (iNumFDS,
&fdsetREAD, &fdsetNULL, &fdsetNULL, &timevalTimeout);
if (iSelected < 0) /* Error from select, report and abort */
{
perror ("minus1: error from select");
exit (errno);
}
/* select indicates something to be read. Since there is only 1 socket
there is no need to figure out which socket is ready. Note that if
select returns 0 it just means that it timed out, we will just go around
the loop again.*/
else if (iSelected > 0)
{
szAppBuffer [0] = 0x00; /* "zero out" the application buffer */
iTotalCharsRecv = 0; /* zero out the total characters count */
while (iTotalCharsRecv < 10) /* loop until all 10 characters read */
{ /* now read from socket */
iNumCharsRecv = recv (sockAccepted, szRecvBuffer,
10 - iTotalCharsRecv, 0);
if (iDebugFlag) /* debug output show */
{ /* value returned from recv and errno */
printf ("%d %d ", iNumCharsRecv, errno);
if (iNumCharsRecv > 0) /* also received characters if any */
{
szRecvBuffer [iNumCharsRecv] = 0x00;
printf ("[%s]n", szRecvBuffer);
}
else printf ("n");
}
if (iNumCharsRecv == 0) /* If 0 characters received exit app */
{
printf ("minus1: socket closedn");
exit (0);
}
else if (errno == 0) /* if "no error" accumulate received */
{ /* chars into an applictaion buffer */
szRecvBuffer [iNumCharsRecv] = 0x00;
strcat (szAppBuffer, szRecvBuffer);
iTotalCharsRecv = iTotalCharsRecv + iNumCharsRecv;
szRecvBuffer [0] = 0x00;
}
else if (errno != EWOULDBLOCK) /* Ignore an EWOULDBLOCK error */
{ /* anything else report and abort */
perror ("minus1: Error from recv");
exit (errno);
}
if (iDebugFlag) sleep (1); /* this prevents the output from */
} /* scrolling off the window */
sprintf (szOut, "Message [%s] processedn", szAppBuffer);
if (iDebugFlag) printf ("%sn", szOut);
if (send (sockAccepted, szOut , strlen (szOut), 0) < 0)
perror ("minus1: error from send");
}
}
|
Figure 1 – fragment of incorrect code |
Figure 2 shows an example session. Characters sent to the server are highlighted in yellow, the processed message that is returned is not highlighted. The characters sent include a terminating new line character and are sent in 1 TCP segment. Everything works when exactly 10 characters are sent. But when only 6 characters are sent in a TCP segment the server stops responding.
123456789 Message [123456789 ] processed abcdefghi Message [abcdefghi ] processed 12345abcd Message [12345abcd ] processed 12345 789 abcdefghi 123456789 |
Figure 2 – client session |
Figure 3 shows the server session with debug turned on. You can see that after the “12345<new line>” characters are received the next recv returns -1 and sets the errno to 5011, which is EWOULDBLOCK. The code then loops and the next recv returns the characters “789<new line>” but the errno value is still set to 5011. In fact every recv after that regardless of whether there are characters received or not has errno set to 5011.
Connection accepted 10 0 [123456789 ] Message [123456789 ] processed 10 0 [abcdefghi ] Message [abcdefghi ] processed 10 0 [12345abcd ] Message [12345abcd ] processed 6 0 [12345 ] -1 5011 4 5011 [789 ] -1 5011 4 5011 [abcd] 4 5011 [efgh] 4 5011 [i 12] 4 5011 [3456] 4 5011 [789 ] -1 5011 -1 5011 -1 5011 |
Figure 3 – server debug output |
Because the errno value is not 0 the received characters are not concatenated into the application buffer so the code loops forever.
This is not a bug in the socket code. The socket API explicitly states that the value of errno is undefined unless the function returns a value of -1. Undefined means that the value is not set, so errno retains whatever value it previously had.
Now you might be thinking that no one would break a 10 character message up into 2 pieces and you might be correct; but imagine that instead of 10 characters the message length is 100 or 1000 characters. Also remember that TCP is a stream of bytes not messages; a TCP stack may split an application message up into multiple TCP segments whenever it wants. Certain conditions make this more likely, longer application messages, sending another application message before a previous one has been transmitted, and lost TCP segments are the ones that come readily to mind. Under the right conditions it is possible, even likely, that this server code would pass all its acceptance tests and run fine in a production environment, at least for a while.
The good news is that there is a very simple fix; instead of testing for errno == 0 just test for a return value greater than 0 , see the highlighted change in figure 4. Note also that the comment for the “errno != EWOULDBLOCK” test now points out that the only way to reach that if statement is if recv returned a negative value. The only negative value it returns is -1.
while (1)
{
FD_ZERO (&fdsetREAD);
FD_ZERO (&fdsetNULL);
FD_SET (sockAccepted, &fdsetREAD);
iNumFDS = sockAccepted + 1;
/* wait for the start of a message to arrive */
iSelected = select (iNumFDS,
&fdsetREAD, &fdsetNULL, &fdsetNULL, &timevalTimeout);
if (iSelected < 0) /* Error from select, report and abort */
{
perror ("minus1: error from select");
exit (errno);
}
/* select indicates something to be read. Since there is only 1 socket
there is no need to figure out which socket is ready. Note that if
select returns 0 it just means that it timed out, we will just go around
the loop again.*/
else if (iSelected > 0)
{
szAppBuffer [0] = 0x00; /* "zero out" the application buffer */
iTotalCharsRecv = 0; /* zero out the total characters count */
while (iTotalCharsRecv < 10) /* loop until all 10 characters read */
{ /* now read from socket */
iNumCharsRecv = recv (sockAccepted, szRecvBuffer,
10 - iTotalCharsRecv, 0);
if (iDebugFlag) /* debug output show */
{ /* value returned from recv and errno */
printf ("%d %d ", iNumCharsRecv, errno);
if (iNumCharsRecv > 0) /* also received characters if any */
{
szRecvBuffer [iNumCharsRecv] = 0x00;
printf ("[%s]n", szRecvBuffer);
}
else printf ("n");
}
if (iNumCharsRecv == 0) /* If 0 characters received exit app */
{
printf ("minus1: socket closedn");
exit (0);
}
else if (iNumCharsRecv > 0) /* if no error accumulate received */
{ /* chars into an applictaion buffer */
szRecvBuffer [iNumCharsRecv] = 0x00;
strcat (szAppBuffer, szRecvBuffer);
iTotalCharsRecv = iTotalCharsRecv + iNumCharsRecv;
szRecvBuffer [0] = 0x00;
}
else if (errno != EWOULDBLOCK) /* if we get here iNumCharsRecv */
{ /* must be -1 so errno is defined */
perror /* Ignore an EWOULDBLOCK error */
("minus1: Error from recv"); /* anything else report */
exit (errno); /* and abort */
}
if (iDebugFlag) sleep (1); /* this prevents the output from */
} /* scrolling off the window */
sprintf (szOut, "Message [%s] processedn", szAppBuffer);
if (iDebugFlag) printf ("%sn", szOut);
if (send (sockAccepted, szOut , strlen (szOut), 0) < 0)
perror ("minus1: error from send");
}
}
|
Figure 4 – corrected code fragment |