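Don't worry: despite the title, I haven't slipped back into math-geek mode. This post is not about number theory but about socket programming, and about a coding mistake I have seen far too many times.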
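The code fragment in Figure 1 shows the mistake. A main while loop repeatedly calls select, waiting for characters to arrive. Once select indicates that characters are available, the code enters an inner loop that runs until 10 characters have been received. After each call to recv, the code correctly checks whether the return value is 0, which means the remote peer has closed the socket. It then checks whether errno is 0; if so, it appends the characters just received to the application buffer and adds their count to the total number of characters received for the current message. Finally, it checks whether errno is EWOULDBLOCK; if errno holds any other value, the program exits with an error status. That completes one pass of the inner loop, and if the message still has fewer than 10 characters, recv is called again.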
while (1)
{
  FD_ZERO (&fdsetREAD);
  FD_ZERO (&fdsetNULL);
  FD_SET (sockAccepted, &fdsetREAD);
  iNumFDS = sockAccepted + 1;
  /* wait for the start of a message to arrive */
  iSelected = select (iNumFDS,
        &fdsetREAD, &fdsetNULL, &fdsetNULL, &timevalTimeout);
  if (iSelected < 0) /* Error from select, report and abort */
  {
    perror ("minus1: error from select");
    exit (errno);
  }
  /* select indicates something to be read. Since there is only 1 socket
     there is no need to figure out which socket is ready. Note that if
     select returns 0 it just means that it timed out, we will just go around
     the loop again. */
  else if (iSelected > 0)
  {
    szAppBuffer [0] = 0x00;         /* "zero out" the application buffer */
    iTotalCharsRecv = 0;            /* zero out the total characters count */
    while (iTotalCharsRecv < 10)    /* loop until all 10 characters read */
    {                               /* now read from socket */
      iNumCharsRecv = recv (sockAccepted, szRecvBuffer,
                            10 - iTotalCharsRecv, 0);
      if (iDebugFlag)                 /* debug output show */
      {                               /* value returned from recv and errno */
        printf ("%d %d ", iNumCharsRecv, errno);
        if (iNumCharsRecv > 0)        /* also received characters if any */
        {
          szRecvBuffer [iNumCharsRecv] = 0x00;
          printf ("[%s]\n", szRecvBuffer);
        }
        else printf ("\n");
      }
      if (iNumCharsRecv == 0)         /* If 0 characters received exit app */
      {
        printf ("minus1: socket closed\n");
        exit (0);
      }
      else if (errno == 0)            /* if "no error" accumulate received */
      {                               /* chars into an application buffer */
        szRecvBuffer [iNumCharsRecv] = 0x00;
        strcat (szAppBuffer, szRecvBuffer);
        iTotalCharsRecv = iTotalCharsRecv + iNumCharsRecv;
        szRecvBuffer [0] = 0x00;
      }
      else if (errno != EWOULDBLOCK)  /* Ignore an EWOULDBLOCK error */
      {                               /* anything else report and abort */
        perror ("minus1: Error from recv");
        exit (errno);
      }
      if (iDebugFlag) sleep (1);      /* this prevents the output from */
    }                                 /* scrolling off the window */
    sprintf (szOut, "Message [%s] processed\n", szAppBuffer);
    if (iDebugFlag) printf ("%s\n", szOut);
    if (send (sockAccepted, szOut, strlen (szOut), 0) < 0)
      perror ("minus1: error from send");
  }
}
Figure 1 – Erroneous code fragment
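Figure 2 shows a sample client session: each plain line is what the client sent to the server, and each "Message [...] processed" line is the server's reply. Every line sent ends with a new-line character and goes out in a single TCP segment. As long as exactly 10 characters are sent, everything works. But as soon as a segment carries only 6 characters, the server stops responding.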
123456789
Message [123456789 ] processed
abcdefghi
Message [abcdefghi ] processed
12345abcd
Message [12345abcd ] processed
12345
789
abcdefghi
123456789

Figure 2 – Client session
Figure 3 shows the server session with debugging turned on. After the “12345<new line>” characters are received, the next recv returns -1 and sets errno to 5011, which is EWOULDBLOCK on this system. The code then loops, and the next recv returns the characters “789<new line>”, but errno is still 5011. In fact, every recv after that, whether or not it returns any characters, leaves errno set to 5011.
Connection established
10 0 [123456789 ]
Message [123456789 ] processed
10 0 [abcdefghi ]
Message [abcdefghi ] processed
10 0 [12345abcd ]
Message [12345abcd ] processed
6 0 [12345 ]
-1 5011
4 5011 [789 ]
-1 5011
4 5011 [abcd]
4 5011 [efgh]
4 5011 [i 12]
4 5011 [3456]
4 5011 [789 ]
-1 5011
-1 5011
-1 5011

Figure 3 – Server debug output
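Because errno is not 0, the characters received are never appended to the application buffer, iTotalCharsRecv never reaches 10, and the code loops forever.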
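This is not a bug in the sockets code. The sockets API states explicitly that the value of errno is undefined unless the function returns -1. Undefined does not mean cleared: a successful recv is not required to touch errno, and here it does not, so errno keeps its previous value, the EWOULDBLOCK left by the earlier failed call.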
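Now you may think that nobody would split a 10-character message into two pieces, and you may be right. But what if the message were 100 or 1000 characters long instead of 10? Remember, too, that TCP carries a stream of bytes, not messages; the TCP stack is free to split an application message across multiple TCP segments at any time. Long application messages, sending a message before the previous one has been fully transmitted, and lost TCP segments are the most obvious conditions that make such a split more likely. Under the right conditions it is entirely possible, even very likely, that this server code passes every acceptance test and runs fine in production, at least for a while.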
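The good news is that the fix is very simple: instead of checking whether errno equals 0, check whether the return value is greater than 0, as shown by the changed test in Figure 4. Note also that the comment on the "errno != EWOULDBLOCK" test now points out that this if statement is reached only when recv returns a negative value, and the only negative value recv returns is -1.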
while (1)
{
  FD_ZERO (&fdsetREAD);
  FD_ZERO (&fdsetNULL);
  FD_SET (sockAccepted, &fdsetREAD);
  iNumFDS = sockAccepted + 1;
  /* wait for the start of a message to arrive */
  iSelected = select (iNumFDS,
        &fdsetREAD, &fdsetNULL, &fdsetNULL, &timevalTimeout);
  if (iSelected < 0) /* Error from select, report and abort */
  {
    perror ("minus1: error from select");
    exit (errno);
  }
  /* select indicates something to be read. Since there is only 1 socket
     there is no need to figure out which socket is ready. Note that if
     select returns 0 it just means that it timed out, we will just go around
     the loop again. */
  else if (iSelected > 0)
  {
    szAppBuffer [0] = 0x00;         /* "zero out" the application buffer */
    iTotalCharsRecv = 0;            /* zero out the total characters count */
    while (iTotalCharsRecv < 10)    /* loop until all 10 characters read */
    {                               /* now read from socket */
      iNumCharsRecv = recv (sockAccepted, szRecvBuffer,
                            10 - iTotalCharsRecv, 0);
      if (iDebugFlag)                 /* debug output show */
      {                               /* value returned from recv and errno */
        printf ("%d %d ", iNumCharsRecv, errno);
        if (iNumCharsRecv > 0)        /* also received characters if any */
        {
          szRecvBuffer [iNumCharsRecv] = 0x00;
          printf ("[%s]\n", szRecvBuffer);
        }
        else printf ("\n");
      }
      if (iNumCharsRecv == 0)         /* If 0 characters received exit app */
      {
        printf ("minus1: socket closed\n");
        exit (0);
      }
      else if (iNumCharsRecv > 0)     /* if no error accumulate received */
      {                               /* chars into an application buffer */
        szRecvBuffer [iNumCharsRecv] = 0x00;
        strcat (szAppBuffer, szRecvBuffer);
        iTotalCharsRecv = iTotalCharsRecv + iNumCharsRecv;
        szRecvBuffer [0] = 0x00;
      }
      else if (errno != EWOULDBLOCK)  /* if we get here iNumCharsRecv */
      {                               /* must be -1 so errno is defined */
        perror                        /* Ignore an EWOULDBLOCK error */
          ("minus1: Error from recv");  /* anything else report */
        exit (errno);                 /* and abort */
      }
      if (iDebugFlag) sleep (1);      /* this prevents the output from */
    }                                 /* scrolling off the window */
    sprintf (szOut, "Message [%s] processed\n", szAppBuffer);
    if (iDebugFlag) printf ("%s\n", szOut);
    if (send (sockAccepted, szOut, strlen (szOut), 0) < 0)
      perror ("minus1: error from send");
  }
}
Figure 4 – Corrected code fragment
