Thank goodness for regression tests. There I was, earlier this week, feeling really good about some new code that I had written and (I thought) debugged. I’m in the process of enhancing the OpenVOS POSIX functions that convert binary time values to the broken-down time structure (“struct tm” for you C experts). Back in 1998, when I originally modified these routines to work in the POSIX environment, I took a shortcut and called the underlying VOS kernel subroutines. This approach was fast and simple, but it meant that our POSIX runtimes could not handle dates between 1970 and 1980, because VOS doesn’t handle those dates.

Lately, I’ve been porting several major new open-source packages to OpenVOS 17.0, and I discovered that we were failing a bunch of tests because we could not handle this range of dates. So I set about the task of modifying the code to handle all dates in both the VOS and UNIX Epochs (1970 to 2048). I knew this task would be error-prone, and I was determined to find a way to grind all of the bugs out of the code.

One of the nice things about working with dates is that the set is finite. Plus, since modern computers are quite fast, it isn’t all that hard to just have them crunch through all possible combinations and see what happens. I wanted to be sure to test the area of the code that handled 1970 and 1971, for reasons I won’t go into here. So I wrote a test that converted two times per day, starting January 1, 1970 and extending until the present. It used only a fraction of a second of CPU time. Sure enough, I found a fencepost error in my code. I corrected the error, the test passed, and I was feeling pretty good about the whole process. My auditors signed off on the changes, and I thought I was done.

Then a colleague of mine ran the regression test suite over my changes. We do this after every change to the compilers and their runtimes, just to be sure we don’t accidentally break anything. Unfortunately for me, several of the regression tests failed. When I dug into why my own testing hadn’t caught the failures, despite the apparent thoroughness of the test, I discovered that the problems didn’t arise until 2038: while I had added the decade from 1970 to 1980 to the time routines, I had lopped off the decade from 2038 to 2048. Sure enough, when I went back and tested every day in the full range, I proved that my test would have caught the problem, if only I’d asked it to.
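For readers who want to try the same trick, here is a minimal sketch of such an exhaustive sweep. It is not the actual OpenVOS test or its internal conversion routines; it stands in the standard gmtime()/mktime() pair, samples two times per day from 1970 toward 2048, and assumes a POSIX setenv() and a 64-bit time_t so the range past 2038 doesn’t overflow.

    /*
     * Sketch of an exhaustive round-trip test over 1970-2048,
     * two samples per day.  Stand-in for the real OpenVOS test:
     * uses the standard gmtime()/mktime() pair.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* Force UTC so mktime() exactly inverts gmtime(). */
        setenv("TZ", "UTC", 1);
        tzset();

        const time_t start = 0;                     /* 1970-01-01 00:00:00 UTC */
        const time_t end   = (time_t)2461449600LL;  /* 2048-01-01; needs 64-bit time_t */
        const time_t step  = 12 * 60 * 60;          /* two samples per day */
        long failures = 0;

        for (time_t t = start; t <= end; t += step) {
            struct tm tm_copy = *gmtime(&t);        /* binary -> broken-down */
            time_t back = mktime(&tm_copy);         /* broken-down -> binary */
            if (back != t) {
                printf("mismatch at %lld: round trip gave %lld\n",
                       (long long)t, (long long)back);
                failures++;
            }
        }

        printf("%ld failures\n", failures);
        return failures ? EXIT_FAILURE : EXIT_SUCCESS;
    }

The whole loop is only a few million conversions, which is why a test like this costs a fraction of a second of CPU time; the point is simply to make the loop cover the entire range you claim to support, not just the part you happened to change.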
What’s the lesson here? I suppose the most obvious one is that if you can test all combinations, then be sure you actually do test all combinations. Perhaps a more important lesson is to take the time to build up a regression test suite. Regression tests are worth the time and effort: they can cut the cost and shorten the time of finding and fixing problems as you enhance your code. You don’t have to avoid very many customer-reported problems to justify the cost of creating a regression test suite. I estimate that the cost of fixing a software problem goes up by an order of magnitude with each additional phase in the process, and since there are at least four phases in any process (design, code, test, deploy), things get expensive pretty quickly.