Software debugging is an increasingly large part of the development cycle that hasn’t received its fair share of attention. Various debugging techniques can help developers quickly find and eradicate bugs, including programmatic techniques, special-case diagnosis/analysis tools, and reversible debuggers.
According to recent Cambridge University research, the global cost of debugging software has risen to $312 billion annually – and it makes up half the development time of the average project (Figure 1). These figures ignore the stress, late nights, and lost weekends due to panic debugging sessions that developers face when trying to get products out. As we live in a world that is increasingly run by software, the $312 billion cost is bound to grow in coming years. Eradicating application software bugs has therefore never been more vital. Glitches that might have slowed down a project or delayed product availability were serious enough, but with embedded software controlling more and more hardware, from cars to cash machines, the consequences are now potentially even more serious.
Yet there's still a real lack of interest in software debugging, and in using tools and approaches that would make developers' lives easier. Too many still rely on the trusty printf statement, combined with instinct and trial and error. Why is this? I think there are two reasons. First, debugging applications is seen as part of the job. Second, it isn't glamorous, so developers prefer to focus their thoughts on writing new code. Given that software debugging can take up to 50 percent of development time, delay project completion, and cause major headaches, I'd argue it is a problem (albeit an unglamorous one) worth reducing.
No wonder that Henry Lieberman of MIT called debugging "the dirty little secret of computer science," stating that "What borders on scandal is the fact that the computer science community as a whole has largely ignored the debugging problem. This is inexcusable, considering the vast economic cost of debugging." (Communications of the ACM, April 1997). That quote is from 1997, and alarmingly little has changed since.
A worsening issue
There are three reasons that software debugging is moving from an inconvenience to a major business problem. The first is the changing place of software in the world around us. Everything now runs on software, from our smart fridges and Internet-connected TVs to our cars and medical systems. Bugs that might cause a PC to crash are trivial compared to real life-or-death situations. Companies are also becoming more and more aware of the reputational damage caused by systems going down – even for a short time.
Linked to this is the second reason: security. What were previously theoretical concerns are now very real, and apt to bite if quality standards are not met. Look at the impact of security breaches on the likes of Target, and the financial and reputational damage they do is clear.
The third reason is technical: more and more modern applications are multithreaded. With SMP/multicore CPUs becoming mainstream and clock frequencies flat-lining, our programs will likely need to become more multithreaded still if we want to exploit the power of future generations of processors. This makes debugging harder: multiple threads mean the developer needs to think about more things at once, and the ordering of concurrent operations is not guaranteed and often surprises the developer.
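To make the ordering problem concrete, here is a minimal Python sketch of the classic "lost update." It does not use real threads; instead, two generator-based pseudo-threads are stepped by hand so that every interleaving can be replayed deterministically. The names `increment`, `run`, and `outcomes` are purely illustrative.

```python
from itertools import permutations

def increment(shared):
    # One pseudo-thread: a read-modify-write split into two steps.
    local = shared["counter"]      # step 1: read the shared value
    yield                          # a real scheduler could switch threads here
    shared["counter"] = local + 1  # step 2: write back a possibly stale result

def run(schedule):
    # Step two pseudo-threads in the order given by schedule (a sequence of thread ids).
    shared = {"counter": 0}
    threads = {0: increment(shared), 1: increment(shared)}
    for tid in schedule:
        next(threads[tid], None)   # advance that thread by one step
    return shared["counter"]

# Every distinct way of interleaving two steps from each thread.
outcomes = {s: run(s) for s in sorted(set(permutations((0, 0, 1, 1))))}
for schedule, result in outcomes.items():
    print(schedule, "->", result)
```

Only the two schedules in which one thread finishes before the other starts leave the counter at 2; the other four lose an update and leave it at 1. With real threads the schedule is chosen by the operating system, which is why such bugs appear and disappear from run to run.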
The options for debugging
Software debugging techniques can be classified into three groups: programmatic techniques, special-case diagnosis analysis tools, and general-purpose debuggers (including reversible/bidirectional debuggers).
Programmatic techniques
Put simply, programmatic techniques are ways in which developers modify or write their program to help them find bugs. Strategies include:
Print statements. This classic is still the most widely used technique. In some senses, its wide use reflects its simplicity and convenience (it's rare that it isn't an option). But in other ways, the fact that print statements remain our number one way of debugging code is a reflection on the inadequacy of mainstream debugging tools.
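Assertions. An invaluable tool: an assertion states a condition the programmer believes must hold and stops the program the moment it does not, so failures surface close to their cause. Most good programmers use assertions liberally.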
Language support. If you're lucky enough to be using a "safe language" (such as Java or Python) then there are many classes of bugs that just can't happen (such as memory corruption bugs). Sadly, there are still many jobs where these languages just aren't an option, and one needs to use lower level languages such as C/C++ or even assembler. And there are plenty of bugs that are just as likely in higher level languages (such as race conditions).
Test suites. There is no excuse for not having a test suite for programming projects of even modest size. Not only do they help identify new bugs quickly, but when used properly they can be an excellent way of preventing regressions.
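As a rough illustration of how three of these strategies look side by side, the following Python sketch combines a print statement, an assertion, and a tiny regression test. The function `average` and its test are hypothetical examples, not taken from any particular project.

```python
import sys

def average(samples):
    # Assertion: documents an assumption and fails loudly if it is violated.
    assert len(samples) > 0, "average() needs at least one sample"
    total = sum(samples)
    # Print statement: the classic, written to stderr so it stays out of normal output.
    print(f"debug: n={len(samples)} total={total}", file=sys.stderr)
    return total / len(samples)

def test_average():
    # Regression test: pins down behaviour that works today so it cannot silently break.
    assert average([2, 4, 6]) == 4
    try:
        average([])
    except AssertionError:
        pass  # expected: the assertion caught the bad input
    else:
        raise RuntimeError("empty input should have tripped the assertion")

test_average()
```

Note that Python strips `assert` statements when run with the `-O` flag, much as C's `assert` disappears when `NDEBUG` is defined, so assertions should guard against programmer errors rather than validate user input.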
Special-case diagnosis/analysis tools
Some of the most interesting developments in debugging in recent years have come in the form of tools that help programmers find particular classes of errors in their programs. These automated tools detect the most common bugs (common memory access violations, touching unallocated memory, or potential deadlock conditions, for example).
There are also code coverage tools such as gcov, which are (in my experience) woefully rarely used. Most programmers are very surprised to see just how little of their code is covered by their test suite (if they even have one!), yet code coverage analysis seems to remain the preserve of the few uber-vigilant folks.
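To show the idea behind line coverage (not how gcov itself is implemented), here is a small Python sketch built on the standard `sys.settrace` hook. It records which lines of a function run for a given input and compares a weak "test suite" against a fuller one. The helper `lines_executed` and the function `classify` are illustrative names only.

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

def lines_executed(func, *args):
    # Run func(*args) and return the executed line offsets, counted from the def line.
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None            # do not trace any other function
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)     # always restore whatever tracer was installed
    return hit

weak = lines_executed(classify, 5)  # a "test suite" with one positive input
full = weak | lines_executed(classify, -1) | lines_executed(classify, 0)
missed = sorted(full - weak)
print("lines never reached by the weak suite:", missed)
```

Here the single positive test never reaches the two `return` statements for negative and zero inputs (offsets 2 and 4), which is exactly the kind of gap a coverage report makes visible.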
Finally, there is hardware protection. Most modern computer systems have memory management hardware to catch illegal memory accesses when they happen. Many CPUs have other debugging features, such as debug registers that can generate a debug exception whenever a specific virtual address is accessed. Many modern debuggers expose these registers through watchpoints (also known as "data breakpoints") that can cause the program to be stopped when a given address is read or written. These can be particularly useful when faced with obscure memory corruption bugs.
General-purpose and reversible debuggers
While special-case tools can be very useful for debugging, they only help if the bug is of a certain, well-known kind. If the bug doesn't fit neatly into one of these categories, such tools don't offer any help. And even instances of very common bugs can elude these tools. The recent Heartbleed bug was a very common issue (buffer over-read), yet every commonly used detection tool failed to spot it.
That's one of the reasons why general-purpose debuggers tend to be used more frequently than special-case tools. However, many programmers use software debuggers only to examine a program's state and then reason backward from there. This uncovers a major flaw – most of a debugger's intelligence is geared toward letting the programmer single-step forward, but that's of limited help. To be really useful, a debugger needs to help the programmer think backward, and walk through the program's execution in reverse.
In their book, The Practice of Programming, Brian Kernighan and Rob Pike give the following advice to programmers when debugging:
"Reason back from the state of the crashed program to determine what could have caused this. Debugging involves backwards reasoning, like solving murder mysteries. Something impossible occurred, and the only solid information is that it really did occur. So we must think backwards from the result to discover the reasons."
So what developers need are reversible (sometimes called bidirectional) debuggers (Figure 2). These let developers record everything that the program being debugged does – every memory access, every computation, and every call to the operating system. This colossal amount of data is then presented to the developer via a powerful metaphor: the ability to travel backward in time (and forward again) and inspect the program state. Essentially, they enable developers to solve the murder mystery by letting them rewind and walk backward as well as forward through the program.
An example use-case: Imagine tracking down some corrupted memory (always a thankless task). With a reversible debugger, a developer can simply put a watchpoint on the variable that contains bad data, and run backward to go straight to the line of code that most recently modified it. This can mean that bugs that would otherwise take a very long time to track down are nailed in minutes.
Reversible debuggers are now available for multiple languages. Linux and Android programmers have the choice of the open-source GNU Debugger (GDB), as well as commercial alternatives (such as Undo Software's UndoDB) that build on GDB's capabilities to deliver a more comprehensive approach with improved performance.
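The "watchpoint, then run backward" workflow can be mimicked in miniature at the application level. The Python sketch below is only an analogy: a real reversible debugger records execution at a far lower level and needs no changes to the program. Here a dictionary subclass logs every write so the history can be searched backward for the most recent modification. `RecordedState`, `load_config`, and `apply_overrides` are hypothetical names, and `sys._getframe` is a CPython-specific helper.

```python
import sys

class RecordedState(dict):
    # A dict that records every write so its history can be walked backward.

    def __init__(self):
        super().__init__()
        self.history = []  # entries: (key, old_value, new_value, writer_function)

    def __setitem__(self, key, value):
        writer = sys._getframe(1).f_code.co_name  # name of the calling function
        self.history.append((key, self.get(key), value, writer))
        super().__setitem__(key, value)

    def last_write(self, key):
        # Search the recorded history in reverse for the latest write to key.
        for entry in reversed(self.history):
            if entry[0] == key:
                return entry
        return None

def load_config(state):
    state["timeout"] = 30

def apply_overrides(state):
    state["timeout"] = -1  # the bug: an invalid value sneaks in here

state = RecordedState()
load_config(state)
apply_overrides(state)

# "Run backward" from the bad value to whoever wrote it last.
culprit = state.last_write("timeout")
print(culprit)
```

Starting from the bad value, the backward search lands directly on `apply_overrides` without stepping forward through the whole run, which is the essence of what a reversible debugger offers for arbitrary program state.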
The good news is that reversible debuggers fit seamlessly into the traditional software debugger model we're used to. While reversible debuggers are bleeding edge today, in a few years any debugger that lacks this ability will be irrelevant. Automatic checkers are also so useful that it's hard to imagine they won't be the norm in a few years.
The increasing importance of debugging
Finding and eradicating bugs has never been more important – meaning that application software debugging tools are no longer a "nice to have," but are a business and development necessity. Whatever embedded market you are in, take a look at the alternatives and pick the best solution for your needs – after all, who wants to spend half their working life debugging code?