It’s no secret that code quality standards for embedded software are much higher than those for games, photo editors and similar applications. The requirements grew stricter after hardware that got out of control caused a number of tragedies and, at best, wasted enormous amounts of money.
For example, Arianespace, a French company that offers space launch services on a commercial basis, ended up losing ten years of its employees’ work and seven billion dollars, followed by tiresome proceedings to find out who had dropped the ball. Its Ariane 4 rocket had been launched successfully more than 100 times, but the next model, Ariane 5, exploded on its very first launch. The rocket was destroyed 40 seconds after takeoff because of a software error. Several factors led to this result:
· The developers reused a software module from Ariane 4, but the operating conditions of Ariane 5 differed from those of the fourth model.
· The Ariane 5 system detected the error but wasn’t able to handle it correctly.
· The erroneous module of the fifth model wasn’t tested properly.
The error that caused the accident became one of the most expensive in history. You can learn more about it in the article ‘A space error: 370.000.000 $ for an integer overflow’.
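To illustrate the kind of failure behind that headline, here is a minimal sketch in C. It assumes, as the accident is commonly described, that a 64-bit floating-point value was converted to a 16-bit signed integer without a range check. The function name to_int16_checked is my own invention, and the real Ariane software was not written in C, so treat this as an illustration of the idea only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a double to int16_t only if it fits.
 * In C, converting an out-of-range floating-point value to an integer
 * type is undefined behavior, so the range is checked first.
 * Returns false and leaves *out untouched when the value does not fit.
 */
bool to_int16_checked(double value, int16_t *out)
{
    /* The comparison is written so that NaN is rejected as well. */
    if ((out == NULL) ||
        !((value >= (double)INT16_MIN) && (value <= (double)INT16_MAX)))
    {
        return false;
    }

    *out = (int16_t)value; /* safe here: the fraction is simply truncated */
    return true;
}
```

The caller then has to decide what to do when the function returns false, for example fall back to a saturated value or a safe mode. As the list above notes, detecting the error is only half of the job.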
It could have been worse though.
First, I’d like to note that a single therapeutic dose of radiation is at most 200 rads, while a dose of 1,000 rads is lethal. Below we’ll discuss a machine that delivered doses of 20,000 rads to people.
Therac-25 is a radiotherapy machine, a medical accelerator made by Atomic Energy of Canada Limited, a Canadian state organization. The series was launched in 1982. Between 1985 and 1987, at least six people received overdoses and at least two died of radiation. What was wrong with the machine? It had at least four apparent issues that could lead to this:
· The same variable was used both for parsing the entered numbers and for setting the turntable position. As a result, if data was entered via the console quickly, Therac-25 could end up with the wrong turntable position (a race condition).
· It took 8 seconds to position the bending magnets. If the radiation type and power parameters were changed during that time and the cursor was moved to the final position, the system didn’t detect the changes.
· Division by the radiation rate in some cases led to a division-by-zero error and a corresponding increase of the radiation rate to the maximum possible value.
· A Boolean (one-byte) variable was set to ‘true’ with the ‘x=x+1’ command. As a result, after the ‘Set’ button was pressed, the program could miss the information about a wrong disk position with a probability of 1/256.
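The last two issues are easy to reproduce in a few lines of C. The sketch below is my own illustration with made-up function names; it is not the Therac-25 code. It shows why ‘x=x+1’ on a one-byte flag fails once in every 256 calls, and how a division can be guarded against a zero divisor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy pattern: "set" a one-byte flag by incrementing it. */
uint8_t set_flag_by_increment(uint8_t flag)
{
    /* 255 + 1 wraps around to 0, so the flag silently becomes "false". */
    return (uint8_t)(flag + 1u);
}

/* Safer pattern: assign the intended value explicitly. */
uint8_t set_flag_by_assignment(void)
{
    return 1u;
}

/*
 * Guarded division: refuses to divide by zero instead of letting the
 * result become garbage. Returns false and leaves *out untouched
 * when the division cannot be performed.
 */
bool divide_checked(uint32_t numerator, uint32_t denominator, uint32_t *out)
{
    if ((out == NULL) || (denominator == 0u))
    {
        return false;
    }

    *out = numerator / denominator;
    return true;
}
```

Starting from 0 and calling set_flag_by_increment 256 times brings the flag back to 0, which matches the 1/256 probability mentioned above.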
You can learn more about the Therac-25 device, the investigation, the list of fixes and so on in the article ‘Killer Bug. Therac-25: Quick-and-Dirty’. Let’s move on to another error that led to no less tragic consequences.
The mistake made by Toyota cost it a fortune. According to the National Highway Traffic Safety Administration (NHTSA), 89 people died and 57 were injured in accidents that took place from 2000 to 2010. Toyota conducted its own investigation and concluded that sticking gas pedals and badly fitted floor mats were to blame. However, the complaints didn’t stop. NHTSA then took on the investigation together with NASA and two engineers, Michael Barr and Philip Koopman, who did a tremendous amount of work and checked the whole Toyota code manually. The engineers found 81,514 errors: the cyclomatic complexity of the program was more than 50, the code used recursion, and every issue caused by it led to a processor reset. NASA, in turn, used the MISRA (Motor Industry Software Reliability Association) standards to assess code quality and found 7,134 violations.
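Why is recursion a problem in embedded code? Every nested call takes another stack frame, and on a controller with a small fixed stack a deep enough chain of calls can overflow it. The sketch below is my own example and has nothing to do with the actual Toyota code; the Node type and both functions are hypothetical. It compares a recursive traversal with an iterative one that needs a constant amount of stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical singly linked list node used only for this illustration. */
typedef struct Node
{
    const struct Node *next;
    int32_t value;
} Node;

/* Recursive version: stack usage grows with the length of the list. */
int32_t sum_recursive(const Node *node)
{
    if (node == NULL)
    {
        return 0;
    }
    return node->value + sum_recursive(node->next);
}

/* Iterative version: the same result with constant stack usage. */
int32_t sum_iterative(const Node *node)
{
    int32_t total = 0;
    const Node *cursor = node;

    while (cursor != NULL)
    {
        total += cursor->value;
        cursor = cursor->next;
    }
    return total;
}
```

Both functions return the same sum, but only for the second one can the worst-case stack depth be determined without knowing how long the list is.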
I’ll explain what MISRA is below. To those interested in the Toyota code bug that led to so many deaths, I recommend reading this article.
MISRA standards are intended to improve the security, portability and reliability of embedded software. MISRA was originally created for the automotive industry, but nowadays the standards are also used in medical device development, telecommunications, military projects and so on. They are basically a set of rules and recommendations to follow when developing software.
All these rules are divided into three categories (as of MISRA C:2012):
· Mandatory;
· Required;
· Advisory.
Here are a few examples of the rules:
· Don’t use uninitialized variable values;
· Don’t write unreachable code;
· Loop counters must not have floating-point type.
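Here is how code that respects these three rules might look. This is a minimal sketch of my own with hypothetical function names; it is not an excerpt from the MISRA documents and it doesn’t claim to satisfy every other MISRA rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums an array of samples. */
int32_t sum_samples(const int32_t *samples, size_t count)
{
    int32_t total = 0; /* initialized before it is ever read */

    assert((samples != NULL) || (count == 0u));

    /* The loop counter is an integer, so the iteration count is exact. */
    for (size_t i = 0u; i < count; ++i)
    {
        total += samples[i];
    }

    return total; /* no statements after this point, nothing unreachable */
}

/*
 * Instead of a loop like
 *     for (float t = 0.0f; t < 1.0f; t += 0.1f) { ... }
 * whose iteration count depends on rounding, keep an integer index
 * and derive the floating-point value from it.
 */
double sample_time(uint32_t index, double step)
{
    return (double)index * step;
}
```

A typical loop would then look like for (uint32_t i = 0u; i < 10u; ++i) with sample_time(i, 0.1) called inside the body.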
However, MISRA is not a panacea, and you shouldn’t treat it as a silver bullet that saves software from bugs. To minimize the risk of bugs, it’s important to use MISRA together with other methods of software analysis and verification, including static code analysis. Static code analyzers are tools that detect bugs and potential vulnerabilities in source code. A static analyzer helps you find even error patterns that most programmers don’t know about, as well as errors that are hard to spot during code review. Besides, some static analyzers support MISRA, so you can save time checking whether the code meets the standard.
It’s not enough, however, to check a project with a static code analyzer just once, fix the bugs and forget about it. You should use it wisely. When an analyzer is introduced into a big project, it will probably issue a lot of warnings. Don’t rush to fix them all: it’s enough to hide the existing messages for a while and focus on the new ones that appear during further development. The hidden warnings can be treated as technical debt to be paid off whenever possible. Still, the least painful way to use MISRA together with an analyzer is to adopt them at the very beginning of development.