Where are the dead bodies?
The possibility of faults in software causing death or serious injury is often talked about and in some cases large amounts of money are invested in work to reduce the possibility of these events occurring (or at least doing things that will support the view that a company took reasonable precautions, should a case end up in court). The Therac-25 accidents are an often quoted example of a software fault that directly resulted in deaths. These accidents occurred over a 19 month period in the mid 1980s and are believed to have resulted in the death of six people. I don’t wish to disrespect the memory of the people who died, but six people 20 years ago; is that it? Less than the number of people killed every day (around 10) in traffic accidents in the UK.
If faults in software really do have a non-trivial impact on human safety then we would expect this fact to be reflected in accident statistics. After searching the accident statistics for the UK I cannot find any whose cause is directly attributed to software. If there are people who have died as a direct result of faults in software, the death rate has not yet reached the minimum level needed to be recorded as such (or are these deaths ‘hidden’ away in ones and twos within other causes?)
The US National Transportation Safety Board carries out a thorough investigation of all US aviation accidents. Searching the Aviation Accident Database on the query “software” between the dates 1 Jan 2000 and 9 Aug 2005 returns 44 matches. Reading these 44 reports I did not find any accident attributed to a software related issue.
If faults in software are not killing or seriously injuring many people why is so much effort invested in reducing the probability of these events occurring? The following are some of the possibilities:
- The investment actually made is small, but it is talked up.
- The investment is made for economic reasons (e.g., more reliable products are likely to reduce support costs) and increased ’safety’ is a side effect.
- In situations where there is a likelihood of death or serious injury the procedures and reliability of non-software items is sufficient to short-circuit the effects of any life threatening faults that may exist in the software used (at least until the fault can be corrected).
As any developer knows, replicating faulty behavior in software can be very difficult, if not impossible. It may be that software faults are not given as the root cause of death or serious injury because the necessary proof is not available. Or perhaps software faults have yet to be the root cause of such events on any non-trivial scale.
Existing practice affects what people are willing to put up with. Many users of Microsoft Windows now accept that it is necessary to reboot the computer they are using on a daily, or even hourly, basis. Users of cars accept that the tool they are using can result in serious injuries or even death (usually rating nothing more than a story in the local town newspaper). Will there be a public hue and cry once software faults start to be recorded as a primary factor in accidental death or serious injury? As this paper shows, it can take a lot of dead bodies before existing practices are changed.
The lack of dead bodies attributed to a software root cause suggests that it is very still early days for the field of high integrity software development.
This material was originally written in 2005 and appeared in an earlier blog of mine which I did not keep up.