Archive for October, 2010

Language vulnerabilities TR published

October 28th, 2010 No comments

ISO has just published the first Technical Report from the Programming Language Vulnerabilities working group (ISO/IEC TR 24772 Information technology — Programming languages — Guidance to avoiding vulnerabilities in programming languages through language selection and use). Having worked on programming language coding guidelines for many years I was an eager participant during the first few years of this group. Unfortunately this committee succumbed to an affliction that effects many groups working in this area, a willingness to uncritically include any reasonable sounding proposal that came its way.

Coding guidelines are primarily driven by human factors with implementation issues coming to the fore in some situations and guideline selection involves a cost/benefit trade-off. Early versions of the TR addressed these topics directly. As work progressed many members of WG23 (our official designation is actually ISO/IEC JTC 1/SC 22/WG 23) seemed to want a concrete approach, i.e., an unstructured list of guidelines, rather than a short list of potential problem areas derived by careful analysis.

A major input came from Larry Wagoner’s analysis of fault reports from various sources; he produced a list of the most common problems, and it was generally agreed that these be used as the base from which to derive the general guidelines.

Work began on writing the text to cover the faults listed by Larry, coalescing similar constructs where appropriate. Many of the faults occurred in programs written in C and some attempt was made to generalize the wording to be applicable other languages (the core guidelines were intended to be language independent).

Over time suggestions were sent to the group’s convener and secretary for consideration (these came from group members and interested outsiders). Unfortunately many of these proposals were accepted for inclusion without analysing the likelihood of them being a common source of faults in practice or the possibility that use of alternative constructs would be more fault prone.

Some members of the group were loath to remove text from the draft document if it was decided that the issue discussed was not applicable. This text tended to migrate to a newly created Section 7 “Applications Issues”, a topic outside of our scope of work).

The resulting document is a mishmash of guidelines likely to be relevant to real code and others that are little more than platitudes (e.g., ‘Demarcation of Control Flow’ and ‘Structured Programming’) or examples of very specific problems that occasional occur (i.e., not common enough to be worth mentioning such as ‘Enumeration issues’).

I eventually threw up my hands in despair at trying to stem the flow of ‘good’ ideas that kept being added and stopped being actively involved.

Would I recommend TR 24772 to people? It does contain some very useful material. The material I think should have been culled is generally not bad, it just has a very low value and for most developers is probably not worth spending time on.

Footnote. ISO makes its money from selling standards and does not like to give them away for free. Those of us who work on standards want them to be used and would like them to be freely available. There is a precedent for TRs being made freely available and hopefully this will eventually happen to TR 24772. In the meantime there are documents on the WG23 web site that essentially contain all of the actual technical text.


Predictor vs. control metrics

October 3rd, 2010 1 comment

For some time I have been looking for a good example to illustrate the difference between predicting some software attribute and controlling that attribute. A recent study comparing two methods of predicting a person’s height looks like it might be what I am after.

The study compared the accuracy of using information derived from a person’s genes to information on their parents’ height.

If a person is not malnourished or seriously ill while growing, then their genes control how tall they will finish up as an adult. Tinker with the appropriate genes while a person is growing and their final height will be different; tinker after they have reached adulthood and height is unlikely to change,

Around 150 years ago Francis Galton developed a statistical technique that enabled a person’s final adult height to be estimated from their parent’s height. Parent height is a predictor, not a controller, of their children’s height.

A few metrics have been found empirically to correlate with faults in code and a larger number, having no empirical evidence of a correlation, have been proposed as fault ‘indicators’. Unfortunately many software developers and most managers are blind to the saying “correlation does not imply causation”.

I’m sure readers would agree that once a baby is born changes to their parents height has no effect on their final adult height. Following this line of argument we would not expect changing source code to reduce the number of faults predicted by some metric to actually reduce the number of faults (ok, fiddling with the source might result in a fault being spotted and fixed).

The interesting part of the study was the finding that the prediction based on existing knowledge about the genetics of height explained 4-6% of subjects’ sex- and age-adjusted height variance, while Galton’s method based on parent height explained 40%. Today, 60 years after the structure of DNA was discovered our knowledge of what actually controls height is nowhere near as encompassing as an attribute that is purely a predictor.

Today, do we have have a good model of what actually happens when code is written? I think not.

How much time and money is being wasted changing software to improve its fit to predictor metrics rather than control metrics (I have little idea what any of these might be)? I dread to think.