December 31, 2012 Derek Jones 1 comment

Every year the Christmas cards I receive are a reminder of how seriously a surprisingly large percentage of software developers I know take their beliefs in a Supreme Being.

Surely anybody possessing the skills needed to do well in software engineering would have little trouble uncovering for themselves the significant inconsistencies in any system of beliefs intended to support the existence of a Supreme Being? The evidence from my Christmas card collection shows that some very bright would disagree with me.

This Christmas I have finally been able to come up with some plausible sounding hand waving that leaves me feeling as if I at least have a handle on this previously incomprehensible (to me) behavior.

The insight is to stop considering the Supreme Being question as a problem in logic (i.e., is there a model consistent with belief in a Supreme Being that is also consistent with the known laws of Physics) and start thinking of it was a problem in explaining structure. Love of structure is a key requirement for anybody wanting to get seriously involved with software development (a basic ability to ‘do logic’ is also required, but logic is just another tool and outside of introductory courses and TV shows is vastly overrated).

On the handful of occasions I have spoken to developers about religion (in general I try to avoid this subject, it is just too contentious) things have always boiled down to one of having core belief, a feeling that random is just not a good enough explanation for things being the way they are, while the existence of a Supreme Being slots rather well into their world view.

The human agency detection system has been proposed as one of the reasons for religion; see Scott Atran’s book “In Gods We Trust: The evolutionary landscape of religion” for a fascinating analysis of various cognitive, social and economic factors that create a landscape favoring the existence of some form of religion.

Of course anybody choosing to go with a Supreme Being model has to make significant adjustments to other components of their world view and some of these changes will generate internal inconsistencies. Any developer who has ever been involved in building a large system will have experienced the strange sensation of seeing a system they know to be internally inconsistent function in a fashion that appears perfectly acceptable to everybody involved; listening to users’ views of the system brings more revelations (how could anybody think that was how it worked?) Having had these experiences with insignificantly small systems (compared to the Universe) I can see why some developers might be willing to let slide inconsistencies generated by inserting a Supreme Being into their world view.

I think the reason I don’t have a Supreme Being in my world view is that I am too in love with the experimental method, show me some repeatable experiments and I would be willing to take a Supreme Being more seriously. Perhaps at the end of the day it does all boil down to personal taste.

At the personal level I can see why people are not keen to discuss their telepathic communication sessions (or pray to use one of the nontechnical terms) with their Supreme Being. Having to use a channel having a signal/noise ratio that low must be very frustrating.

Categories: Uncategorized Tags: agency detection, belief, Christmas, cognitive, signal/noise, Supreme Being

Low defect density implies climate code less, not more, reliable

December 24, 2012 Derek Jones 2 comments

I have just been reading a paper comparing the defect density of three climate modelling systems against software from other application domains. The defect density (total reported defects divided by thousands of lines of code) of the climate modelling software was significantly lower than everything else, leading the researchers to conclude that “… suggests that the models are of high software quality,”. I would draw the opposite conclusion, the models have low reliability (I have no idea what software quality is and avoid using the term).

I don’t disagree with Pipitone and Easterbrook numbers, just their conclusion.

There is a very simple technique for creating software that has a low defect density, don’t try too hard to look for defects. There are two reasons why I think this has happened with the climate model software:

Three of the non-climate systems compared against were the Apache HTTP demon, the VTK visulalization toolkit and the Eclipse project. These are all widely used projects with many thousands of users, millions for Apache; this volume of usage corresponds to a huge amount of testing, and it is no wonder that so many faults have been reported. Each climate model tends to be used by one site, a tiny amount of testing, and it is not surprising that few faults have been reported.
Climate models have a big intrinsic testing problem; what is the result of a test supposed to be? With applications such as word processors, browsers, compilers, operating systems, etc the expected behavior is known in many cases so it is possible to write test cases that check for the expected behavior. How does anybody know what the expected behavior of a climate model is? If all the climate models did was to solve the Navier-Stokes equation on a rotating sphere there would be no need for multiple models and the UK Meteorological Office’s Unified model would not have grown from 100 KLOC to 800+ KLOC over the last 15 years.

The one system having a similar defect density to the climate models that Pipitone and Easterbrook compare against is an air traffic control system developed using formal methods, exactly the kind of (expensive and time-consuming) development process that one would expect to have a low defect density.

Software is remarkably fault-tolerant and so, yes, serious fault could exist in the climate models and they would still give answers that looked about right. Based on his experience working on a meteorological model Les Hatton tells the story of a fault so serious that the answers should be completely wrong, but they were not.

If somebody wants to convince me that the software in any of these climate models really is reliable then I want to know about the test suites used to check the behavior; what coverage of the source does the suite have (a high MC/DC would be very good, but I would settle for a very high statement coverage) and how were the expected behaviors calculated.

Categories: Uncategorized Tags: climate model, coverage, defect density, faults, MC/DC, testing

My R naming nemesis

December 17, 2012 Derek Jones 5 comments

When learning a new language I try to make an effort to write it like a native developer. R has one language feature that has been severely testing my desire to write like a native and this afternoon I realized that most of the people reading my code will also experience the same jarring sensation on encountering this construct, so I am not going to use it any more.

What is this language feature that induces a Stroop effect in my mind? It is the use of the period character as part of an identifier’s name (e.g., foo.bar). In almost all of the hundreds of thousands of lines of code I have read over the years this character is used as an operator, it selects a member/field of a struct/record. I’m sure that if I tried long enough and hard enough I could get used to using this character being part of an identifier; after a year or so writing Cobol I got used to the arithmetic minus character being permitted within identifiers (e.g., foo-bar), but that was 20 years ago and my neurons will probably take much longer to adapt this time around.

Most of the R I am writing will be distributed with my book Empirical software engineering with R and I think readers will experience the same jarring sensation I do (apart from those who have not yet been exposed to large amounts of non-R code). I have convinced myself that this is a good enough reason to give up trying to figure out how to use . in identifier name (I have been concocting all sorts of rules involving . being used to separate the primary part of the name and _ the secondary parts, e.g., total.red_light [yes, I should get out more often]; the underscore vs. camel case debate still erupts every now and again, let’s avoid creating more debate by introducing more choice).

Those R functions that include a . in their name will stand out from the crowd, [arm waving on] perhaps this will help differentiate them as ‘statistics stuff'[arm waving off]. There is always plan B if my unilateral naming decision looks too unilateral, a global renaming script.

Perhaps the use of periods in identifiers can be used as a test for being a native R developer. A simple timing test involving a sequence of characters appears on a screen with the developer having to respond as quickly as possible on the number of identifiers being displayed; I’m sure I would be much slower to give a ‘1’ response to total.count than to total_count, displaying total count and total.count on twp separate lines and asking me to quickly specify which line contained the most identifiers would turn me into a nervous wreck. Responses from a dozen or so different sequences ought to be enough be able to distinguish Jonny foreigner from the natives.

I don’t have a problem with $, which R uses as the column/list item selection operator, a character permitted by some compilers for commonly used languages as part of an identifier. This is because I have not read lots of code containing this identifier naming usage.

For my previous book I did a survey of the linguistic and cognitive psychology issues involved in identifier naming. This did a good job of debunking existing ideas about what constitutes good naming practices, but did not come up with any concrete recommendations to replace them (nature abhors a vacuum and the existing pop psychology naming ideas remained).

These days people write PhDs on identifier naming issues (method names, (not yet completed) correlation with quality and code comprehension to name a few); there is even a subfield within this field, how best to split an identifier into its component parts (e.g., refPtr is probably an abbreviation of reference pointer).

Categories: Uncategorized Tags: camelcase, cognitive psychology, identifier, R, Stroop effect

The Shape of Code

Archive

Telepathic communication with a Supreme Being

Low defect density implies climate code less, not more, reliable

My R naming nemesis

Recent Posts

Recent Comments

Archives

Meta