Archive

Posts Tagged ‘comments’

Does the Climategate code produce reliable output?

November 30th, 2009 Derek-Jones 2 comments

The source of several rather important commercial programs has been made public recently, or to be more exact programs whose output is important (i.e., the Sequoia voting system, and code and data from the Climate Research Unit at the University of East Anglia, the so-called ‘Climategate’ leak). While many technical commentators have expressed amazement at how amateurish the programming appears to be, apparently written with little knowledge of good software engineering practices or of the programming language being used, those who work on commercial projects know that low levels of software engineering/programming competence are the norm.

The emails included in the Climategate leak provide another vivid example, if one were needed, of why scientific data should be made publicly available; scientists are human and are sometimes willing to hide data that does not fit their pet theory or even fails to validate their theory at all.

The Climategate source has only recently become available and existing technical commentary has been derived from embarrassing comments and the usual complaint about not using the right programming language (Fortran is actually a good choice of language for this problem; it is widely used by climatology researchers, and a non-professional programmer probably makes best use of their time by using the one language they know tolerably well rather than attempting to use a new language that nobody else in the research group knows).

An important quality indicator of the leaked software was what was not there, test cases (at least I could not find any). How do we know that a program’s output is correct? One way to gain some confidence in a program’s correctness is to process data for which the correct output is known. This blindness to the importance of program level correctness testing is something that I often encounter in people who are subject area experts rather than professional programmers; they believe that if the output has the form they are expecting it must be correct and will sometimes add ‘faults’ to ‘fix’ output that deviates from what they are expecting.
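A minimal sketch of this kind of known-answer test is shown here in Python rather than Fortran, purely for brevity. The routine `annual_mean` and the sample values are hypothetical and are not taken from the leaked code; the point is only that the input is chosen so the correct output can be worked out by hand.

```python
import math

def annual_mean(monthly_values):
    """Hypothetical routine under test: mean of 12 monthly readings."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    return sum(monthly_values) / 12

def test_annual_mean_known_answer():
    # Six months at 10.0 and six at 20.0: the mean is 15.0 by hand calculation.
    data = [10.0] * 6 + [20.0] * 6
    assert math.isclose(annual_mean(data), 15.0)

def test_annual_mean_rejects_short_input():
    # A partial year must be reported, not silently averaged.
    try:
        annual_mean([10.0] * 11)
    except ValueError:
        return
    raise AssertionError("short input was accepted")

test_annual_mean_known_answer()
test_annual_mean_rejects_short_input()
```

A test like this says nothing about whether the science is right, only that the program does what its author intended on at least one input.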

A quick visual scan through the source showed a tale of two worlds, one of single letter identifier names and liberal use of goto, and the other of what looks like meaningful names, structured code and a non-trivial number of comments. The individuals who have contributed to the code base obviously have very different levels of coding ability. Not having written any Fortran in anger for over 15 years my ability to estimate the impact of more subtle coding practices has atrophied.

What kind of faults might a code review look for in these programs? Common coding errors such as using uninitialized variables and incorrect argument passing are obvious choices and there are tools available to check for these kinds of error. A much more insidious kind of error, which requires people with mathematical expertise to spot, is created by the approximate nature of floating-point arithmetic.
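To give a flavour of the floating-point issue, here is a small Python sketch (the values are illustrative and have nothing to do with the leaked code). Adding 0.1 ten times in a plain loop does not give exactly 1.0, because 0.1 has no exact binary representation and each addition rounds; `math.fsum` from the standard library compensates for the lost low-order bits.

```python
import math

readings = [0.1] * 10   # ten values that "obviously" sum to 1.0

# Plain accumulation: a rounding error can creep in at every addition.
naive_total = 0.0
for value in readings:
    naive_total += value

# math.fsum keeps track of the low-order bits lost by each addition.
careful_total = math.fsum(readings)

print(naive_total == 1.0)    # False
print(careful_total == 1.0)  # True

# Addition is not even associative in floating point.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

With ten values the discrepancy is tiny; the concern in numerical code is that such errors can accumulate or be magnified by later operations, which is why spotting them takes mathematical expertise.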

The source is not huge, but not small either, consisting of around 64,000 lines of Fortran and 16,000 lines of IDL (a language designed for interactive data analysis which to my untrained eye looks a lot like MATLAB). There was no obvious support for building the source included within the leaked files (e.g., no makefiles) and my attempt to manually compile using the GNU Fortran compiler failed miserably. So I cannot say anything reliable about the compiler output warnings.

To me the complete lack of test cases implies that the Climategate code does not produce reliable output. Comments in the code such as ***** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE********* suggest that the authors were willing to patch the code to produce output that matched their expectations; this is the mentality of somebody for whom code correctness is not an important issue and if they don’t believe their code is correct then I don’t either.

Source code in itself is rarely that important, although it might have been expensive to create. The really important information in the leaked files is the climate data. Now that this is available others can apply their analysis skills to provide an interpretation of what, if anything statistically reliable, it is telling us.

Using third party measurement data

February 17th, 2009 Derek-Jones No comments

Until today, to the best of my knowledge, all of the source code analysis papers I have read were written by researchers who had control of the code analysis tools they used and had some form of localised access to the source. By control of the code analysis tools I mean that the researchers specified the tool options and had the ability to check the behavior of the tool, in many cases the source of the tool was available to them and often even written by them, and the localised access may have involved downloading lots of code from the web.

I have just been reading about a broad brush analysis of comment usage based on data provided by a commercial code repository that offers API access to some basic code metrics.

At first I was very frustrated by the lack of depth to the analysis provided in the paper, but then I realised that the authors’ intent was to investigate a few broad ideas about comment usage in a large number of projects (around 10,000). The authors complained in their blog about some of the referees’ comments and having to submit a shorter paper. I can see where the referees are coming from; the papers are lacking in depth of analysis, but they do contain some interesting results.

I was very interested in Figure 2 (comment density as a function of source code lines in a given commit), which plots the comment density of the lines in a source code commit. I would expect the ratio to be higher for small commits because a developer probably has a relatively fixed amount to say about updates involving a smallish number of lines (which probably fixes a problem). Larger commits are probably updated functionality and so would have a comment density similar to the ‘average’.
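The expectation can be made concrete with a small Python sketch. It assumes that comment density means comment lines divided by total (comment plus code) lines, which may not be exactly the definition used in the paper, and the line counts are invented for illustration.

```python
def comment_density(comment_lines, code_lines):
    """Fraction of a commit's lines that are comments (assumed definition)."""
    total = comment_lines + code_lines
    if total == 0:
        raise ValueError("empty commit")
    return comment_lines / total

# Invented numbers: a two-line explanation attached to a small fix,
# and a larger commit commented at a project-wide 'average' rate.
small_commit = comment_density(comment_lines=2, code_lines=6)
large_commit = comment_density(comment_lines=40, code_lines=360)

print(small_commit)  # 0.25
print(large_commit)  # 0.1
```

If developers really do write a roughly fixed amount of commentary per fix, the density falls as the number of code lines in the commit grows, until it levels out at whatever the project average is.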

The problem with relying on third parties to supply the data is that obtaining the answers to follow up questions invariably involves lots of work, e.g., creating an environment to perform the measurements needed for the follow up questions. However the third party approach can significantly reduce the amount of work needed to get to a point where the interestingness of the results can be gauged.

Unexpected experimental effects

January 16th, 2009 Derek-Jones No comments

The only way to find out the factors that affect developers’ source code performance is to carry out experiments where they are the subjects. Developer performance on even simple programming tasks can be affected by a large number of different factors. People are always surprised at the very small number of basic operations I ask developers to perform in the experiments I run. My reply is that only by minimizing the number of factors that might affect performance can I have any degree of certainty that the results for the factors I am interested in are reliable.

Even with what appear to be trivial tasks I am constantly surprised by the factors that need to be controlled.  A good example is one of the first experiments I ever ran.  I thought it would be a good idea to replicate, using a software development context, a widely studied and reliably replicated human psychological effect; when asked to learn and later recall/recognize a list of words people make mistakes.  Psychologists study this problem because it provides a window into the operation structure of the human memory subsystem over short periods of time (of the order of at most tens of seconds).  I wanted to find out what sort of mistakes developers would make when asked to remember information about a sequence of simple assignment statements (e.g., qbt = 6;).

I carefully read the appropriate experimental papers and had created lists of variables that controlled for every significant factor (e.g., number of syllables, frequency of occurrence of the words in current English usage {performance is better for very common words}) and the list of assignment statements was sufficiently long that it would just overload the capacity of short term memory (about 2 seconds worth of sound).

The results contained none of the expected performance effects, so I ran the experiment again looking for different effects; nothing. A chance comment by one of the subjects after taking part in the experiment offered one reason why the expected performance effects had not been seen. By their nature developers are problem solvers and I had set them a problem that asked them to remember information involving a list of assignment statements that appeared to be beyond their short term memory capacity. Problem solvers naturally look for patterns and common cases, and the variables in each of my carefully created lists of assignment statements could all be distinguished by their first letter. Subjects did not need to remember the complete variable name, they just needed to remember the first letter (something I had not controlled for). Asking around I found that several other subjects had spotted and used the same strategy. My simple experiment was not simple enough!
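With hindsight, this particular confound is easy to check for mechanically. The following Python sketch (the function name and both variable lists are hypothetical, not the lists used in the experiment) reports how many leading characters a subject would need to remember to tell every name in a list apart.

```python
def shortest_distinguishing_prefix(names):
    """Return the smallest prefix length that tells every name apart."""
    longest = max(len(name) for name in names)
    for length in range(1, longest + 1):
        prefixes = {name[:length] for name in names}
        if len(prefixes) == len(names):
            return length
    raise ValueError("names are not all distinct")

confounded = ["qbt", "hmv", "zrk", "wdp"]  # first letter alone is enough
controlled = ["qbt", "qbv", "qdt", "qdv"]  # the whole name is needed

print(shortest_distinguishing_prefix(confounded))  # 1
print(shortest_distinguishing_prefix(controlled))  # 3
```

A result of 1 means subjects can get by on first letters alone. Of course, forcing all the names to share leading letters may introduce other effects (e.g., confusability) that would then need controlling for.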

I was recently reading about an experiment that investigated the factors that motivate developers to comment code. Subjects were given some code and asked to add additional functionality to it. Some subjects were given code containing lots of comments while others were given code containing few comments. The hypothesis was that developers were more likely to create comments in code that already contained lots of comments, and the results seemed to bear this out. However, closer examination of the answers showed that most subjects had cut and pasted chunks (i.e., code and comments) from the code they were given. So the percentage of comments in the answers mimicked that in the original code (in some cases subjects had complicated the situation by refactoring the code).

The sound of code

January 15th, 2009 Derek-Jones No comments

Speech, it is claimed, is the ability that separates humans from all other animals, yet working with code is almost exclusively based on sight. There are instances of ‘accidental’ uses of sound, e.g., listening to disc activity to monitor a program’s progress or, in days of old, the chatter of other mechanical parts.

Various projects have attempted to intentionally make use of sound to provide an interface to the software development process, including:

    People like to talk about what they do and perhaps this could be used to overcome developers’ dislike of writing comments. Unfortunately automated processing of natural language (assuming the speech to text problem is solved) has not reached the stage where it is possible to automatically detect when the topic of conversation has changed or to figure out what piece of code is being discussed. Perhaps the reason why developers find it so hard to write good comments is that it is a skill that requires training and effort, not random thoughts that happen to come to mind.
    Writing code by talking (i.e., voice input of source code) initially sounds attractive. As a form of input speech is faster than typing; however, computer processing of speech is still painfully slow. Another problem that needs to be handled is the large number of different ways in which the same thing can be, and is, spoken, e.g., numeric values. As a method of output, reading is 70% faster than listening.

Unless developers have to spend lots of time commuting in person, rather than telecommuting, I don’t see a future for speech input of code. Audio program execution monitoring probably has a market in specialist niches, no more.

I do see a future for spoken mathematics, which is something that people who are not mathematicians might want to do. The necessary formatting commands are sufficiently obtuse that they require too much effort from the casual user.

The 30% of source that is ignored

January 3rd, 2009 Derek-Jones No comments

Approximately 30% of source code is not checked for correct syntax (developers can make up any rules they like for its internal syntax), semantic accuracy or consistency; people are content to shrug their shoulders at this state of affairs and are generally willing to let it pass. I am of course talking about comments; the 30% figure comes from my own measurements, with other published measurements falling within a similar ballpark.

Part of the problem is that comments often contain lots of natural language (i.e., human not computer language) and this is known to be very difficult to parse and is thought to be unusable without all sorts of semantic knowledge that is not currently available in machine processable form.

People are good at spotting patterns in ambiguous human communication and deducing possible meanings from it, and this has helped to keep comment usage alive, along with the fact that the information they provide is not usually available elsewhere and comments are right there in front of the person reading the code and of course management loves them as a measurable attribute that is cheap to do and not easily checkable (and what difference does it make if they don’t stay in sync with the code).

One study that did attempt to parse English sentences in comments found that 75% of sentence-style comments were in the past tense, with 55% being some kind of operational description (e.g., “This routine reads the data.”) and 44% having the style of a definition (e.g., “General matrix”).

There is a growing collection of tools for processing natural language (well at least for English). However, given the traditionally poor punctuation used in comments, the use of variable names and very domain specific terminology, full blown English parsing is likely to be very difficult. Some recent research has found that useful information can be extracted using something only a little more linguistically sophisticated than word sense disambiguation.

The designers of the iComment system sensibly limited the analysis domain (to memory/file lock related activities), simplified the parsing requirements (to looking for limited forms of requirements wording) and kept developers in the loop for some of the processing (e.g., listing lock related function names). The aim was to find inconsistencies between the requirements expressed in comments and what the code actually did. Within the Linux/Mozilla/Wine/Apache sources they found 33 faults in the code and 27 in the comments, claiming a 38.8% false positive rate.
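To get a feel for what the quoted rate means in absolute terms, here is a back-of-the-envelope calculation in Python. It assumes the rate was computed as false reports divided by all reports; that definition is my assumption, not something stated above.

```python
true_reports = 33 + 27        # faults found in code plus faults found in comments
false_positive_rate = 0.388   # the claimed rate

# Assumption: rate = false / (false + true),
# which rearranges to false = rate * true / (1 - rate).
false_reports = false_positive_rate * true_reports / (1 - false_positive_rate)

print(round(false_reports))                 # 38
print(round(false_reports) + true_reports)  # 98
```

Under that assumption the tool would have flagged something like 98 inconsistencies, of which 60 turned out to be genuine.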

If these impressive figures can be replicated for other kinds of coding constructs then comment contents will start to leave the dark ages.
