Archive

Posts Tagged ‘comments’

Does the Climategate code produce reliable output?

November 30th, 2009 Derek-Jones 2 comments

The source code of several rather important commercial programs has been made public recently, or to be more exact programs whose output is important (i.e., the Sequoia voting system, and code and data from the Climate Research Unit at the University of East Anglia, the so-called ‘Climategate’ leak). While many technical commentators have expressed amazement at how amateurish the programming appears to be, apparently written with little knowledge of good software engineering practices or of the programming language being used, those who work on commercial projects know that low levels of software engineering/programming competence are the norm.

The emails included in the Climategate leak provide another vivid example, if one were needed, of why scientific data should be made publicly available; scientists are human and are sometimes willing to hide data that does not fit their pet theory or even fails to validate their theory at all.

The Climategate source has only recently become available, and existing technical commentary has been derived from embarrassing comments and the usual complaint about not using the right programming language (Fortran is actually a good choice of language for this problem; it is widely used by climatology researchers, and a non-professional programmer probably makes the best use of their time by using the one language they know tolerably well rather than attempting to use a new language that nobody else in the research group knows).

An important quality indicator of the leaked software was what was not there, test cases (at least I could not find any). How do we know that a program’s output is correct? One way to gain some confidence in a program’s correctness is to process data for which the correct output is known. This blindness to the importance of program level correctness testing is something that I often encounter in people who are subject area experts rather than professional programmers; they believe that if the output has the form they are expecting it must be correct and will sometimes add ‘faults’ to ‘fix’ output that deviates from what they are expecting.
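To illustrate what processing data for which the correct output is known might look like, here is a minimal sketch in Python. The routine `mean_anomaly` and the test values are invented for illustration and are not taken from the leaked code; the point is only that the expected answers were worked out by hand, independently of the program.

```python
import math


def mean_anomaly(readings, baseline):
    """Hypothetical routine: mean departure of the readings from a baseline."""
    if not readings:
        raise ValueError("no readings supplied")
    return sum(r - baseline for r in readings) / len(readings)


def check_known_cases():
    """Run the routine on inputs whose correct answers were computed by hand."""
    cases = [
        # (readings, baseline, expected result)
        ([10.0, 12.0, 14.0], 12.0, 0.0),
        ([15.0, 15.0], 14.0, 1.0),
        ([9.5], 10.0, -0.5),
    ]
    for readings, baseline, expected in cases:
        actual = mean_anomaly(readings, baseline)
        # Compare with a tolerance; exact equality is fragile for floats.
        assert math.isclose(actual, expected, abs_tol=1e-12), (readings, actual)
    return len(cases)


if __name__ == "__main__":
    print(check_known_cases(), "known-answer cases passed")
```

A handful of cases like this proves little on its own, but a program with none at all gives the reader nothing to go on.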

A quick visual scan through the source showed a tale of two worlds, one of single letter identifier names and liberal use of goto, and the other of what looks like meaningful names, structured code and a non-trivial number of comments. The individuals who have contributed to the code base obviously have very different levels of coding ability. Not having written any Fortran in anger for over 15 years my ability to estimate the impact of more subtle coding practices has atrophied.

What kind of faults might a code review look for in these programs? Common coding errors such as using uninitialized variables and incorrect argument passing are obvious choices, and there are tools available to check for these kinds of error. A much more insidious kind of error, which requires people with mathematical expertise to spot, is created by the approximate nature of floating-point arithmetic.
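A small Python sketch of the kind of floating-point surprise I have in mind (the values are chosen for illustration and have nothing to do with the leaked code). The explicit loop in `naive_sum` is deliberate: it shows plain left-to-right accumulation, which is what a straightforward Fortran loop would do, while `math.fsum` from the standard library tracks the lost low-order bits.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


tenths = [0.1] * 10

# 0.1 has no exact binary representation, so the rounding errors accumulate.
print(naive_sum(tenths) == 1.0)   # False
print(math.fsum(tenths) == 1.0)   # True

# The order of the operands changes the answer: adding 1.0 to 1e16 is lost
# to rounding, so the small value vanishes before the large ones cancel.
print(naive_sum([1e16, 1.0, -1e16]))   # 0.0
print(naive_sum([1e16, -1e16, 1.0]))   # 1.0
```

Nothing here is a bug in the usual sense; every statement does what the language says it does, which is why this class of problem slips past conventional checking tools.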

The source is not huge, but not small either, consisting of around 64,000 lines of Fortran and 16,000 lines of IDL (a language designed for interactive data analysis which to my untrained eye looks a lot like MATLAB). There was no obvious support for building the source included within the leaked files (e.g., no makefiles) and my attempt to manually compile using the GNU Fortran compiler failed miserably. So I cannot say anything reliable about the compiler output warnings.

To me the complete lack of test cases implies that the Climategate code does not produce reliable output. Comments in the code such as ***** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE********* suggest that the authors were willing to patch the code to produce output that matched their expectations; this is the mentality of somebody for whom code correctness is not an important issue, and if they don’t believe their code is correct then I don’t either.

Source code in itself is rarely that important, although it might have been expensive to create. The really important information in the leaked files is the climate data. Now that this is available others can apply their analysis skills to provide an interpretation of what, if anything statistically reliable, it is telling us.

Using third party measurement data

February 17th, 2009 Derek-Jones No comments

Until today, to the best of my knowledge, all of the source code analysis papers I have read were written by researchers who had control of the code analysis tools they used and had some form of localised access to the source. By control of the code analysis tools I mean that the researchers specified the tool options and had the ability to check the behavior of the tool; in many cases the source of the tool was available to them and was often even written by them. The localised access may have involved downloading lots of code from the web.

I have just been reading about a broad brush analysis of comment usage based on data provided by a commercial code repository that offers API access to some basic code metrics.

At first I was very frustrated by the lack of depth to the analysis provided in the paper, but then I realised that the authors’ intent was to investigate a few broad ideas about comment usage in a large number of projects (around 10,000). The authors complained in their blog about some of the referees’ comments and having to submit a shorter paper. I can see where the referees are coming from; the papers are lacking in depth of analysis, but they do contain some interesting results.

I was very interested in Figure 2:
Comment density as a function of source code lines in a given commit
which plots the comment density of the lines in a source code commit. I would expect the ratio to be higher for small commits because a developer probably has a relatively fixed amount to say about updates involving a smallish number of lines (which probably fixes a problem). Larger commits are probably updated functionality and so would have a comment density similar to the ‘average’.
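To make the expectation concrete, here is a rough Python sketch of a per-commit comment density calculation. The definition used (whole-line comments divided by non-blank lines) and the `//` comment prefix are my assumptions; the paper's repository metrics may well be computed differently, and the two sample commits are made up.

```python
def comment_density(lines, comment_prefixes=("//",)):
    """Fraction of non-blank lines that are whole-line comments.

    Assumed definition, for illustration only; trailing comments and
    block comments are ignored.
    """
    non_blank = [ln.strip() for ln in lines if ln.strip()]
    if not non_blank:
        return 0.0
    comments = sum(1 for ln in non_blank if ln.startswith(comment_prefixes))
    return comments / len(non_blank)


# A small fix: one explanatory comment spread over very few lines.
small_commit = [
    "// Fix off-by-one when the buffer is full.",
    "if (count >= limit) {",
    "    return -1;",
    "}",
]

# A larger change: the same single comment diluted by more code.
large_commit = ["// Parse the header."] + ["x = x + 1;"] * 9

print(comment_density(small_commit))  # 0.25
print(comment_density(large_commit))  # 0.1
```

If developers have a roughly fixed amount to say per change, the density necessarily falls as the number of changed lines grows.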

The problem with relying on third parties to supply the data is that obtaining the answers to follow up questions invariably involves lots of work, e.g., creating an environment to perform the measurements needed for the follow up questions. However the third party approach can significantly reduce the amount of work needed to get to a point where the interestingness of the results can be gauged.

Unexpected experimental effects

January 16th, 2009 Derek-Jones No comments

The only way to find out the factors that affect developers’ source code performance is to carry out experiments where they are the subjects. Developer performance on even simple programming tasks can be affected by a large number of different factors. People are always surprised at the very small number of basic operations I ask developers to perform in the experiments I run. My reply is that only by minimizing the number of factors that might affect performance can I have any degree of certainty that the results for the factors I am interested in are reliable.

Even with what appear to be trivial tasks I am constantly surprised by the factors that need to be controlled. A good example is one of the first experiments I ever ran. I thought it would be a good idea to replicate, using a software development context, a widely studied and reliably replicated human psychological effect: when asked to learn and later recall/recognize a list of words, people make mistakes. Psychologists study this problem because it provides a window into the operational structure of the human memory subsystem over short periods of time (of the order of at most tens of seconds). I wanted to find out what sort of mistakes developers would make when asked to remember information about a sequence of simple assignment statements (e.g., qbt = 6;).

I carefully read the appropriate experimental papers and had created lists of variables that controlled for every significant factor (e.g., number of syllables, frequency of occurrence of the words in current English usage {performance is better for very common words}) and the list of assignment statements was sufficiently long that it would just overload the capacity of short term memory (about 2 seconds worth of sound).

The results contained none of the expected performance effects, so I ran the experiment again looking for different effects; nothing. A chance comment by one of the subjects after taking part in the experiment offered one reason why the expected performance effects had not been seen. By their nature developers are problem solvers, and I had set them a problem that asked them to remember information involving a list of assignment statements that appeared to be beyond their short term memory capacity. Problem solvers naturally look for patterns and common cases, and the variables in each of my carefully created lists of assignment statements could all be distinguished by their first letter. Subjects did not need to remember the complete variable name, they just needed to remember the first letter (something I had not controlled for). Asking around I found that several other subjects had spotted and used the same strategy. My simple experiment was not simple enough!
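A check along the following lines, run over each list before handing it to subjects, would have flagged the problem. This is a sketch written after the fact; apart from `qbt`, the variable names are invented and are not the ones used in the experiment.

```python
from collections import Counter


def first_letter_clashes(names):
    """Return the first letters that are shared by more than one name."""
    counts = Counter(name[0].lower() for name in names if name)
    return sorted(letter for letter, n in counts.items() if n > 1)


def distinguishable_by_first_letter(names):
    """True if remembering only the first letter identifies every name."""
    return not first_letter_clashes(names)


# Every name starts with a different letter, so subjects can take the shortcut.
leaky_list = ["qbt", "mfs", "dkz"]
# Two names share 'q', so the first letter alone is no longer sufficient.
controlled_list = ["qbt", "qdm", "mfs"]

print(distinguishable_by_first_letter(leaky_list))       # True
print(distinguishable_by_first_letter(controlled_list))  # False
print(first_letter_clashes(controlled_list))             # ['q']
```

Of course the check only helps once you know the shortcut exists, which is the whole difficulty with uncontrolled factors.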

I was recently reading about an experiment that investigated the factors that motivate developers to comment code. Subjects were given some code and asked to add functionality to it. Some subjects were given code containing lots of comments while others were given code containing few comments. The hypothesis was that developers were more likely to create comments in code that already contained lots of comments, and the results seemed to bear this out. However, closer examination of the answers showed that most subjects had cut and pasted chunks (i.e., code and comments) from the code they were given. So the percentage of comments in the answers mimicked that in the original code (in some cases subjects had complicated the situation by refactoring the code).

The sound of code

January 15th, 2009 Derek-Jones No comments

Speech, it is claimed, is the ability that separates humans from all other animals, yet working with code is almost exclusively based on sight. There are instances of ‘accidental’ uses of sound, e.g., listening to disc activity to monitor a program’s progress or, in days of old, the chatter of other mechanical parts.

Various projects have attempted to intentionally make use of sound to provide an interface to the software development process, including:

    People like to talk about what they do and perhaps this could be used to overcome developers’ dislike of writing comments. Unfortunately automated processing of natural language (assuming the speech to text problem is solved) has not reached the stage where it is possible to automatically detect when the topic of conversation has changed or to figure out what piece of code is being discussed. Perhaps the reason why developers find it so hard to write good comments is that it is a skill that requires training and effort, not random thoughts that happen to come to mind.
    Writing code by talking (i.e., voice input of source code) initially sounds attractive. As a form of input speech is faster than typing, however computer processing of speech is still painfully slow. Another problem that needs to be handled is the large number of different ways in which the same thing can be, and is, spoken, e.g., numeric values. As a method of output, reading is 70% faster than listening.

Unless developers have to spend lots of time commuting in person, rather than telecommuting, I don’t see a future for speech input of code. Audio program execution monitoring probably has a market in specialist niches, no more.

I do see a future for spoken mathematics, which is something that people who are not mathematicians might want to do. The necessary formatting commands are sufficiently obtuse that they require too much effort from the casual user.

The 30% of source that is ignored

January 3rd, 2009 Derek-Jones No comments

Approximately 30% of source code is not checked for correct syntax (developers can make up any rules they like for its internal syntax), semantic accuracy or consistency; people are content to shrug their shoulders at this state of affairs and are generally willing to let it pass. I am of course talking about comments; the 30% figure comes from my own measurements, with other published measurements falling within a similar ballpark.

Part of the problem is that comments often contain lots of natural language (i.e., human not computer language) and this is known to be very difficult to parse and is thought to be unusable without all sorts of semantic knowledge that is not currently available in machine processable form.

People are good at spotting patterns in ambiguous human communication and deducing possible meanings from it, and this has helped to keep comment usage alive, along with the fact that the information they provide is not usually available elsewhere and comments are right there in front of the person reading the code and of course management loves them as a measurable attribute that is cheap to do and not easily checkable (and what difference does it make if they don’t stay in sync with the code).

One study that did attempt to parse English sentences in comments found that 75% of sentence-style comments were in the past tense, with 55% being some kind of operational description (e.g., “This routine reads the data.”) and 44% having the style of a definition (e.g., “General matrix”).

There is a growing collection of tools for processing natural language (well at least for English). However, given the traditionally poor punctuation used in comments, the use of variable names and very domain specific terminology, full blown English parsing is likely to be very difficult. Some recent research has found that useful information can be extracted using something only a little more linguistically sophisticated than word sense disambiguation.

The designers of the iComment system sensibly limited the analysis domain (to memory/file lock related activities), simplified the parsing requirements (to looking for limited forms of requirements wording) and kept developers in the loop for some of the processing (e.g., listing lock related function names). The aim was to find inconsistencies between the requirements expressed in comments and what the code actually did. Within the Linux/Mozilla/Wine/Apache sources they found 33 faults in the code and 27 in the comments, claiming a 38.8% false positive rate.
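As a back-of-the-envelope check on what those figures imply, here is a short Python calculation. It assumes the false positive rate is measured as false positives divided by all reported inconsistencies, which is my reading and not something I have confirmed against the paper, so the derived counts are an inference rather than reported numbers.

```python
def implied_report_counts(true_positives, false_positive_rate):
    """Derive total reports and false positives from a false positive rate.

    Assumes rate = false_positives / (true_positives + false_positives).
    """
    total = true_positives / (1.0 - false_positive_rate)
    return total, total - true_positives


faults_in_code = 33
faults_in_comments = 27
genuine = faults_in_code + faults_in_comments

total_reports, false_positives = implied_report_counts(genuine, 0.388)

print(genuine)                 # 60 genuine inconsistencies
print(round(total_reports))    # 98 reports in total, under the assumption above
print(round(false_positives))  # 38 reports a developer would have to dismiss
```

In other words, under this reading a developer would wade through roughly 98 warnings to find the 60 that matter.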

If these impressive figures can be replicated for other kinds of coding constructs then comment contents will start to leave the dark ages.
