Archive

Posts Tagged ‘comments’

Does the Climategate code produce reliable output?

November 30th, 2009 Derek-Jones No comments

The source code of several rather important commercial programs has been made public recently, or to be more exact, programs whose output is important (i.e., the Sequoia voting system, and code and data from the Climate Research Unit at the University of East Anglia, the so-called ‘Climategate’ leak). While many technical commentators have expressed amazement at how amateurish the programming appears to be, apparently written with little knowledge of good software engineering practices or of the programming language being used, those who work on commercial projects know that low levels of software engineering/programming competence are the norm.

The emails included in the Climategate leak provide another vivid example, if one were needed, of why scientific data should be made publicly available; scientists are human and are sometimes willing to hide data that does not fit their pet theory or even fails to validate their theory at all.

The Climategate source has only recently become available and existing technical commentary has been derived from embarrassing comments and the usual complaint about not using the right programming language (Fortran is actually a good choice of language for this problem: it is widely used by climatology researchers, and a non-professional programmer probably makes best use of their time by using the one language they know tolerably well rather than attempting to use a new language that nobody else in the research group knows).

An important quality indicator of the leaked software was what was not there: test cases (at least I could not find any). How do we know that a program’s output is correct? One way to gain some confidence in a program’s correctness is to process data for which the correct output is known. This blindness to the importance of program-level correctness testing is something that I often encounter in people who are subject area experts rather than professional programmers; they believe that if the output has the form they are expecting it must be correct, and will sometimes add ‘faults’ to ‘fix’ output that deviates from what they are expecting.
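As a rough illustration of what processing data with a known correct output can look like, here is a minimal sketch in Python. The routine mean_anomaly and its test values are hypothetical, invented for this example; nothing here is taken from the leaked source.

```python
import math

def mean_anomaly(readings, baseline):
    """Hypothetical routine: mean of the readings minus a baseline value."""
    if not readings:
        raise ValueError("no readings supplied")
    return math.fsum(readings) / len(readings) - baseline

def check_known_cases():
    """Run the routine on inputs whose answers can be worked out by hand."""
    cases = [
        (([10.0, 12.0, 14.0], 12.0), 0.0),  # mean is 12, anomaly is 0
        (([1.5, 2.5], 1.0), 1.0),           # mean is 2, anomaly is 1
        (([-3.0], 0.0), -3.0),              # single reading, zero baseline
    ]
    for (readings, baseline), expected in cases:
        actual = mean_anomaly(readings, baseline)
        assert math.isclose(actual, expected, abs_tol=1e-12), (readings, actual)
    return len(cases)

if __name__ == "__main__":
    print(check_known_cases(), "known-output cases passed")
```

A handful of such cases does not prove a program correct, but an output that merely "has the expected form" would not get past them.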

A quick visual scan through the source showed a tale of two worlds, one of single letter identifier names and liberal use of goto, and the other of what looks like meaningful names, structured code and a non-trivial number of comments. The individuals who have contributed to the code base obviously have very different levels of coding ability. Not having written any Fortran in anger for over 15 years my ability to estimate the impact of more subtle coding practices has atrophied.

What kind of faults might a code review look for in these programs? Common coding errors such as using uninitialized variables and incorrect argument passing are obvious choices, and there are tools available to check for these kinds of error. A much more insidious kind of error, which requires people with mathematical expertise to spot, is created by the approximate nature of floating-point arithmetic.
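A small Python sketch of the kind of floating-point surprise meant here (the values are made up for illustration and have nothing to do with the climate data): adding the same numbers with a simple loop and with the standard library's math.fsum, which tracks the rounding error, can give different answers.

```python
import math

def naive_sum(values):
    """Left-to-right summation, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

tenths = [0.1] * 10
cancel = [1e16, 1.0, -1e16]  # the 1.0 is lost when added to 1e16

if __name__ == "__main__":
    print(naive_sum(tenths), math.fsum(tenths))
    print(naive_sum(cancel), math.fsum(cancel))
```

Neither result of the loop is a bug in the usual sense; each individual addition is correctly rounded, and the error only appears in the accumulated total.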

The source is not huge, but not small either, consisting of around 64,000 lines of Fortran and 16,000 lines of IDL (a language designed for interactive data analysis which to my untrained eye looks a lot like MATLAB). There was no obvious support for building the source included within the leaked files (e.g., no makefiles) and my attempt to manually compile using the GNU Fortran compiler failed miserably. So I cannot say anything reliable about the compiler output warnings.

To me the complete lack of test cases implies that the Climategate code does not produce reliable output. Comments in the code such as ***** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE********* suggest that the authors were willing to patch the code to produce output that matched their expectations. This is the mentality of somebody for whom code correctness is not an important issue, and if they don’t believe their code is correct then I don’t either.

Source code in itself is rarely that important, although it might have been expensive to create. The really important information in the leaked files is the climate data. Now that this is available others can apply their analysis skills to provide an interpretation of what, if anything statistically reliable, it is telling us.

Using third party measurement data

February 17th, 2009 Derek-Jones No comments

Until today, to the best of my knowledge, all of the source code analysis papers I have read were written by researchers who had control of the code analysis tools they used and had some form of localised access to the source. By control of the code analysis tools I mean that the researchers specified the tool options and had the ability to check the behavior of the tool, in many cases the source of the tool was available to them and often even written by them, and the localised access may have involved downloading lots of code from the web.

I have just been reading about a broad brush analysis of comment usage based on data provided by a commercial code repository that offers API access to some basic code metrics.

At first I was very frustrated by the lack of depth to the analysis provided in the paper, but then I realised that the authors’ intent was to investigate a few broad ideas about comment usage in a large number of projects (around 10,000). The authors complained in their blog about some of the referees’ comments and having to submit a shorter paper. I can see where the referees are coming from: the papers are lacking in depth of analysis, but they do contain some interesting results.

I was very interested in Figure 2 (“Comment density as a function of source code lines in a given commit”), which plots the comment density of the lines in a source code commit. I would expect the ratio to be higher for small commits because a developer probably has a relatively fixed amount to say about updates involving a smallish number of lines (which probably fixes a problem). Larger commits are probably updated functionality and so would have a comment density similar to the ‘average’.
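To make the expectation concrete, here is a rough Python sketch of one way comment density could be computed for the lines of a commit. The definition used (comment lines divided by comment plus code lines, blank lines ignored, only whole-line comments counted) and the sample commits are assumptions made for illustration; the paper's own measurement procedure may well differ.

```python
def comment_density(lines, marker="//"):
    """Fraction of non-blank lines that are whole-line comments."""
    comment = code = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines count as neither comment nor code
        if text.startswith(marker):
            comment += 1
        else:
            code += 1  # trailing comments on code lines are treated as code
    total = comment + code
    return comment / total if total else 0.0

# Hypothetical commits: a small fix and a larger block of new functionality.
small_commit = ["// fix off-by-one in loop bound", "for (i = 0; i < n; i++)"]
large_commit = ["// read the input file"] + ["x = x + 1;"] * 9

if __name__ == "__main__":
    print(comment_density(small_commit), comment_density(large_commit))
```

With one explanatory comment per commit, the two-line fix has a density of 0.5 while the ten-line change has 0.1, which is the shape of effect a fixed amount of commentary per commit would produce.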

The problem with relying on third parties to supply the data is that obtaining the answers to follow-up questions invariably involves lots of work, e.g., creating an environment to perform the measurements needed for the follow-up questions. However, the third-party approach can significantly reduce the amount of work needed to get to a point where the interestingness of the results can be gauged.

Unexpected experimental effects

January 16th, 2009 Derek-Jones No comments

The only way to find out the factors that affect developers’ source code performance is to carry out experiments where they are the subjects. Developer performance on even simple programming tasks can be affected by a large number of different factors. People are always surprised at the very small number of basic operations I ask developers to perform in the experiments I run. My reply is that only by minimizing the number of factors that might affect performance can I have any degree of certainty that the results for the factors I am interested in are reliable.

Even with what appear to be trivial tasks I am constantly surprised by the factors that need to be controlled. A good example is one of the first experiments I ever ran. I thought it would be a good idea to replicate, using a software development context, a widely studied and reliably replicated human psychological effect: when asked to learn and later recall/recognize a list of words, people make mistakes. Psychologists study this problem because it provides a window into the operation and structure of the human memory subsystem over short periods of time (of the order of at most tens of seconds). I wanted to find out what sort of mistakes developers would make when asked to remember information about a sequence of simple assignment statements (e.g., qbt = 6;).

I carefully read the appropriate experimental papers and created lists of variables that controlled for every significant factor (e.g., number of syllables, frequency of occurrence of the words in current English usage {performance is better for very common words}), and the list of assignment statements was sufficiently long that it would just overload the capacity of short-term memory (about 2 seconds’ worth of sound).

The results contained none of the expected performance effects, so I ran the experiment again looking for different effects; nothing. A chance comment by one of the subjects after taking part in the experiment offered one reason why the expected performance effects had not been seen. By their nature developers are problem solvers, and I had set them a problem that asked them to remember information involving a list of assignment statements that appeared to be beyond their short-term memory capacity. Problem solvers naturally look for patterns and common cases, and the variables in each of my carefully created lists of assignment statements could all be distinguished by their first letter. Subjects did not need to remember the complete variable name, they just needed to remember the first letter (something I had not controlled for). Asking around, I found that several other subjects had spotted and used the same strategy. My simple experiment was not simple enough!
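A check for this particular shortcut is easy to automate once you know to look for it. The Python sketch below is a hypothetical helper, not something used in the original experiment, and apart from qbt the variable names are invented; it tests whether every name in a list can be told apart from its first letter alone.

```python
def distinguishable_by_prefix(names, length=1):
    """True if the first `length` characters of each name are all different."""
    prefixes = [name[:length] for name in names]
    return len(set(prefixes)) == len(prefixes)

# Hypothetical stimulus lists in the style of "qbt = 6;".
leaky_list = ["qbt", "mwl", "hfz"]       # first letters q, m, h: shortcut works
controlled_list = ["qbt", "qmz", "hfz"]  # two names share q: shortcut fails

if __name__ == "__main__":
    print(distinguishable_by_prefix(leaky_list))
    print(distinguishable_by_prefix(controlled_list))
```

A list for which this returns True lets subjects get by on remembering single letters rather than whole names.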

I was recently reading about an experiment that investigated the factors that motivate developers to comment code. Subjects were given some code and asked to add functionality to it. Some subjects were given code containing lots of comments while others were given code containing few comments. The hypothesis was that developers were more likely to create comments in code that already contained lots of comments, and the results seemed to bear this out. However, closer examination of the answers showed that most subjects had cut and pasted chunks (i.e., code and comments) from the code they were given. So the percentage of comments in the submitted answers mimicked that in the original code (in some cases subjects had complicated the situation by refactoring the code).

The sound of code

January 15th, 2009 Derek-Jones No comments

Speech, it is claimed, is the ability that separates humans from all other animals, yet working with code is almost exclusively based on sight. There are instances of ‘accidental’ uses of sound, e.g., listening to disc activity to monitor a program’s progress, or in days of old the chatter of other mechanical parts.

Various projects have attempted to intentionally make use of sound to provide an interface to the software development process, including:

    People like to talk about what they do and perhaps this could be used to overcome developers’ dislike of writing comments. Unfortunately automated processing of natural language (assuming the speech-to-text problem is solved) has not reached the stage where it is possible to automatically detect when the topic of conversation has changed or to figure out what piece of code is being discussed. Perhaps the reason why developers find it so hard to write good comments is that it is a skill that requires training and effort, not random thoughts that happen to come to mind.
    Writing code by talking (i.e., voice input of source code) initially sounds attractive. As a form of input speech is faster than typing; however, computer processing of speech is still painfully slow. Another problem that needs to be handled is the large number of different ways in which the same thing can be, and is, spoken, e.g., numeric values. As a method of output, reading is 70% faster than listening.

Unless developers have to spend lots of time commuting in person, rather than telecommuting, I don’t see a future for speech input of code. Audio program execution monitoring probably has a market in specialist niches, no more.

I do see a future for spoken mathematics, which is something that people who are not mathematicians might want to do. The necessary formatting commands are sufficiently obscure that they require too much effort from the casual user.

The 30% of source that is ignored

January 3rd, 2009 Derek-Jones No comments

Approximately 30% of source code is not checked for correct syntax (developers can make up any rules they like for its internal syntax), semantic accuracy or consistency; people are content to shrug their shoulders at this state of affairs and are generally willing to let it pass. I am of course talking about comments; the 30% figure comes from my own measurements, with other published measurements falling within a similar ballpark.

Part of the problem is that comments often contain lots of natural language (i.e., human not computer language) and this is known to be very difficult to parse and is thought to be unusable without all sorts of semantic knowledge that is not currently available in machine processable form.

People are good at spotting patterns in ambiguous human communication and deducing possible meanings from it, and this has helped to keep comment usage alive. It also helps that the information comments provide is not usually available elsewhere, and that comments are right there in front of the person reading the code. And of course management loves them as a measurable attribute that is cheap to produce and not easily checkable (and what difference does it make if they don’t stay in sync with the code?).

One study that did attempt to parse English sentences in comments found that 75% of sentence-style comments were in the past tense, with 55% being some kind of operational description (e.g., “This routine reads the data.”) and 44% having the style of a definition (e.g., “General matrix”).

There is a growing collection of tools for processing natural language (well at least for English). However, given the traditionally poor punctuation used in comments, the use of variable names and very domain specific terminology, full blown English parsing is likely to be very difficult. Some recent research has found that useful information can be extracted using something only a little more linguistically sophisticated than word sense disambiguation.

The designers of the iComment system sensibly limited the analysis domain (to memory/file lock related activities), simplified the parsing requirements (to looking for limited forms of requirements wording) and kept developers in the loop for some of the processing (e.g., listing lock related function names). The aim was to find inconsistencies between the requirements expressed in comments and what the code actually did. Within the Linux/Mozilla/Wine/Apache sources they found 33 faults in the code and 27 in the comments, claiming a 38.8% false positive rate.

If these impressive figures can be replicated for other kinds of coding constructs then comment contents will start to leave the dark ages.
