Posts Tagged ‘faults’

The changing shape of code in the next decade

December 29th, 2009 Derek-Jones No comments

I think there are two forces that will have a major impact on the shape of code in the next decade:

  • Asian developers. China and India each have a population that is more than twice as large as Europe and the US combined, and software development has been kick-started in these countries by a significant amount of IT outsourcing. I have one comparative data point for software developers who might be of the hacker ilk. A discussion of my C book on a Chinese blog resulted in a download volume that was 50% of the size of the one that occurred when the book appeared as a news item on Slashdot.
  • Scripting languages. Software is written to solve a problem and there are only so many packaged applications (COTS or bespoke) that can profitably be supported. Scripting languages are generally designed to operate within one application domain, e.g., Bash, numerical analysis languages such as R and graphical plotting languages such as gnuplot.

While markup languages are very widely used they tend to be read and written by programs not people.

Having to read code containing non-alphabetic characters is always a shock the first time. Simply having to compare two sequences of symbols for equality is hard work. My first experience of having to do this in real time was checking train station names once I had traveled outside central Tokyo and the names were no longer also given in Romaji.

其中,ul分别是bootmap_size(bit map的size),start_pfn(开始的页框)
                                max_low_pfn(被内核直接映射的最后一个页框的页框号) ;

Developers based in China and India have many different cultural conventions compared to the West (and each other) and I suspect that these will affect the code they write (my favorite potential effect involves treating time vertically rather than horizontally). Many coding conventions used by a given programming language community exist because of the habits adopted by early users of that language, these being passed on to subsequent users. How many Chinese and Indian developers are being taught to use these conventions? Are the influential teachers spreading different conventions? I don’t have a problem with different conventions being adopted other than that having different communities using different conventions increases the cost for one community to adopt another community’s source.

Programs written in a scripting language tend to be much shorter (often being contained within a single file) and make use of much more application knowledge than programs written in general purpose languages. Their data flow tends to be relatively simple (e.g., some values are read/calculated and passed to a function that has some external effect), while the relative complexity of the control flow seems to depend on the language (I only have a few data points for both assertions).

Because of their specialized nature most scripting languages will not have enough users to support any kind of third party support tool market, e.g., testing tools. Does this mean that programs written in a scripting language will contain proportionally more faults? Perhaps their small size means that only a small number of execution paths are possible and these are quickly exercised by everyday usage (I don’t know of any research on this topic).

Information content of expressions

December 11th, 2009 Derek-Jones No comments

Software developers read source code to obtain information. How might the information content of source code be quantified?

Both of the following functions assign the same value to x and if that is the only information a reader of that code is interested in, then the information content of both assignment statements could be said to be the same.

int foo(void)
{
x = 5;
...
}
 
int bar(void)
{
x = 2 + 3;
...
}

A reader seeking deeper understanding of the above code would ask why the value 5 is built from two values in bar. One reason might be that the author of the function wanted to explicitly call out background information about how the value 5 was derived (this is often done using symbolic names, but the use of literals themselves is sometimes encountered). Perhaps the author of foo did not see the need to expose this information or perhaps the shared value is purely coincidental.

If the two representations denote the same quantity doesn’t the second have a greater information content for a reader seeking deeper understanding?
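The symbolic-name approach mentioned above can be sketched as follows. This is only an illustration: the names NUM_RED and NUM_WHITE are hypothetical, chosen to suggest one possible reason why 5 might be built from 2 and 3, and the elided body of bar is not reproduced.

```c
#include <assert.h>

/* Hypothetical names, invented purely for this illustration. */
enum { NUM_RED = 2, NUM_WHITE = 3 };

static int x;

/* Assigns the same value as "x = 5;", but the derivation is visible. */
int bar(void)
{
    x = NUM_RED + NUM_WHITE;
    assert(x == 5);   /* the value is identical to the literal form */
    return x;
}
```

The generated code is likely to be the same in both cases; any extra information is there for the human reader only.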

In the following example:

... x + y & z ...
 
...
 
... num_red + num_white & lower_bits ...

an experienced developer with a knowledge of English is likely to interpret the expression as adding the number of occurrences of two quantities and using bit-wise AND to extract the lower bits. For some readers the second expression has a higher information content. Would use of the names number_of_red further increase the information content?

In the following example the first expression has not added any information that was not already present in the first expression above (except perhaps that the author was not certain of the precedence or perhaps did not expect subsequent readers to be certain).

... ( x + y ) & z ...
 
...
 
... x + ( y & z ) ...

The second expression uses parentheses to achieve an operand/operator binding that is different from the default. Has this changed the information content of the expression?
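The difference between the two bindings can be checked directly. The following is a minimal sketch; the function names are made up for the example, and the operand values were picked simply because they give different answers. In C binary + has higher precedence than bit-wise &, so the unparenthesized form behaves like the first parenthesized form (some compilers, when asked for extra warnings, suggest adding parentheses to it).

```c
#include <assert.h>

/* Default binding: binary + binds more tightly than bit-wise &. */
unsigned default_binding(unsigned x, unsigned y, unsigned z)
{
    return x + y & z;
}

/* Redundant parentheses, same binding as the default. */
unsigned add_first(unsigned x, unsigned y, unsigned z)
{
    return (x + y) & z;
}

/* Parentheses that change the binding. */
unsigned and_first(unsigned x, unsigned y, unsigned z)
{
    return x + (y & z);
}

/* With x = y = z = 1 the two bindings give different results. */
void check_bindings(void)
{
    assert(default_binding(1, 1, 1) == add_first(1, 1, 1));
    assert(add_first(1, 1, 1) == 0);   /* (1 + 1) & 1 */
    assert(and_first(1, 1, 1) == 2);   /* 1 + (1 & 1) */
}
```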

There is experimental evidence that developers extract information from the names of variables to help them make decisions about operator precedence. To me the name all_32_bits_one suggests a sequence of bits and I would expect such a representation to be associated with the bit-wise AND operator, not binary plus. With no knowledge of the relative precedence of the two operators in the following expression the name of the middle operand would cause me to misinterpret the code. Does this change the information content of the expression? Does knowledge of the experimental evidence and the correct operator precedence change the information content (i.e., there is a potential fault in the code because the author may have assumed the incorrect precedence)?

... num_red + all_32_bits_one & sign_bit ...

There is experimental evidence that people use the amount of whitespace appearing between operands and their operators to visually highlight operator precedence.

The relative quantities of whitespace used in the following two expressions appear to tell very different stories. Do the two expressions have a different information content?

... x  +  y & z ...
 
...
 
... x + y  &  z ...

The idea of measuring the information content of source code is very enticing. However, an accurate measure requires knowledge of the kind of information a reader is trying to obtain and of information that already exists in their brain.

Another question is the ease with which information can be extracted from code, something that might be labeled as readability, except that readability has connotations of there being an abundant supply of information to extract.

Does the Climategate code produce reliable output?

November 30th, 2009 Derek-Jones 2 comments

The source of several rather important commercial programs has been made public recently, or to be more exact programs whose output is important (i.e., the Sequoia voting system, and code and data from the Climate Research Unit at the University of East Anglia, the so-called ‘Climategate’ leak). While many technical commentators have expressed amazement at how amateurish the programming appears to be, apparently written with little knowledge of good software engineering practices or knowledge of the programming language being used, those who work on commercial projects know that low levels of software engineering/programming competence are the norm.

The emails included in the Climategate leak provide another vivid example, if one were needed, of why scientific data should be made publicly available; scientists are human and are sometimes willing to hide data that does not fit their pet theory or even fails to validate their theory at all.

The Climategate source has only recently become available and existing technical commentary has been derived from embarrassing comments and the usual complaint about not using the right programming language (Fortran is actually a good choice of language for this problem; it is widely used by climatology researchers, and a non-professional programmer probably makes best use of their time by using the one language they know tolerably well rather than attempting to use a new language that nobody else in the research group knows).

An important quality indicator of the leaked software was what was not there, test cases (at least I could not find any). How do we know that a program’s output is correct? One way to gain some confidence in a program’s correctness is to process data for which the correct output is known. This blindness to the importance of program level correctness testing is something that I often encounter in people who are subject area experts rather than professional programmers; they believe that if the output has the form they are expecting it must be correct and will sometimes add ‘faults’ to ‘fix’ output that deviates from what they are expecting.
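The kind of test case being talked about, processing data for which the correct output is known, need not be elaborate. The following is a minimal sketch in C (the leaked code is Fortran and IDL); the routine mean and the readings are hypothetical and have nothing to do with the leaked source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine under test: arithmetic mean of n readings. */
double mean(const double *v, size_t n)
{
    assert(v != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Absolute difference, written out to avoid needing the maths library. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Test case: input whose correct output was worked out by hand. */
void test_mean_known_values(void)
{
    const double readings[] = { 1.0, 2.0, 3.0, 4.0 };
    assert(abs_diff(mean(readings, 4), 2.5) < 1e-12);
}
```

A test like this says nothing about the science, only that the program does what its author thinks it does on at least one input.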

A quick visual scan through the source showed a tale of two worlds, one of single letter identifier names and liberal use of goto, and the other of what looks like meaningful names, structured code and a non-trivial number of comments. The individuals who have contributed to the code base obviously have very different levels of coding ability. Not having written any Fortran in anger for over 15 years my ability to estimate the impact of more subtle coding practices has atrophied.

What kind of faults might a code review look for in these programs? Common coding errors such as using uninitialized variables and incorrect argument passing are obvious choices and there are tools available to check for these kinds of error. A much more insidious kind of error, which requires people with the mathematical expertise to spot, is created by the approximate nature of floating-point arithmetic.
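A small illustration of the approximate nature of floating-point arithmetic, assuming the usual IEEE 754 double representation: 0.1 has no exact binary representation, so adding it ten times does not give exactly 1.0. The function names are invented for the example and the tolerance is an arbitrary choice.

```c
#include <assert.h>

/* Adds 0.1 to a running total n times using ordinary double arithmetic. */
double sum_tenths(int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += 0.1;
    return sum;
}

/* Compares two values using a tolerance rather than ==. */
int nearly_equal(double a, double b, double tol)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tol;
}
```

On such a system sum_tenths(10) == 1.0 is false, while nearly_equal(sum_tenths(10), 1.0, 1e-9) is true. Code that branches on the first form can quietly take the wrong path, which is the sort of fault that tools checking for uninitialized variables will not report.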

The source is not huge, but not small either, consisting of around 64,000 lines of Fortran and 16,000 lines of IDL (a language designed for interactive data analysis which to my untrained eye looks a lot like MATLAB). There was no obvious support for building the source included within the leaked files (e.g., no makefiles) and my attempt to manually compile using the GNU Fortran compiler failed miserably. So I cannot say anything reliable about the compiler output warnings.

To me the complete lack of test cases implies that the Climategate code does not produce reliable output. Comments in the code such as ***** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE********* suggest that the authors were willing to patch the code to produce output that matched their expectations; this is the mentality of somebody for whom code correctness is not an important issue and if they don’t believe their code is correct then I don’t either.

Source code in itself is rarely that important, although it might have been expensive to create. The really important information in the leaked files is the climate data. Now that this is available others can apply their analysis skills to provide an interpretation of what, if anything statistically reliable, it is telling us.

Software maintenance via genetic programming

November 27th, 2009 Derek-Jones 1 comment

Genetic algorithms have been used to find solutions to a wide variety of problems, including compiler optimizations. It was only a matter of time before somebody applied these techniques to fixing faults in source code.

When I first skimmed the paper “A Genetic Programming Approach to Automated Software Repair” I was surprised at how successful the genetic algorithm was, using as it did such a relatively small amount of cpu resources. A more careful reading of the paper located one very useful technique for reducing the size of the search space; the automated software repair system started by profiling the code to find out which parts of it were executed by the test cases and only considered statements that were executed by these tests for mutation operations (they give a much higher weighting to statements only executed by the failing test case than to statements executed by the other tests; I am a bit surprised that this weighting difference is worthwhile). I hate to think of the amount of time I have wasted trying to fix a bug by looking at code that was not executed by the test case I was running.
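The coverage-based weighting described above might look something like the following. This is a sketch under stated assumptions, not the code used by the repair system: the weight values, the array representation of coverage and all of the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative weights only; the actual values are assumptions. */
#define WEIGHT_FAILING_ONLY 1.0
#define WEIGHT_SHARED       0.1
#define WEIGHT_UNEXECUTED   0.0

/*
 * hit_by_failing[i] is non-zero when statement i was executed by the
 * failing test case, hit_by_passing[i] when it was executed by at least
 * one of the other tests. weight[i] receives the relative likelihood of
 * statement i being selected for mutation.
 */
void mutation_weights(const int *hit_by_failing, const int *hit_by_passing,
                      double *weight, size_t num_stmts)
{
    assert(hit_by_failing && hit_by_passing && weight);
    for (size_t i = 0; i < num_stmts; i++) {
        if (!hit_by_failing[i] && !hit_by_passing[i])
            weight[i] = WEIGHT_UNEXECUTED;    /* never considered */
        else if (hit_by_failing[i] && !hit_by_passing[i])
            weight[i] = WEIGHT_FAILING_ONLY;  /* prime suspect */
        else
            weight[i] = WEIGHT_SHARED;        /* also run by passing tests */
    }
}
```

Statements with a weight of zero drop out of the search space entirely, which is where most of the saving comes from.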

I learned more about this very interesting system from one of the authors when he gave the keynote at a workshop organized by people associated with a source code analysis group I was a member of.

The search space was further constrained by only performing mutations at the statement level (i.e., expressions and declarations were not touched) and restricting the set of candidate statements for insertion into the code to those statements already contained within the code, such as if (x != NULL) (i.e., new statements were not randomly created and existing statements were not modified in any way). As measurements of existing code show most uses of a construct are covered by a few simple cases and most statements are constructed from a small number of commonly used constructs. It is no surprise that restricting the candidate insertion set to existing code works so well. Of course no fault fix that depends on using a statement not contained within the source will ever be found.
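Restricting insertion to statements that already exist can be sketched as follows. The representation (a program as a fixed-size array of statement strings) and the names struct program and insert_existing are assumptions made for the illustration; the real system works on parsed source, not strings.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STMTS 64

/* Toy representation: a program is a sequence of statement strings. */
struct program {
    const char *stmt[MAX_STMTS];   /* statements, in source order */
    size_t      count;             /* number of statements in use */
};

/*
 * Insert a copy of existing statement `src` immediately after position
 * `dst`. No new statement text is ever created. Returns 0 on success,
 * -1 if there is no room left.
 */
int insert_existing(struct program *p, size_t src, size_t dst)
{
    assert(p && src < p->count && dst < p->count);
    if (p->count == MAX_STMTS)
        return -1;
    const char *copy = p->stmt[src];     /* fetch before shifting */
    for (size_t i = p->count; i > dst + 1; i--)
        p->stmt[i] = p->stmt[i - 1];     /* open a gap after dst */
    p->stmt[dst + 1] = copy;
    p->count++;
    return 0;
}
```

A genetic search would pick src and dst at random (biased by the statement weights); they are ordinary parameters here so that the behavior is repeatable.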

There is ongoing work looking at genetic modifications at the expression level. This work shares a problem with GA driven test coverage algorithms; how to find ‘magic numbers’ (in the case of test coverage the magic numbers are those that will cause a controlling expression to be true or false). Literals in source code, like those on the web, tend to follow a power’ish law but the fit to Benford’s law is not good.

Once mutated source that correctly processes the previously failing test case, while continuing to pass the other test cases, has been generated the code is passed to the final phase of the automated software repair system. Many mutations have no effect on program behavior (the DNA term intron is sometimes applied to them) and the final phase removes any of the added statements that have no effect on test suite output (Westley Weimer said that a reduction from 50 statements to 10 statements is common).
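The final phase can be approximated by a simple one-at-a-time pass over the added statements. This is a sketch of the idea only and may well differ from the procedure the actual system uses; the names prune_added and toy_suite, and the representation of a candidate fix as an array of on/off flags, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns non-zero when the candidate passes every test case. */
typedef int (*suite_fn)(const int *enabled, size_t n);

/*
 * enabled[i] is non-zero when added statement i is part of the fix.
 * Drop each statement in turn and keep it dropped if the test suite
 * still passes. Returns the number of added statements that remain.
 */
size_t prune_added(int *enabled, size_t n, suite_fn passes)
{
    assert(enabled && passes);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!enabled[i])
            continue;
        enabled[i] = 0;
        if (!passes(enabled, n)) {
            enabled[i] = 1;   /* the fix needs this one, restore it */
            kept++;
        }
    }
    return kept;
}

/* Stand-in for running the real tests: only statement 2 matters. */
int toy_suite(const int *enabled, size_t n)
{
    return n > 2 && enabled[2];
}
```

With five added statements and toy_suite as the oracle, prune_added leaves just statement 2 enabled. Each trial costs a full run of the test suite, so the number of added statements directly drives the cost of this phase.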

Might the ideas behind this very interesting research system end up being used in ‘live’ software? I think so. There are systems that operate 24/7 where faults cost money. One can imagine a fault being encountered late at night, a genetic based system fixing the fault which then updates the live system, the human developers being informed and deciding what to do later. It does not take much imagination to see the cost advantages driving expensive human input out of the loop in some cases.

An on-going research topic is the extent to which a good quality test suite is needed to ensure that mutated fault fixes don’t introduce new faults. Human written software is known to often be remarkably tolerant to the presence of faults. Perhaps ensuring that software has this characteristic is something that should be investigated.

Where are the dead bodies?

November 18th, 2009 Derek-Jones 5 comments

The possibility of faults in software causing death or serious injury is often talked about and in some cases large amounts of money are invested in work to reduce the possibility of these events occurring (or at least doing things that will support the view that a company took reasonable precautions, should a case end up in court). The Therac-25 accidents are an often quoted example of a software fault that directly resulted in deaths. These accidents occurred over a 19 month period in the mid 1980s and are believed to have resulted in the death of six people. I don’t wish to disrespect the memory of the people who died, but six people 20 years ago; is that it? Less than the number of people killed every day (around 10) in traffic accidents in the UK.

If faults in software really do have a non-trivial impact on human safety then we would expect this fact to be reflected in accident statistics. After searching the accident statistics for the UK I cannot find any whose cause is directly attributed to software. If there are people who have died as a direct result of faults in software, the death rate has not yet reached the minimum level needed to be recorded as such (or are these deaths ‘hidden’ away in ones and twos within other causes?)

The US National Transportation Safety Board carries out a thorough investigation of all US aviation accidents. Searching the Aviation Accident Database on the query “software” between the dates 1 Jan 2000 and 9 Aug 2005 returns 44 matches. Reading these 44 reports I did not find any accident attributed to a software related issue.

If faults in software are not killing or seriously injuring many people why is so much effort invested in reducing the probability of these events occurring? The following are some of the possibilities:

  • The investment actually made is small, but it is talked up.
  • The investment is made for economic reasons (e.g., more reliable products are likely to reduce support costs) and increased ‘safety’ is a side effect.
  • In situations where there is a likelihood of death or serious injury the procedures and reliability of non-software items are sufficient to short-circuit the effects of any life threatening faults that may exist in the software used (at least until the fault can be corrected).

As any developer knows, replicating faulty behavior in software can be very difficult, if not impossible. It may be that software faults are not given as the root cause of death or serious injury because the necessary proof is not available. Or perhaps software faults have yet to be the root cause of such events on any non-trivial scale.

Existing practice affects what people are willing to put up with. Many users of Microsoft Windows now accept that it is necessary to reboot the computer they are using on a daily, or even hourly, basis. Users of cars accept that the tool they are using can result in serious injuries or even death (usually rating nothing more than a story in the local town newspaper). Will there be a public hue and cry once software faults start to be recorded as a primary factor in accidental death or serious injury? As this paper shows, it can take a lot of dead bodies before existing practices are changed.

The lack of dead bodies attributed to a software root cause suggests that it is still very early days for the field of high integrity software development.

This material was originally written in 2005 and appeared in an earlier blog of mine which I did not keep up.

The 30% of source that is ignored

January 3rd, 2009 Derek-Jones No comments

Approximately 30% of source code is not checked for correct syntax (developers can make up any rules they like for its internal syntax), semantic accuracy or consistency; people are content to shrug their shoulders at this state of affairs and are generally willing to let it pass. I am of course talking about comments; the 30% figure comes from my own measurements with other published measurements falling within a similar ballpark.

Part of the problem is that comments often contain lots of natural language (i.e., human not computer language) and this is known to be very difficult to parse and is thought to be unusable without all sorts of semantic knowledge that is not currently available in machine processable form.

People are good at spotting patterns in ambiguous human communication and deducing possible meanings from it, and this has helped to keep comment usage alive, along with the fact that the information they provide is not usually available elsewhere and comments are right there in front of the person reading the code and of course management loves them as a measurable attribute that is cheap to do and not easily checkable (and what difference does it make if they don’t stay in sync with the code).

One study that did attempt to parse English sentences in comments found that 75% of sentence-style comments were in the past tense, with 55% being some kind of operational description (e.g., “This routine reads the data.”) and 44% having the style of a definition (e.g., “General matrix”).

There is a growing collection of tools for processing natural language (well at least for English). However, given the traditionally poor punctuation used in comments, the use of variable names and very domain specific terminology, full blown English parsing is likely to be very difficult. Some recent research has found that useful information can be extracted using something only a little more linguistically sophisticated than word sense disambiguation.

The designers of the iComment system sensibly limited the analysis domain (to memory/file lock related activities), simplified the parsing requirements (to looking for limited forms of requirements wording) and kept developers in the loop for some of the processing (e.g., listing lock related function names). The aim was to find inconsistencies between the requirements expressed in comments and what the code actually did. Within the Linux/Mozilla/Wine/Apache sources they found 33 faults in the code and 27 in the comments, claiming a 38.8% false positive rate.

If these impressive figures can be replicated for other kinds of coding constructs then comment contents will start to leave the dark ages.

Why is code so fault tolerant?

December 22nd, 2008 Derek-Jones No comments

All professional developers eventually encounter a program containing a fault that appears to be so devastating that the program could not possibly perform its intended task, yet the program has functioned, and continues to function, more or less as expected. In my case the program was a cpu instruction set emulator (for a Z80, written in Fortran) that I had written and the fault was a copy-and-paste editing mistake that resulted in one of the subtract instructions behaving like the equivalent addition instruction. The emulator was used to execute CP/M and various applications (on a minicomputer that did not have any desktop office applications). I was astounded that CP/M booted and appeared to work correctly, along with various applications (apart from the one exhibiting behavior differences that resulted in me tracking down this fault).

My own continuing experience with apparently fatal faults, in my own and other people’s code, has led me to the conclusion that researchers should be putting most of their effort into trying to figure out why so much software does such a good job of behaving in an acceptable manner while containing so many faults (of various apparent seriousness). Proving software correctness is an expensive and time consuming dead-end for all but a few specialist applications.

One way for developers to vividly see how robust most software is to random faults is to use a mutation tool on the source. Such tools introduce faults into code with the aim of checking the thoroughness of a set of test cases. It is a sobering experience to see how many mutations fail to have any noticeable effect on a program’s external behavior.
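A hand-made example of a mutation that has no effect on external behavior is given below. It is not the output of any particular mutation tool; the function names are invented, and the single-operator change is simply the kind of edit such tools make.

```c
#include <assert.h>

/* Original code. */
int max_original(int a, int b)
{
    return a > b ? a : b;
}

/* Mutant: > replaced by >=. When a == b either operand is the answer. */
int max_mutant(int a, int b)
{
    return a >= b ? a : b;
}

/* Counts the input pairs in [lo, hi] on which the two versions differ. */
int count_differences(int lo, int hi)
{
    assert(lo <= hi);
    int differ = 0;
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (max_original(a, b) != max_mutant(a, b))
                differ++;
    return differ;
}
```

No test case can tell these two functions apart, so this mutant survives however thorough the test suite is.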

One group of researchers took this mutation idea to an extreme by changing all less-than operators in for-loops into less-than-or-equals operators. They found that only a handful of the changes prevented the recompiled programs being at all useful to users. While some of the changes produced output that was obviously incorrect, it was still possible to use much of the original functionality.
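The following sketch shows one way a less-than to less-than-or-equals change can go unnoticed. It is a contrived example, not taken from the programs used in the study: the data structure is deliberately given one spare, zero-filled slot so that the mutated loop stays within the array. In much real code the extra iteration would read past the end of the object, which is undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8

struct samples {
    int    value[CAPACITY + 1];  /* one spare slot, zero while unused */
    size_t used;                 /* number of valid entries, <= CAPACITY */
};

/* Original loop. */
long total_original(const struct samples *s)
{
    assert(s && s->used <= CAPACITY);
    long total = 0;
    for (size_t i = 0; i < s->used; i++)
        total += s->value[i];
    return total;
}

/* Mutant: < changed to <=, so value[used] is also added in. */
long total_mutant(const struct samples *s)
{
    assert(s && s->used <= CAPACITY);
    long total = 0;
    for (size_t i = 0; i <= s->used; i++)
        total += s->value[i];
    return total;
}
```

While the slot after the last valid entry holds zero, both functions return the same total; the fault only becomes visible if that slot ever holds something else.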

What is it about the shape of most code that allows it to continue to function in the presence of faults? It is time faults were acknowledged as a fact of life in all actively developed systems and that we should concentrate on developing techniques to help ensure that software containing them continues to behave as intended, rather than the unsophisticated zero-tolerance approach that has held sway for so long.
