Archive

Posts Tagged ‘cost/benefit’

Readability: we know nothing

June 30th, 2011 5 comments

Readability is one of those terms that developers use and expect other developers to understand while at the same time being unable to define what it is or how it might be measured. I think all developers would agree that their own code is very readable; if only different developers stopped writing code in different ways the issue would go away :-)

Having written a book containing lots of material on cognitive psychology and how it might apply to programming, developers who have advanced beyond “Write code like me and it will be readable” sometimes ask for my perceived expert view on the subject. Unfortunately my expertise has only advanced to the stage of: 1) having a good idea of what research questions need to be addressed, 2) being able to point at experimental results showing that most claimed good readability tips are at best worthless or may even increase cognitive load during reading.

To a good approximation we know nothing about code readability. What questions need to answered to change this situation?

The first and most important readability question is: what is the purpose of looking at the code? Is the code being read to gain understanding (likely to involve ‘slow’ and deliberate behavior) or is the reader searching for some construct (likely to involve skimming; yes, slow and deliberate is more accurate but people make cost/benefit decisions when deciding which strategies to use. The factors involved in reader strategy selection is another important question)?

Next we need to ask what characteristics of developer performance are expected to change with different code organization/layouts. Are we interested in minimizing error, minimizing the time taken to achieve the readers purpose or something else?

What source code attributes play a significant role in readability? Possibilities include the order in which various constructs appear (e.g., should variable definitions appear at the start of a function or close to where they are first used), variable names and the position of tokens relative to each other when viewed by the reader.

Questions involving the relative position of tokens probably generates the greatest volume of discussion among developers. To what extent does visual organization of source code affect reader performance? Fluent reading requires a significant amount of practice, perhaps readable code is whatever developers have spent lots of time reading.

If there is some characteristic of the human visual system that generates a worthwhile benefit to splitting long lines so that a binary operator appears at the {end of the last}/{start of the next} line, will it apply the same way to all developers? We could end up developers having to configure their editor so it displays code in a form that matches the characteristics of their visual system.

How might these ‘visual’ questions be answered? I think that eye tracking will play a large role (“Eyetracking Web Usability” by Jakob Nielsen and Kara Pernice is a good read). At the moment there are technical/usability issues that make this kind of research very difficult. Eye trackers capable of continuously supporting enough resolution to know which character on the screen a developer is looking at (e.g., EyeLink 1000) require that the head be held in a fixed position, while those allowing completely free head movement (e.g., S2 Eye Tracker) don’t yet continuously support the required resolution.

Of course any theory derived from eye tracking experiments will still have to be validated by measuring developer performance on various code snippets.

Share

Has the seed that gets software development out of the stone-age been sown?

December 26th, 2010 1 comment

A big puzzle for archaeologists is why stone age culture lasted as long as it did (from approximately 2.5 millions years ago until the start of the copper age around 6.3 thousand years ago). Given the range of innovation rates seen in various cultures through-out human history a much shorter stone age is to be expected. A recent paper proposes that low population density is what maintained the stone age status quo; there was not enough contact between different hunter gather groups for widespread take up of innovations. Life was tough and the viable lifetime of individual groups of people may not have been long enough for them to be likely to pass on innovations (either their own on ones encountered through contact with other groups).

Software development is often done by small groups that don’t communicate with other groups and regularly die out (well there is a high turn-over, with many of the more experienced people moving on to non-software roles). There are sufficient parallels between hunter gathers and software developers to suggest both were/are kept in a stone age for the same reason, lack of a method that enables people to obtain information about innovations and how worthwhile these might be within a given environment.

A huge barrier to the development of better software development practices is the almost complete lack of significant quantities of reliable empirical data that can be used to judge whether a claimed innovation is really worthwhile. Companies rarely make their detailed fault databases and product development history public; who wants to risk negative publicity and law suits just so academics have some data to work with.

At the start of this decade public source code repositories like SourceForge and public software fault repositories like Bugzilla started to spring up. These repositories contain a huge amount of information about the characteristics of the software development process. Questions that can be asked of this data include: what are common patterns of development and which ones result in fewer faults, how does software evolve and how well do the techniques used to manage it work.

Empirical software engineering researchers are now setting up repositories, like Promise, containing the raw data from their analysis of Open Source (and some closed source) projects. By making this raw data available they are reducing the effort needed by other researchers to investigate their own alternative ideas (I have just started a book on empirical software engineering using the R statistical language that uses examples based on this raw data).

One of the side effects of Open Source development could be the creation of software development practices that have been shown to be better (including showing that some existing practices make things worse). The source of these practices not being what the software developers themselves do or how they do it, but the footsteps they have left behind in the sand.

Share

To if-else-if or if-if, that is the question

August 21st, 2009 2 comments

I am currently measuring if-statements, occurring in visible source, that might be mapped to an equivalent switch-statement. The most obvious usage to look for is a sequence of if-else-if statements that all involve the same expression being tested against an integer constant, as in

if (x == 1)
   stmt_1;
else
   if (x == 2)
      stmt_2;
   else
      if (x == 3)
         stmt_3;

Another possible sequence is:

if (x == 1)
   stmt_1;
if (x == 2)
   stmt_2;
if (x == 3)
   stmt_3;

provided all but the last conditionally executed arms do not change the value of the common control variable (e.g., x).

I started to wonder about what would cause a developer to chose one of these forms over the other. Perhaps the if-if form would be used when it was obvious that the common conditional variable was not modified in the conditionally executed arm. This would imply that there would be more statements in the arms of if-else-if sequences than if-if sequences. The following plot of percentage occurrence (over all detected if-else-if/if-if forms) of line number difference between pars of associated if-statements (e.g., when the controlling expression occurs on line x and the following if-statement controlling expression occurs on line x+2 the distance is 2) shows that this is not the case:

Lines between if-statement controlling expressions

Just over a quarter of the arms contain a single statement (or to be exact the code is contained on a single line); this suggests that when using the if-else-if form most developers put the else and if on the same line. At the next distance along the percentage of if-else-if forms is twice as great as the if-if, probably because of else and if appearing on separate lines (as in the introductory example) in one case and less frequently a comment/blank line in the other. Next along, why the big increase in if-if forms? A comment + blank line, or perhaps no comment or blank line but the use of curly brackets (this is too off the track of where I am supposed to be going to investigate).

This morning I realized why the original plot did not look right, one of the data sets was a way off adding to 100%. An updated version has been uploaded.

It turns out that a single statement (or at least a single line) is more common in the if-else-if form, the opposite of what I had expected. At slightly larger distances there are still differences that can be attributed to else and if appearing on separate lines, curly brackets and a comment/blank line, but the effect is not as large as seen in the original, less accurate, plot.

I have a feeling that I ought to say something about the if-else-if form being preferred to the if-if form. One of the forms will have its behavior changed if the common control variable is modified in one of its arms. But is this an intended or unintended behavior? What is the typical characteristic usage of a common control variable, e.g., do they tend to be accessed but not modified in a given function definition? At the moment I see no obvious cost or benefit strongly favoring one usage over the other, so I will remain silent on the issue.

Share

Recommendations for teachers of programming

December 29th, 2008 No comments

The software developers I deal with usually have at least 3 or more years professional experience in industry (I have not taught a beginners programming class in over 25 years) and these recommendations are based on what I tell these people. I like to think that they are applicable to teaching the3ose with a lot less experience.

First. Application domain expertise generally has greater value than programming expertise. That is, an investment of time in learning the application domain will yield a greater return on investment that learning more programming techniques.

Implications:

  • Minimize the amount of programming knowledge that your students need to know. For instance, there is little to be gained by teaching them the umpteen possible combination of looping constructs that can be written; in practice one or two account for 90% of all usage in industry. Teach the common cases and keep hammering on them until people use them without thought.
  • Criticize students for their failure to understand the application domain not coding constructs they use. For instance, use of loops when other techniques are available is not inefficient coding it is a failure to understand the application domain, anyway (if the application domain involves strings they should think string manipulation not looping over individual characters, if a domain uses vectors use vector arithmetic not loops over individual values, if a domain uses sets think sets manipulation), why are you even thinking about coding efficiency?
  • Set problems that enable students to build on previously learned programming skills to solve different application domain problems.
  • Second. Computer time is cheap, people time is expensive.

    Implications:

  • Code optimization is invariably an inefficient redistribution of time from computers to people. Writing efficient code is an advanced topic best left until after graduation and a year or two in industry. If performance is an issue concentrate on the algorithms.
  • Programming by fitting together chunks of code having a well understood (to the author) behavior can be a fast way of getting programs written (once students have a repertoire of the necessary chunks) and the fact that the code is not the most computer efficient is irrelevant.
  • Developers trying to optimize every bit of code they write is the single biggest cause of poor coding habits that I encounter and these habits tend to be so ingrained that it requires a huge effort to change them. Don’t warp peoples minds when they are at their most impressionable at the start learning to program, just because you were abused is not a reason for you to abuse others.
  • Third. Most software is written to solve an immediate problem, used a few times, and then forgotten about (I know of software projects costing in the millions where the code was never used; my own contribution to never used code is somewhat smaller, but still much larger than I would have liked). The point is that writing software involves a cost/benefit trade-off, how much should I invest in making code maintainable given the risk that the benefits from this investment may not be recouped.

    Implications:

  • If a program is only run a few time it does not really matter how fast it runs, ie, optimization at the code level is pointless and in many cases the algorithm used will not make a lot of difference.
  • A few brief comments specifying input/output, any assumptions made and revision dates (a few seconds typing can save an hour later trying to figure out which is the ‘latest’ version). Anything more is bureaucratic dead weight, usually inserted to keep management happy, rarely correct and a general time sink. Don’t put people off commenting because of the memories they have of the pointless comments their instructor made them write.
  • Fourth. Don’t fool yourself that you are teaching programming; you probably simply give a lecture or two and expect students to pick up the details from one of the assigned course books.

    Implications:

  • Students pick up lots of folklore that gets them through the course; work with this not against it. You are the chief wizard, give them a few powerful principles spells that can be applied to solve the problems they are likely to encounter. Understanding takes more time than is available, accept this fact.
  • Fifth. Coding style does exist and can have an effect on the cost of writing and maintaining code as well as how well a compiler can optimize it. These costs and benefits vary with the context in which the code was created and used. A good programmer will adopt a coding style appropriate to the context, just like an actor plays a role with the appropriate characterization (eg, comedy, thriller, suspense). It takes more experience than most undergraduate have to be able to properly discern what style is being used, let alone be capable of changing their own style.

    Implications:

  • Encouraging students to adopt some coding style is at best a waste of time and often causes people to thoughtlessly adopt inappropriate practices and becoming inflexible in their habits (to the point of ignoring further learning opportunities). Let students experiment and find a method of writing that works for them no matter how much it offends your sense of taste (they probably think the same about your dress sense as you do about their coding style and both issues are equally irrelevant).
  • Warn students that they are likely to encounter people proclaiming the one true way to writing great code and recommend that they ask these self proclaimed gurus for proof (ie, experimental studies published in peer reviewed journals) that using the proffered techniques will result in the claimed outcome.
  • Share
    www.wenn.com
    www.tinynibbles.com