Archive

Archive for July, 2014

Success: Software engineering data is starting to become very dull

July 31, 2014 No comments

A few years ago it was unusual for the author(s) of a paper in software engineering to make their data public (and on top of that it was rare to encounter a paper that actually made use of empirical data). The situation now is that I am having trouble keeping up with all the papers that include a link to downloadable data. Part of the problem is that I will pay a lot more attention to papers that come with data, having lived through a long famine I have not yet adjusted to the greater abundance. I’m sure that journal editors and referees are in the same boat and are being lured by accompanying data to accept paper for publication that they would otherwise have rejected.

This growing quantity of empirical software engineering data means we can now start thinking about what data is useful to have and what data is not so useful. Data is useful if it highlights a pattern of behavior that can be used to help reduce the resources needed to create/maintain software.

To get a handle on estimating data usefulness we need a model of research in software engineering. While many have used Physics as the model for software engineering research (i.e., a few simple universal laws that apply everywhere), I think Biology is a much better fit.

Software is written in different habitats environments (e.g., small teams, large teams) and targets different habitats environments (e.g., embedded, desktop, mobile, supercomputer) using different techniques and driven by different predators/prey market forces (e.g., release first/quickly, be reliable). Yes there are common drivers, just as the living things studied by biologists share a common need to eat, sleep and reproduce.

Like biology, the bulk of software engineering research is about the study of niche topics, with some small percentage of researchers trying to build theories that tie everything together at one level or another to create bigger pictures.

This model of software engineering research means estimating the usefulness of data probably requires some knowledge of the niche to which it applies. It also means that a particular data set might not be useful yet because it needs to be combined with other data, that does not yet exist (perhaps it was collected first because it was easier to do).

So in a space of a few years most software engineering data has gone from being very interesting (because it is rare) to being very dull (because it is harder to stand out in a crowd).

An ISO Standard for R (just kidding)

July 24, 2014 4 comments

IST/5, the British Standards’ committee responsible for programming languages in the UK, has a new(ish) committee secretary and like all people in a new role wants to see a vision of the future; IST/5 members have been emailed asking us what we see happening in the programming language standards’ world over the next 12 months.

The answer is, off course, that the next 12 months in programming language standards is very likely to be the same as the previous 12 months and the previous 12 before that. Programming language standards move slowly, you don’t want existing code broken by new features and it would be a huge waste of resources creating a standard for every popular today/forgotten tomorrow language.

While true the above is probably not a good answer to give within an organization that knows its business intrinsically works this way, but pines for others to see it as doing dynamic, relevant, even trendy things. What could I say that sounded plausible and new? Big data was the obvious bandwagon waiting to be jumped on and there is no standard for R, so I suggested that work on this exciting new language might start in the next 12 months.

I am not proposing that anybody start work on an ISO standard for R, in fact at the moment I think it would be a bad idea; the purpose of suggesting the possibility is to create some believable buzz to suggest to those sitting on the committees above IST/5 that we have our finger on the pulse of world events.

The purpose of a standard is to create agreement around one way of doing things and thus save lots of time/money that would otherwise be wasted on training/tools to handle multiple language dialects. One language for which this worked very well is C, for which there were 100+ incompatible compilers in the early 1980s (it was a nightmare); with the publication of the C Standard users finally had a benchmark that they could require their suppliers to meet (it took 4-5 years for the major suppliers to get there).

R is not suffering from a proliferation of implementations (incompatible or otherwise), there is no problem for an R standard to solve.

Programming language standards do get created for reasons other than being generally useful. The ongoing work on C++ is a good example of consultant driven standards development; consultants who make their living writing and giving seminars about the latest new feature of C++ require a steady stream of new feature to talk about and have an obvious need to keep new versions of the standard rolling down the production line. Feeling that a language is unappreciated is another reason for creating an ISO Standard; the Modula-2 folk told me that once it became an ISO Standard the use of Modula-2 would take off. R folk seem to have a reasonable grip on reality, or have I missed a lurking distorted view of reality that will eventually give people the drive to spend years working their fingers to the bone to create a standard that nobody is really that interested in?

My first day developing for Google Glass

July 19, 2014 1 comment

I was at the Google Glass Design Sprint & Workshop in London today. I don’t own a Google Glass and applied for one of the limited spaces available to developers who would be lent hardware for the day. Any idea I was harboring of Google recognizing me as an ace hackathon attendee were dashed at the start when we were told that the available slots had been filled by a random draw of applicants.

Vendor presentations at the start of hackathons tend to be either deadly dull or eye opening. Timothy Jordan explained why software written for Google Glass were not Apps, or rather should not be written with this mindset, but needed to be thought of in terms of enhancing the user’s experience in real time the moment; this really clicked with me. He also made some excellent points on user interface issues specific to the glass form factor which I think went over the head of most people present (this really needed its own slot).

I turned up with an App user enhancement experience reasonably well formed in my mind. The idea was to port the numbers tool to Android and have it scan the incoming camera image for numbers, information about the interesting ones being spoken into the users ear (e.g., that number of there is the rest mass of the electron).

On the day Google handled out a half a dozen brief biographies of potential Glass users and asked us to come up with ideas for software to enhance the lives of these people. I came up with the idea for helping the triathlete on the cycling leg of his competition. Having watched highlights from the Tour de France I knew that corners on the downhill stages of mountain routes presented a significant problem to riders traveling at up to 65 mph, i.e., how hard should they break to get safely around a corner whose curvature they could not see. My idea was for the corner curvature user experience to come to life when the riders speed exceeded, say, 45 mph and displayed a simple colored wiggly line that represented what lies around the bend.

Listening to other people at my table and in other groups I was surprised at how many were designing their idea as an App; that is, they wanted user to select from drop down menus and/or specify various numeric/literal values. My pointing out that they were designing Apps was met with blank stares.

Progress on writing actual code was hampered by lunch, having to leave at 17:30 and adb not working out of the box under Windows (this prevented any communication between the Android SDK running on Windows and Google glass). It took a while to figure out that the problem was adb/Windows (the Google folk had no idea it did not work since they all used Linux or Apple Macs). As usual an answer on Stackoverflow explained what changes needed to be made to the Google software. Asking around uncovered a few people with horror stories to tell about getting adb communication under Windows.

Microsoft Windows has significantly slipped in developer tool mind share over the last few years (I am even thinking of buying my first Mac next time I change my laptop). However, there are still a lot of Windows developers out there and Google will need to fix this problem if they want to attract lots and lots of developers.

But the biggest mistake Google need to fix is to make sure they don’t ever again run out of coffee mid-afternoon at an all day hackathon.

Reality in the world of programming language standards

July 16, 2014 No comments

I see a lot of steam being vented about the standards’ process as applied to programming languages and software related topics. Knowing something about how the process works might help people live calmer lives, at least once they have calmed down after reading this article. What I have to say applies to programming language standards because these are what I have been involved with, as a member and convener of various UK and international committtees, for 25+ years.

  • ISO and your national standards’ body don’t care about the standard you are talking about.

    These organizations are monopolies who are required to demonstrate that documented procedures are followed by all concerned. Can you think of any organizational structure that would create less incentives for those on the inside to listen to those on the outside?

    Yes, these organizations do sell standards but the sales model is all about the long tail and no peak, to speak of, of best sellers. The real business model for running a standards’ organization is to either charge members a fee (your country pays membership dues for each Standards Committee it wants a say in; if your country has not paid to be a member of ISO JTC 1/SC22 you have no say in programming language standards. ANSI in the US charges people for the right to volunteer their time to attend meetings to work on a standard) and/or rely on government subsidy.

    Not being cared about is actually a luxury that people who work in programming language standards should aspire to. The bureaucrats who work in standards hate us; here in the UK there has been at least one attempt to kill off work on programming language standards and I have heard of similar experiences in other countries. The problem is that the standards we produce don’t fit the mold that works for most other standards; programming language standards contain an order of magnitude more pages than the average standard (until recently there was a print run of new standards which then had to be stored until sold and the volume occupied by programming language standards was of note {so I’m reliably informed}), take longer to produce (i.e., more work for the bureaucrats) and all this cost is not justified by the sales figures (which are confidential and last time I saw them only just required me to take my shoes and socks off to count).

  • Standards are created by the volunteers who regularly turn up at meetings.

    It is only the enthusiasm of these volunteers that makes the process work. If you don’t turn up at meetings then what you think does not count (not quite true, something you write might influence the thinking of one of the worker bees who attends meetings resulting in wording in the standard).

    If you really are interested in a standard then become an active member of the committee responsible for it, at least the national one and if you have the time the international one

  • Committee documents can be made public.

    There are no rules preventing a standards committee putting its documents on a website for Joe public to download. The issue is finding somebody willing to do the work of hosting the website (the programming language world is lucky to have Keld Simonsen) and a willingness of committee members to be open about all their documents.

    Looking in from the outside it seems to me that many non-programming language committees want to maintain an aura of mystic and privileged access.

How to avoid being a victim of Brooks’ law

July 9, 2014 4 comments

The oft cited book The Mythical Man Month contains a statement that has become known as Brooks’ Law: “Adding manpower to a late software project makes it later”.

When people join a project they need to learn project specific information, this is information that can only be obtained from people already working on the project. Training up new staff (e.g., developers, documentation writers) reduces the amount of effort being directly invested in building the system; it is an investment in people whose benefit is the post-training productivity of those people adding their effort to the project.

Let’s assume that a newbie diverts from the project they are joining an amount of effort units of time, E_T, in training without contributing anything, and that this training/investment lasts for D_t units of time after which the trained person contributes an average of E_n of effort per unit of time until the project deadline. This investment in a newbie will cause the project to be delayed unless the following inequality holds:

(E_a D_t - E_T) + (E_a+E_n)(D_r-D_t) > E_a D_r” title=”(E_a D_t – E_T) + (E_a+E_n)(D_r-D_t) > E_a D_r”/> <img src=

where E_a is the average daily project effort available before a newbie joins and D_r is the number of units of time between the start of training and the delivery date/time.

This simplifies to:

D_r-D_t > {E_T}/{E_n}” title=”D_r-D_t > {E_T}/{E_n}”/> <img src=

an equation that makes the obvious point that as the deadline approaches the amount of time and effort spent training newbies needs to decrease if a worthwhile payback is to be achieved in the available time.

The quantity D_r-D_t is the amount of time remaining after the newbie finishes training (in practice this is rarely a well defined point in time, but let’s keep things simple) and is the only easily obtained information in the equation.

The effort contribution of the newbie, E_n, could be approximated using information on the effort contribution of other people doing a similar job on the project. At least it could be if data was available on what their effort contribution was, and we overlooked the possible 5:1 difference in performance found between software developers. In practice a newbie’s effort contribution ramps up from zero, perhaps even starting during the training period, to a relatively constant long term daily average. How long does it take for E_n to reach the long term average? I have no idea.

How much effort, E_T, goes into training a newbie? A very important factor will obviously be their existing level of expertise with the application domain, tools being used, coding skills, etc (pretty much everything was new, back in the day, for the project analysed by Brooks, so E_T was probably very high). There is also the somewhat nebulous but very important ability, or lack of, to pick things up quickly.

Could data on E_T be obtained by recording every encounter the newbie has with existing project members? This would certainly enable information on first order time interactions to be obtained, but it would not tell us anything about the knock on effects caused by the work of an existing project member being delayed because they were investing in the newbie.

If many people are being added to a project at the same time it is easy to imagine it grinding to a halt because of all the minor congestion that occurs within the network of dependencies that project progress is waiting on.

I have not been able to locate any applicable data relating to training on software development projects, but then this area is at the edge of what I know about. Pointers to data most welcome.

Compiler writing is for hedgehogs

July 2, 2014 No comments

It is said that a fox knows many thing, but the hedgehog knows one big thing. An insightful article by Venkatesh Rao (Venkat) showed how foxes and hedgehogs uniquely map to the two contrasting philosophical points of view of those having weak views that are strongly held (a fox) and those having strong views that are weakly held (a hedgehog).

Venkat observes that the many things the fox knows are acquired from multiple sources and that this disparate collection of knowledge is not connected together by any consistent set of core principles; the one big thing that a hedgehog knows consists of knowledge that is connected by a small set of consistent core principles.

An average developer’s knowledge of a language is very fox-like, i.e., it is culled from many particular instances with each snippet of knowledge being accompanied by the experience around which it was obtained. Back in the day, the ‘advanced’ courses I used to give to developers who had 2-3 years experience were really designed to show how the components of a language fitted together, i.e., to provide a structure to what they already knew about the language. Switching developers from an approach based on their experience of particular instances for each language feature to a rule based approach was often hard work, some developers seem to be naturally driven by itemized personal experiences.

Of necessity a compiler writer spends a lot of time studying one programming language (I’m excluding those who invent their own language as they write the compiler for it) and/or hardware cpu. This extended period of study, assuming the developer has sufficient cognitive capacity (the drop out rate is high), creates a heavily interconnected knowledge of the language in the compiler writer’s head, i.e., they understand one thing very deeply and have strong views created by the core rules they have created to organize this knowledge. These views are weakly held because experience shows that every now and again a major insight is achieved that changes the developer’s perspective completely.

This fox-like characteristic of developer language knowledge goes a long way towards explaining why religious language wars go on for so long and can be so ferocious. A fox is arguing from personal experience that is not based on a set of core principles; every point has to be argued because there is nothing connecting them, undermining one idea does not affect the status of the beliefs about anything else.

I am not arguing that being a fox is a good or a bad thing, and I am certainly not arguing that everybody should spend the huge amount of time needed to become a hedgehog (it is not a cost effective use of time). I am simply making an observation about a state of affairs, and one that is likely to continue because there are no incentives in trying to change things.

I think being a major contributor in the creation of any large and complex software system requires that somebody be or become a hedgehog.

I think that many software developers are foxes; of course to people looking in developers appear to be hedgehogs in the world of software.