
Archive for March, 2017

Gentleman scientists in software engineering

March 31, 2017

The Royal Society was formed in 1660 as a “College for the Promoting of Physico-Mathematical Experimental Learning”. A lot of very important research was done by members of this society, who were independently wealthy or held a university post.

For a while now, I have thought that the only way software engineering is going to advance to become a real engineering/scientific discipline is via gentleman scientists (unless industry really does need more clueless button pushers).

I was talking at the LondonR meeting on Tuesday (slides) and got chatting with familiar faces from hackathons. It seems that they had also had ideas for researching particular problems in software engineering, and liked the idea of a group of Gentleman scientists.

The problem we have is that none of us wants to do the organizing (a common problem). We must be able to do better than meeting in a pub.

I think the main qualifications for being a member of the group of Gentleman scientists for the “Promoting of Software Experimental Learning” would be something like:

  • enjoying the pleasure of gaining knowledge about how the world works, i.e., no flights of fancy,
  • being interested in finding answers to questions whose answers are not yet known, i.e., doing real research, not personal learning,
  • having the funds to support what you do, i.e., if you want funding, you find it,
  • being proficient in a necessary skill, i.e., you cannot be a beginner in all the required skills.

If the Danish Gentlemen scientists can send rockets into space, I’m sure the inhabitants of London and surrounds (no nationality restrictions here) can make major discoveries in software engineering (nobody has really found any yet; they are all waiting to be found).

Doing software research is not expensive, in monetary terms. It requires that those involved know something about real-life software issues and have the time and inclination to research possible solutions. People in industry are ideally placed to do the research. There are academic research groups doing interesting work in this area (they are in the minority). There are no groups we could join that are within easy traveling distance of London’ish based people (I would claim none in the UK).

The reasons for having a group of like-minded people meeting together include: it provides structure and focus, sharing ideas is interesting and helps refine them, it’s an enjoyable night out, and a network is good for sharing/finding resources.

What might be the outputs of this group/network/society/asylum? Blog posts, talks, reports, books: the intent is to produce stuff that practicing software developers will find useful.

When the Royal Society started, Latin was the language of scholars. Its motto ‘Nullius in verba’ captures the sentiment, but ‘take nobody’s word for it’ does not sound catchy. Something to work on.

I will keep readers posted on any progress (e.g., finding a venue and organizing a night). If any reader knows of an existing group like this, please let me know (I’m not looking to build an empire).

Economics chapter added to “Empirical software engineering using R”

March 26, 2017

The Economics chapter of my Empirical software engineering book has been added to the draft pdf (download here).

This is a slim chapter; it might grow a bit, but I suspect not by a huge amount. Reasons include lots of interesting data being confidential and me not having spent much time on this topic over the years (so my stash of accumulated data is tiny). Also, a significant chunk of the economics data I have is used to discuss issues in the Ecosystems and Projects chapters; perhaps some of this material will migrate back once these chapters are finalized.

You might argue that Economics is more important than Human cognitive characteristics and should have appeared before it (in chapter order). I would argue that hedonism by those involved in producing software is the important factor that pushes (financial) economics into second place (still waiting for data to argue my case in print).

Some of the cognitive characteristics data I have been waiting for has arrived and been added to that chapter (some is still to be added).

As always, if you know of any interesting software engineering data, please tell me.

I am after a front cover. A woodcut of alchemists concocting a potion appeals, perhaps with various software references discreetly included, or something astronomy related (the obvious candidate has already been used). The related modern stuff I have seen does not appeal. Suggestions welcome.

Ecosystems next.


Happy 30th birthday to GCC

March 22, 2017

Thirty years ago today Richard Stallman announced the availability of a beta version of gcc on the mod.compilers newsgroup.

Everybody and his dog was writing C compilers in the late 1980s and early 1990s (a C compiler validation suite vendor once told me they had sold over 150 copies; a compiler vendor has to be serious to fork out around $10,000 for a validation suite). Did gcc become the dominant open source compiler because one compiler would inevitably become dominant, or was there some collection of factors that gave gcc a significant advantage?

I think gcc’s market dominance was driven by two environmental factors, with some help from a technical compiler implementation decision.

The technical implementation decision was the use of RTL as the optimization+code generation strategy. Jack Davidson’s 1981 PhD thesis (and, much later, the LCC book) describes the gory details. The code generators for nearly every other C compiler were closely tied to the machine being targeted (because the implementers were focused on getting a job done, not producing a portable compiler system). Had they been so inclined, Davidson and Christopher Fraser could have been the authors of the dominant C compiler.

The first environmental factor was the creation of a support ecosystem around gcc. The glue that held this ecosystem together was the money made writing code generators for the never-ending supply of new cpus that companies were creating (each needing a C compiler). In the beginning, Cygnus Solutions was the face of gcc+tools; Michael Tiemann, a bright, affable young guy, once told me that he could not figure out why companies were throwing money at them, and that perhaps it was because he was so tall. Richard Stallman was not the easiest person to get along with and was probably somebody you would try to avoid meeting (I don’t know if he has mellowed). Cygnus had created 175 host/target combinations by 1999; if they had gone with a different compiler, gcc would be as well-known today as Hurd.

Yes, people writing Master’s and PhD theses were using gcc as the scaffolding for their fancy new optimization techniques (e.g., here, here and here), but this work essentially played the role of an R&D group trying to figure out where effort ought to be invested writing production code.

Sun’s decision to unbundle the development environment (i.e., stop shipping a C compiler with every system) caused some developers to switch to another compiler, some choosing gcc.

The second environmental factor was the huge leap in available memory on developer machines in the 1990s. Compiler vendors cannot ship compilers that do fancy optimization if developers don’t have computers with enough memory to hold the optimization information (many, many megabytes). Until developer machines contained lots of memory, a one-man band could build a compiler producing code that was essentially as good as everybody else’s. An open source market leader could not emerge until the man+dog compilers could clearly be seen to be inferior.

During the 1990s the amount of memory likely to be available in developers’ computers grew dramatically, allowing gcc to support more and more optimizations (donated by a myriad of people targeting some aspect of code generation that they found interesting). Code generation improved dramatically and man+dog compilers became obviously second/third rate.

Would things be different today if Linus Torvalds had not selected gcc? If Linus had chosen a compiler released under a more liberal license than copyleft, things might have turned out very differently. LLVM started life in 2003, and one of my predictions for 2009 was its demise in the next few years; I failed to see the importance of licensing to Apple (who essentially funded its development).

Eventually, success.

With success came new existential threats, in particular death by a thousand forks.

A serious fork occurred in 1997. Stallman was clogging up the works; fortunately he saw the writing on the wall and in 1999 stepped aside.

Money is what holds together the major development teams supporting gcc and llvm. What happens when the supply of customers wanting support for new back-ends dries up, and what happens when major companies stop funding development? Do we start seeing adverts during compilation? Chris Lattner, the driving force behind llvm, recently moved to Tesla; will it turn out that his continuing management was as integral to the continuing success of llvm as getting rid of Stallman was to the continuing success of gcc?

Will a single mainline version of gcc still be the dominant compiler in another 30 years’ time?

Time will tell.

Learning from some legal decisions

March 13, 2017

The British and Irish Legal Information Institute provides “Access to Freely Available British and Irish Public Legal Information”. Searching the England and Wales High Court (Technology and Construction Court) Decisions throws up some interesting reading (when searching on software).

For those who have never seen a decent-sized project go wrong from the inside, DE BEERS UK LIMITED (Formerly: THE DIAMOND TRADING COMPANY LIMITED) vs. ATOS ORIGIN IT SERVICES UK LIMITED provides a well-written example. De Beers contracted Atos to write some software. The development of the software did not go well. Were the original requirements/spec underdone, or were subsequent personnel not up to the job? It is difficult to tell from the Decision, as is the reason Atos thought they had a chance of winning a court case.

SAP UK LIMITED vs. DIAGEO GREAT BRITAIN LIMITED was a licensing dispute, or more accurately an example of why it is important to check what your third-party software gets up to. Diageo had signed a licensing agreement with SAP and 5,800 Diageo users had used a Salesforce.com app which, unknown to them, made use of SAP. The end result was a bill for £55 million, which Diageo had not been expecting.

There are probably more interesting cases to learn from, but I am supposed to be writing a book in my ‘spare’ time.


Uncovering the undefined behaviors

March 7, 2017

I think that all programming languages contain some constructs that have undefined behavior.

The C Standard explicitly lists various constructs as having undefined behavior. It also specifies that “Undefined behavior is otherwise indicated in this International Standard by the words ‘undefined behavior’ or by the omission of any explicit definition of behavior.” The second half of that sentence refers to what might be called implicit undefined behavior. Implicit undefined behavior can be subdivided into intentional and unintentional. Intentional undefined behavior applies to constructs that the committee considered and decided (and continues to decide) to say nothing about (e.g., question 19), while unintentional undefined behavior applies to constructs that the committee did not explicitly consider (when discovered, these often end up as defect reports, which are sometimes resolved as intentionally undefined behavior).

Fans of some languages claim that ‘their’ language does not contain any undefined behaviors.

Ada does not explicitly specify any construct as having undefined behavior, but it does specify that some constructs generate a bounded error; a rose by any other name…

I sometimes bump into language inventors claiming that ‘their’ language is fully specified, i.e., does not contain any undefined behaviors. My first question to them, about the behavior of division involving negative values, invariably requires me to explain that there are two possible ways of doing it (ignorance is bliss when fully specifying a language). The answer is always that the behavior is whatever the underlying implementation does (which is often written in C). In other words, they have imported all the undefined behaviors of the implementation language.

Follow-up questions include: what is the order of expression evaluation (e.g., left-to-right, right-to-left, inside out…), what is the order of function argument evaluation (often driven by the direction of stack growth), what is the order of initialization, and any other order-related questions that come to mind. Their fully specified language quickly turns out to be a sham.

A recent post by John Regehr talks about Gödel’s incompleteness theorem as a source of undefined behavior. My understanding is that the underlying argument is built on non-termination. How is it possible to tell the difference between non-termination and a computation lasting longer than the age of the universe? In itself, I don’t think this theorem is a source of undefined behavior; more enlightenment welcome.

C compilers of the 20th century running on Microsoft operating systems

March 2, 2017

There used to be a huge variety of C compilers available for sale under MS-DOS and later Microsoft Windows. A C compiler validation suite vendor once told me they had sold over 150 copies; a compiler vendor has to be serious to fork out around $10,000 for a validation suite (actually good value for money given the volume of tests in a commercial suite).

C compilers of the 20th century running on Microsoft operating systems would make a great specialist subject for a Mastermind contestant. The August 1983 issue of BYTE must be the go-to reference for C in the 1980s.

Here is my current list of compilers that were once and perhaps still are commercially available on Microsoft operating systems.

Aztec C: from Manx Software Systems.

Borland C: from Borland

cc65: …and on Github.

IBM PC C Compiler: from Lattice???

Lattice C:…

CI-C86: from Computer Innovations.

CSI-C:…

DeSmet C:…

Digital Research C: Was this ever sold on a Microsoft OS?

Eco-C and Eco-C88 C:…

LCC: sold as a book in the 20th century, but its Microsoft OS implementations, such as lcc-win (with over 2 million copies distributed) and Pelles C, are really 21st century compilers.

Mark Williams C compiler: That the German Wikipedia entry for this US company is ranked significantly higher by Google than its English Wikipedia page shows that this compiler was a big success on the Atari ST (very popular in Germany) but not on DOS/Windows.

MetaWare High C:…

Microsoft C: The compiler that nobody got fired for buying. Vendors had to try hard to generate worse code than this compiler (which some achieved, e.g., MIX) and very hard to provide better runtime support (which nobody ever could). Version 2 of Microsoft C was actually the Lattice C compiler.

MIX C: from Mix Software.

NDP C:…

Supersoft C:…

TopSpeed C: from Jensen & Partners International.

Watcom C: open sourced as Open Watcom

Wizard C: from Bob Jervis, who sold (licensed???) it to Borland, where it became Turbo C.

Zorland C, Zortech C: from Walter Bright and my compiler of choice for several years.

If you know of a compiler that is missing from this list, or have better information, please let me know in the comments. Hopefully I will start to remember more about long forgotten C compilers.
