Archive

Archive for March, 2017

Happy 30th birthday to GCC

March 22nd, 2017 No comments

Thirty years ago today Richard Stallman announced the availability of a beta version of gcc on the mod.compilers newsgroup.

Everybody and his dog was writing C compilers in the late 1980s and early 1990s (a C compiler validation suite vendor once told me they had sold over 150 copies; a compiler vendor has to be serious to fork out around $10,000 for a validation suite). Did gcc become the dominant open source because one compiler would inevitably become dominant, or was there some collection of factors that gave gcc a significant advantage?

I think gcc’s market dominance was driven by two environmental factors, with some help from a technical compiler implementation decision.

The technical implementation decision was the use of RTL as the optimization+code generation strategy. Jack Davidson’s 1981 PhD thesis (and much later the LCC book) describe the gory details. The code generators for nearly every other C compiler was closely tied to the machine being targeted (because the implementers were focused on getting a job done, not producing a portable compiler system). Had they been so inclined Davidson and Christopher Fraser could have been the authors of the dominant C compiler.

The first environment factor was the creation of a support ecosystem around gcc. The glue that nourished this ecosystem was the money made writing code generators for the never ending supply of new cpus that companies were creating (that needed a C compiler). In the beginning Cygnus Solutions were the face of gcc+tools; Michael Tiemann, a bright affable young guy, once told me that he could not figure out why companies were throwing money at them and that perhaps it was because he was so tall. Richard Stallman was not the easiest person to get along with and was probably somebody you would try to avoid meeting (I don’t know if he has mellowed). If Cygnus had gone with a different compiler, they had created 175 host/target combinations by 1999, gcc would be as well-known today as Hurd.

Yes, people writing Masters and PhD thesis were using gcc as the scaffolding for their fancy new optimization techniques (e.g., here, here and here), but this work essentially played the role of an R&D group trying to figure out where effort ought to be invested writing production code.

Sun’s decision to unbundle the development environment (i.e., stop shipping a C compiler with every system) caused some developers to switch to another compiler, some choosing gcc.

The second environment factor was the huge leap in available memory on developer machines in the 1990s. Compiler vendors cannot ship compilers that do fancy optimization if developers don’t have computers with enough memory to hold the optimization information (many, many megabytes). Until developer machines contained lots of memory, a one-man band could build a compiler producing code that was essentially as good as everybody else. An open source market leader could not emerge until the man+dog compilers could be clearly seen to be inferior.

During the 1990s the amount of memory likely to be available in developers’ computers grew dramatically, allowing gcc to support more and more optimizations (donated by a myriad of people targeting some aspect of code generation that they found interesting). Code generation improved dramatically and man+dog compilers became obviously second/third rate.

Would things be different today if Linus Torvalds’ had not selected gcc? If Linus had chosen a compiler licensed under a more liberal license than copyleft, things might have turned out very differently. LLVM started life in 2003 and one of my predictions for 2009 was its demise in the next few years; I failed to see the importance of licensing to Apple (who essentially funded its development).

Eventually, success.

With success came new existential threats, in particular death by a thousand forks.

A serious fork occurred in 1997. Stallman was clogging up the works; fortunately he saw the writing on the wall and in 1999 stepped aside.

Money is what holds together the major development teams supporting gcc and llvm. What happens when customers wanting support for new back-ends dries up, what happens when major companies stop funding development? Do we start seeing adverts during compilation? Chris Lattner, the driving force behind llvm recently moved to Tesla; will it turn out that his continuing management was as integral to the continuing success of llvm as getting rid of Stallman was to the continuing success of gcc?

Will a single mainline version of gcc still be the dominant compiler in another 30 years time?

Time will tell.

Learning from some legal decisions

March 13th, 2017 No comments

The British and Irish Legal Information Institute provides “Access to Freely Available British and Irish Public Legal Information”. Searching the England and Wales High Court (Technology and Construction Court) Decisions throws up some interesting reading (when searching on software).

For those who have never seen a decent sized project go wrong from the inside, DE BEERS UK LIMITED (Formerly: THE DIAMOND TRADING COMPANY LIMITED) vs. ATOS ORIGIN IT SERVICES UK LIMITED provides a well written example. De Beers contracted Atos to write some software. The development of the software did not go well. Were the original requirements/spec underdone or were subsequent personnel not up to the job? Difficult to tell from the Decision, as is the reason Atos thought they had a chance of winning a court case.

SAP UK LIMITED vs. DIAGEO GREAT BRITAIN LIMITED was a licensing dispute, or more accurately an example of why it is important to check what your third-party software gets up to. Diageo had signed a licensing agreement with SAP and 5,800 Diageo users had used a Salesforce.com app which, unknown to them, made use of SAP. The end result was a bill for £55 million, which Diageo had not been expecting.

There are probably more interesting cases to learn from, but I am supposed to be writing a book in my ‘spare’ time.

Uncovering the undefined behaviors

March 7th, 2017 2 comments

I think that all programming languages contain some constructs that have undefined behavior.

The C Standard explicitly lists various constructs as having undefined behavior. It also specifies that: Undefined behavior is otherwise indicated in this International Standard by the words “undefined behavior” or by the omission of any explicit definition of behavior.; the second half of the sentence refers to what might be called implicit undefined behavior. Implicit undefined behavior can be subdivided into intentional and unintentional. Intentional undefined behavior applies to constructs that the committee considered and decided (and continues to decide) to say nothing about (e.g., question 19), while unintentional undefined behavior applies to constructs that the committee did not explicitly consider (when discovered, these often end up as defect reports, which are sometimes resolved as intentionally undefined behavior).

Fans of some languages claim that ‘their’ language does not contain any undefined behaviors.

Ada does not explicitly specify any construct as having undefined behavior, but it does specify that some constructs generate a bounded error; a rose by any other name…

I sometimes bump into language inventors claiming that ‘their’ language is fully specified, i.e., does not contain any undefined behaviors. My first question to them, about the behavior of division involving negative values, invariable requires me to explain that there are two possible ways of doing it (ignorance is bliss when fully specifying a language). The invariable answer is that the behavior is whatever the underlying implementation does (which is often written in C). In other words, they have imported all the undefined behaviors of the implementation language.

Follow-up question include: what is the order of expression evaluation (e.g., left-to-right, right-to-left, inside out…), what is the order of function argument evaluation (often driven by the direction of stack growth), what is the order of initialization and other order related questions that comes to mind. Their fully specified language quickly turns out to be a sham.

A recent post by John Regehr talks about Gödel’s incompleteness Theorem as a source of undefined behavior. My understanding is that the underlying argument is built on non-termination. How is it possible to tell the difference between non-termination and lasting longer than the age of the universe? In itself I don’t think this theorem is a source of undefined behavior; more enlightenment welcome.

C compilers of the 20th century running on Microsoft operating systems

March 2nd, 2017 No comments

There used to be a huge variety of C compilers available for sale under MS-DOS and later Microsoft Windows. A C compiler validation suite vendor once told me they had sold over 150 copies; a compiler vendor has to be serious to fork out around $10,000 for a validation suite (actually good value for money given the volume of tests in a commercial suite).

C compilers of the 20th century running on Microsoft operating systems would make a great specialist subject for a Mastermind contestant. The August 1983 issue of BYTE must be the go-to reference for C in the 1980s.

Here is my current list of compilers that were once and perhaps still are commercially available on Microsoft operating systems.

Aztec C: from Manx Software Systems.

Borland C: from Borland

cc65: …and on Github.

IBM PC C Compiler: from Lattice???

Lattice C:…

CI-C86: from Computer Innovations.

CSI-C:…

DeSmet C:…

Digital Research C: Was this ever sold on a Microsoft OS?

Eco-C and Eco-C88 C:…

LCC: sold as a book in the 20th century, but its Microsoft OS implementations, such as lcc-win (with over 2 million copies distributed) and Pelles C, are really 21st century compilers.

Mark Williams C compiler: A US company having an entry in the German Wikipedia ranked significantly higher by Google than its English Wikipedia page shows that this compiler was a big success on the Atari ST (very popular in Germany) but not DOS/Windows.

MetaWare High C:…

Microsoft C: The compiler that nobody got fired for buying. Vendors had to try hard generate worse code than this compiler (which some achieved, i.e., MIX) and also very hard to provide better the runtime support (which nobody ever could). Version 2 of Microsoft C was actually the Lattice C compiler.

MIX C from Mix Software

NDP C:…

Supersoft C:…

TopSpeed C: from Jensen & Partners International.

Watcom C: open sourced as Open Watcom

Wizard C: from Bob Jervis who sold (licensed???) it to Borland, where it became Turbo C.

Zorland C, Zortech C: from Walter Bright and my compiler of choice for several years.

If you know of a compiler that is missing from this list, or have better information, please let me know in the comments. Hopefully I will start to remember more about long forgotten C compilers.

Tags: