Archive

Posts Tagged ‘Intel x86’

Open source: monoculture is more desirable than portability

March 8th, 2014 No comments

An oft repeated fable is that open source software is portable, all thanks to C and Unix. The reality is that open source lives in an environment that is evolving to become a monoculture that does not require portability, this is being driven by the law of the jungle.

First some background and history. Portability requires that source code have the same behavior on different platforms, or rather than programs built from that code have the same behavior, this requires that:

  • all compilers assign the same semantics to a given piece of code,
  • all operating systems include support the same set of libraries in the same way.

If you want portability across lots of compilers then Fortran has stood head and shoulders above the competition for decades. Now some C folk may point out that they have been compiling some large code base for decades, with few changes necessitated by compiler differences; yes this has been possible in certain niche markets where there is a dominant supplier who has a vested interest in not breaking customer code. C has a long history of widespread large variation in behavior of across compilers.

What about Cobol you ask? Cobol is all about data manipulation and unless you have data in the format expected by a Cobol program you have no need for that program. Nobody cares about portable Cobol programs unless they are also interested in portable data.

If you want portability across lots of operating systems the solution has always been to minimise the dependency on system/third-party library calls (to the extent of including source code for functionality often supported by an OS). The reason for minimising OS dependency is the huge variation in support for different libraries and a wide range of behaviors for supposedly the same functionality. But you say, Unix is an OS that did/does provide a common set of libraries that have the same behavior; no, this is history seen through rose tinted glasses as anyone who knows about the Unix wars will tell you.

In the last century, to experience ‘portability’ Unix developers had to live in a monoculture of either PDP-11s or Sun workstations.

Open source, as it existed in the 1970s, 80s and into the 90’s was Fortran code that ran on a surprisingly wide range of OS/cpu/compilers, along with a smattering of other languages. Back then there were not many software applications and when they did exist many were written in Fortran (Oracle being an early, lots of Fortran, example), this created a strong incentive for vendors to support a Fortran compiler that did things the same way as everybody else (which did not prevent them adding proprietary goodies to try and lure customers towards lock-in).

How did we get to today’s dominance of C and Unix? Easy, evolution at a rate that caused competitors to die out until there was a last man standing. That last man standing was gcc and Linux. The portability problem has been solved by removing the need to port code; it is compiled by the same compiler to run on the same cpu (Intel x86 family) to run under the same OS (Linux).

Of course some of today’s open source C is compiled using non-gcc compilers, but the percentage is small and specialised (a lot of the older code is portable because it used to exist in a multi-compiler/cpu/OS world and had to evolve into being portable). The gcc competitor, llvm, is working long and hard to ensure compatibility and somehow has to differentiate itself while being compatible, a tough fight for developer hearts and minds.

Differences in CPU characteristics are a big headache to any compiler writer wanting to support identical behavior across platforms; having a single cpu family as the market leader more or less solves this problem. ARM has become a major player in the CPU world, but it shares many developer visible characteristics with Intel x86 (e.g., 32-bit int, 64-bit long, pointer and ints are the same size and IEEE floating-point) and options are available for handling some of the other potential differences (e.g., right shift of signed integers).

The Unix wars have not gone away, they have moved to more far flung battlefields leaving behind some hard fought over common ground. Anybody who wants to see the scares left by these war only needs to look at the #ifs in system headers or the parameters selected inside .configure files.

Having everybody use the same compiler/cpu/OS saves having to make a huge time/money investment in making software portable, at least until the invention of photonic computers or the arrival of aliens (whose computers are unlikely to contain a CPU that shares Intel/ARM characteristics or have the same libraries as Linux).

C/Linux has not won in the sense that competitors have given up; in 20 years time the majority of open source in active use might be Javascript running inside a browser.

C compiler validation is 21 today!

September 1st, 2011 2 comments

Today, 1 September 2011, is the 21th anniversary of the first formally validated C compilers. The three ‘equal first’ validated compilers were the Model Implementation C Checker from Knowledge Software, Topspeed C from JPI (run by the people who created Turbo Pascal) and the INMOS C compiler (derived from the Norcroft C compiler written by Alan Mycroft+others, the author of the longest response document seen during the review of the C89 draft standard).

Back in the day the British Standards Institution testing group run by John Souter were the world leaders in compiler validation and were very proactive in adding support for a new language. NIST, the equivalent US body, did not offer such a service until a few years later. Those companies in a position to have their compilers validated (i.e., the compiler passed the validation suite) were pressing BSI to be first; the ‘who is first’ issue was resolved by giving all certificates the same date (the actual validation process of a person from BSI, Neil Martin now Director of Test in the Winterop Team at Microsoft, turning up to ‘witness’ the compiler passing the tests happened several weeks earlier).

Testing C compilers was different from other language compilers in that sufficient demand existed to support commercial production and maintenance of test suites (the production of validation suites for previous language compilers had been government funded). After a review of the available test suites BSI chose to use the Plum Hall suite; after a similar review NIST chose to use the Perennial suite (I got involved in trying to figure out for NIST how well this suite covered the requirements contained in the C Standard).

For a while C compiler validation was big business (as in big fish, very small pond). But the compiler validation market is dependent on there being lots of compilers, which requires market fragmentation and to a lesser extent lots of different OSs and hardware platforms (each needing a separate validation). The 1990s saw market consolidation, gcc becoming good enough for commercial use and a shift of developer mind share to C++. Dwindling revenue resulted in BSI’s compiler validation group being shut down after a few years and NIST’s followed in 1998.

Is compiler validation relevant today? When the first C Standard was published a lot of compilers in common use had some significant behavioural differences compared to what the Standard specified. Over time these compilers have either disappeared or been upgraded (a potential customer once asked me the benefits I saw in them licensing the Knowledge Software front end and the reply to one of my responses, “you can tell your customers that the compiler is standard’s compliant”, was that this was not a benefit as they had been claiming this for years). Improvements in Intel’s x86 processor also had a hand in improving compiler Standard’s conformance; the various memory models used by the x86 processor was a huge headache for compiler writers whose products often behaved very differently under different memory models; the arrival of the Pentium with its flat 32-bit address space meant this issue disappeared over time.

These days I suspect that the major compilers targeting platforms where portability is expected (portability is often not a big expectation in the embedded world) are sufficiently compatible that developers are willing to overlook small differences with the Standard. Differences in third party libraries, GUIs and other frameworks have been the big headache for many years now.

Would the ‘platform portability’ compilers, that’s probably gcc, Microsoft, products using EDG’s front end, and perhaps llvm in the coming years, pass the latest version of the PlumHall and Perennial suites?

  • The gcc team do not have access to either company’s suite. The gcc regression tests are a poor substitute for a proper compiler validation suite (even though they cost many thousands of dollars commercial compiler writers often buy both companies products because they are good value for money as a testing resource {the Fortran 78 validation suite source gives some idea of how much work is actually involved). I would expect gcc to fail some of the tests but have no idea how many or serious the failures would be.
  • Microsoft have said they don’t have plans to support C99 (it took a lot of prodding to get them interested in formally validating against C90).
  • I think the llvm team are in the same position as gcc, but perhaps somebody at Apple has access to one or more of the commercial suites (I don’t know).
  • EDG are into standard’s conformance and I would expect them to pass both suites.

The certificate is printed on high quality, slightly yellow paper; the template wording is in a subdued gray ink while the customer information is in a very bold black ink. I don’t know whether this is to make life difficult for counterfeiters, but I could not get any half decent photographs and the color scanner had to be switched to black&white.

Validation was good for one year and I saw no worthwhile benefit in paying BSI £5,000 to renew for another year. Few people knew about the one year rule and I did not enlighten them. In the Ada compiler market the one year rule was a major problem, but lets leave that for another time.

Model Implementation C validation certificate.