Archive for December, 2014

Z and Zermelo–Fraenkel

December 25th, 2014 No comments

Z is for Z and Zermelo–Fraenkel.

Z, pronounced zed, or Z notation to give it its full name, replaced VDM as the fashionable formal specification language before itself falling out of fashion. Gallina is in fashion today, but you are more likely to have heard of the application software built around it, Coq.

Zermelo–Fraenkel set theory with the axiom of choice is the brand name assembly language of mathematics (there are a number of less well known alternatives). Your author is not aware of any large program written in this assembly language (Principia Mathematica used symbolic logic and over 300 pages to prove that 1+1=2).

While many mathematical theorems can be derived from the axioms contained in Zermelo–Fraenkel, mathematicians consider some to be a lot more interesting than others (just like developers consider some programs to be more interesting than others). There have been various attempts to automate the process of discovering ‘interesting’ theorems, with Eurisko being the most famous and controversial example.

Things to read

Automated Reasoning: Introduction and Applications by Larry Wos, Ross Overbeek, Rusty Lusk, and Jim Boyle. A gentle introduction to automated reasoning based on Otter.

… and it is done.

Readers may have noticed the sparse coverage of languages created between the mid-1990s and 2010. This was the programming language Dark ages, when everybody and his laptop was busy reinventing existing languages at Internet speed; your author willfully tried to ignore new languages during this time. The focus for reinvention has now shifted to web frameworks and interesting new ideas are easier to spot amongst the noise.

Merry Christmas and a happy New Year to you all.


XSLT, yacc and Yorick

December 24th, 2014 No comments

X and Y are for XSLT, yacc and Yorick.

XSLT is the tree climbing Kangaroo of the programming language world. Eating your own dog food is good practice for implementors, but users should not be forced to endure it. Anyway, people only use XML, rather than JSON, to increase the size of their files so they can claim to be working with big data.

yacc grammars don’t qualify as programming languages because they are not Turing complete (because yacc requires its input to be a deterministic context-free grammar, a subset of context-free grammars, that can be implemented using a deterministic pushdown automaton; restructering a context-free grammar into a deterministic context-free grammar is the part of using yacc that mangles the mind the most). While Bison supports GLR parsing, this only extends its input to all context-free grammars; the same notation is used to specify grammars and this does not support the specification of languages such as: L = a^n b^n c^n, where n ge 1 and a^n denotes a repeated n times, which requires a context-free grammar. So the yacc grammar notation is not capable of expressing all computations.

Yorick is designed for processing large amounts of scientific data and is a lot like R, with slightly different syntax. Use of Yorick seems to have died out in the last few years. What happened, did its users discover R? I would certainly recommend a switch to R, but I would also suggest Yorick as a subject of study for the maintainers of R; there are language ideas to be mulled over, e.g., global("blah") to declare blah as being a global variable within the function containing the call.

Things to read

The Theory of Parsing, Translation, and Compiling, Vol. 1, Parsing by Alfred V. Aho and Jeffrey D. Ullman. Warning, reading this book will make your head hurt.

Introduction to Automata Theory, Languages, and Computation by John Hopcroft and Jeffrey D. Ullman. Has a great chapter on Turing machines. If you really want your head to hurt, then get the book above rather than the first edition of this book.


Wolfram and Whitespace

December 23rd, 2014 No comments

W is for Wolfram and Whitespace.

Wolfram, or the Wolfram language as the marketing people would want me to call it, is really Mathematica rebranded with its master‘s name.

Mathematica Wolfram (the language) is currently the dominant player in its niche market, where is its future? In the short term its future is in the hands of Wolfram (the person, company information suggests family ownership), with possible short term decisions and their long term consequences including:

  • keeping its sole implementation proprietary: in this case its fate depends on the commercial success of Wolfram Research. The main competition appears to be various Open source systems (e.g., Sage) and while these might not be as good, history shows that free software only needs to be good-enough to soak up a market share that makes the sale of commercial offerings uneconomic. How long before the drip-drip improvement of Open source makes it good-enough? Five, ten years?
  • selling to a larger company: Wolfram strikes me as a lifestyle business, so I don’t see this happening. How much might the company be worth? Wolfram Research Europe Ltd was valued at £3,138,002 in its 2013 financial filing and the Delaware incorporation had an annual tax assessment of $1,575.00 in 2013. No doubt other numbers are available and somebody who knows about these things could estimate a ballpark figure, but I am not that person,
  • releasing the code as Open source: if done now Wolfram (top dog that it is in quality and market share) would be the language for doing computer mathematics for ever more. However, few people have been willing to let go to this extent unless backed into a corner. Perhaps there is a subset that can be made Open source, enough to mow the lawn of competing systems but not so much that corporate customers will stop paying for support and what they believe are useful add-ons.

Whitespace is an invisible language (its written form is based on the characters space, tab and linefeed). The execution of this invisible language obviously requires a matching invisible virtual machine such as sed.

Programs and the means to execute them are all around us.


VHDL, Visual-Basic and VDM

December 22nd, 2014 No comments

V is for VHDL, Visual-Basic and VDM.

VHDL and Verilog are the leading hardware description languages. My only real hands on experience with digital hardware is soldering together my first computer (buying it in kit+component form saved me a month’s salary), but given the amount of dataflow programming involved I appreciate that programming hardware is likely to be very hard (but not as hard as programming Quantum computers).

The problem with general purpose cpus is that they never seem to have just the right instructions needed to generate really fast compact code for a particular application. The ideal solution is for the compiler to automatically figure out what instructions are the best fit for a given application and then to include the appropriate processor specification in the generated executable (program initialization starts with configuring the host cpu with the appropriate architecture+instructions to execute the generated code). Hardware is available to make this a reality, but the necessary software support is still being worked on (there is also the tiresome economic issue of mass produced cpus almost always being cheaper). At the moment application specific processors tend to be built into application specific computers.

Anyway, real programmers don’t write programs that make LEDs flash on a Raspberry Pi, real programmers use single-board computers that support personalization of the instruction set. If this future has arrived for you, please let me know how I can be included in the distribution.

Visual-Basic shares half of its name and use of the keyword Dim with BASIC. It was clever marketing to give a new language a name that contained the name of a language with a reputation for being easy to learn and use. I bet the developers involved hated this association.

VDM was for many years THE formal method; complete definitions of industrial strength languages (e.g., PL/1, Modula-2 and CHILL) were written using it. These days it seems to be supporting a small community by helping people model industrial systems, however it has fallen out of fashion in the programming language world.

Back in the day there was lots of talk of tools using the available VDM language specifications to check source code (my interest was in checking the consistency of the specifications). In practice tooling did not get much further than typesetting the printed VDM.

Back when I thought that formal methods had a future, I was more of a fan of denotational semantics than VDM. The CHILL VDM specification has at least one reported fault (I cannot tell you where it is because I cannot find my copy); mine was the first reported and I never heard if anybody else ever reported a fault.

Things to read

Computer Architecture: A Quantitative Approach (second edition) by John L. Hennessy and David A. Patterson. Later editions become more narrowly focused on current hardware.

The Denotational Description of Programming Languages by Michael J. C. Gordon is a good short introduction, while Denotational Semantics by David A. Schmidt is longer.



December 21st, 2014 No comments

U is for UNCOL and UML.

UNCOL is the compiler intermediate language that gets named when people need a historical starting point to refer to. This kind of intermediate language is one that contains all the information needed for a compiler front-end to communicate a representation of the original source code to an independent compiler back-end that generates machine code (the intermediate languages used for internal communication within a compiler are often just tree data structures that also make use of information held in other data structures, i.e., they are not points where a clean cut can be made to insert independent, self contained, modules). The rationale for a common intermediate language is saving time and money by reducing the number of compilers that need to be created from N_L * N_M to N_L + N_M, where N_L is the number of languages and N_M the number of target machines.

Generating high quality code requires having access to as much information as possible about the original source, which means the intermediate language split-point needs to occur as soon after source language specific processing as possible (otherwise all kinds of subtle target machine architecture assumptions get embedded in the structure of the intermediate code, which can make life difficult when the target makes different assumptions). ANDF is one of the few implementations that makes the split-point early (potential customers were not happy that it would be relatively easy to reverse engineer the original source).

Traditionally many more back-end got written than front-ends and the decision on where to make the intermediate language split-point was driven by the desire to reduce the time and effort needed to write a back-end, i.e., it was as close to actually pumping out machine code as possible. The quality of the generated code is not as good as it might be, but a new compiler can be shipped much quicker and more cheaply. Almost every retargetable compiler has made this design choice.

These days with more new languages appearing than new processors there would appear to be advantages in shifting the intermediate language split-point closer to the front-end. In practice compiler vendors get most of their income from back-end work (e.g., vendors wanting support for their new processor) and income from new front-ends is very rare, i.e., there is no incentive to change the status-quo.

UML is not regarded as a serious language by many developers. Is this because programs look like pretty pictures, something that managers might use, or UML being used for very non-developer problems like business processes? I don’t know. UML seems to be growing in popularity and have an increasing number of support tools. Perhaps one day I will use it for real and be able to say something of interest.

Things to read

A Code Generation Interface for ANSI C by C. W. Fraser and D. R. Hanson.

Code selection through object code optimization by J. W. Davidson and C. W. Fraser.


TECO, Tcl/Tkl and TeX

December 20th, 2014 1 comment

T is for TECO, Tcl/Tkl and TeX.

TECO was present during creation and begat Emacs. Despite living on TECO’s home turf, as an undergraduate, my lack of beard and disdain for Dungeons & Dragons stopped me becoming a member of its inner sanctum.

Tcl/Tkl is practical combination of scripting language+GUI toolkit for adding a usable (if sometimes somewhat fragile) GUI wrapper around one or more command line programs. Tcl is one of those languages that gets the job done without anyone making a big fuss about it.

TeX is Donald Knuth’s original typesetting system, however the language built on top of it, LaTeX, is probably more well known these days. While creating TeX Knuth proposed an approach to combing code and documentation together and gave it the catchy name literate programming; unfortunately this approach was not strangled at birth and because of Knuth’s reputation and the catchy name it keeps regrowing heads that need to be cut-off. The literate programming approach requires code, comments and formatting instructions be jumbled together in a horrendous mess, which is then feed through a process that can produce visually beautiful output (if enough effort is put into inserting the appropriate incantations into the horrendous mess). When writing something that will primarily be read at the beautiful output stage it is worth investing in the horrendous mess stage, but most code is used in its native, horrendous mess, form. Knuth has the mathematician’s habit of using single letter identifiers and love of ‘neat’ tricks, he would be a disaster on any software engineering project where he could not be left alone in a room producing code that nobody ever needs to touch.


Snobol 4, Simula 67, Smalltalk, sed, SQL, SETL, Scratch and Spreadsheet

December 19th, 2014 No comments

S is for Snobol 4, Simula 67, Smalltalk, sed, SQL, SETL, Scratch and Spreadsheet.

Snobol 4 was the go to language for text generation and pattern matching before PERL/Python became widespread in the late 1990s (if your PERL/Python code is too slow try Snobol, compilers generating machine code have been written for it since the 1970s). It is a boutique language that entrances many people who encounter the Green book (your author fell under its spell as an undergraduate).

Simula 67 was the first object oriented language. However, 30 years went by before object-oriented programming could be said to have entered general use (some people would claim that this has still not happened). Yes, lots of developers were using C++ compilers in the late 1980s, but only because they wanted to look trendy and be able to list the language on their CV. During the late 1980s and early 1990s I was heavily involved with the static analysis of C using an in-house tool and found that a lot of C++ source could be handled by adding a few extensions to the tool, such as handling new (rather than malloc) and allowing class to be used where struct should have appeared. It was not until the mid-90s that parsing C++ source with an extended C parser became problematic on a regular basis (and then often only because of the use of iostreams).

Back in the day it was said that only Scandinavians understood OO programming. I think the real reason was that only Scandinavians understood the English used in the book written by the language designers: “Simula BEGIN” by Birtwistle, Dahl, Myrhaug and Nygaard (there is a rave review on Amazon, I wonder if the reviewer is Scandinavian).

Smalltalk is invariably the language of choice for object-oriented purists. I know of several development groups in industry that choose to use Smalltalk because they wanted to really learn how to do things the OO ‘way’ (Smalltalk is all objects, its users cannot do anything in a non-OO way); I don’t know what those subsequently responsible for maintaining of the code thought about this rationale.

The ‘launch’ of Smalltalk in the UK by Goldberg and crew happened before Apple’s Macintosh made Xerox PARC famous, and the talk was all about message passing being the future and everybody being able to modify any behavior of their operating environment (we were shown Smalltalk’s flexibility by a speaker who changed the definition of an operation that defined how the borders of windows were displayed; they agreed that this was potentially dangerous but that users soon learned by their mistakes). The launch of the Macintosh changed the whole PARC narrative.

Thinking back, the only thing that stopped the Smalltalk team inventing Open Source was the fact they thought it so obvious to ship the source of everything that it was not worth mentioning.

sed gets a mention as the most unlikely language used to implement a Turing machine.

SQL is invisible to developer who don’t use it because publishing SQL source is generally pointless unless the appropriate data is also available. There is a lot of SQL lurking (as text strings) within the source of programs written in other languages. If you want to see some SQL then check out the Javascript embedded in a lot of the web pages you read.

SETL is a Cambrian explosion language which has had a small, but influential, user base.

Scratch suffers from being thought of as a language for children. If touch screens had been around when people first started inventing programming languages I’m sure things would be very different today, where most developers think that real programs are written in lines of text.

Spreadsheet languages have for a long time been a neglected topic from the perspective of tools to find faults and programming language research. Things are starting to change in industry and hopefully the work of keen academics will be appreciated in academia.

Things to read

The SNOBOL4 Programming Language by Ralph E. Griswold, J. F. Poage, and I. P. Polonsky.

Every language has its blind spot, the techniques that should never be written using that language. In the case of SQL it is tree data structures: “Joe Celko’s Trees and Hierarchies in SQL for Smarties” by Joe Celko.


Ratfor, R, RUNOFF, RPG and Ruby

December 18th, 2014 No comments

R is for Ratfor, R, RUNOFF, RPG and Ruby

Ratfor is a structured form of Fortran from the days when structured programming was the in-thing and Fortran did not have much of it (lots got added in later revisions). I think its success came from allowing users to claim a degree of respectability that Fortran did not have, and which Fortran did not appear to gain when structure constructs were added to it (but then all successful languages are treated with suspicion in some circles).

The maintainers of R provide a valuable lesson on issues that are not important for a language to be widely used, such as the following (which I’m sure many of those involved with languages incorrectly think are important):

  • Technical language details are not important (e.g., functional, imperative, declarative, object-oriented, etc); as far as I can tell the language has hardly changed over the years, yet users are not bothered. What is important is how easily the language can be used to solve its users’ problems. There are umpteen different ways a language can be designed around R’s very useful ability to operate on vectors as a single unit or to dynamically create data-frames, it does not make much difference how things are done as long as they work.
  • Runtime efficiency is often not important; a look at the source of the R runtime system suggests that there are lots of savings to be had (maybe a factor of two). Users are usually a lot more willing to wait for a running program to complete than struggle with getting the program running in the first place; the R maintainers have concentrated on the tuning the usability of the ecosystem (intentionally or not, I don’t know). Also, R appears to be like Cobol in that the libraries are the best place to invest in performance optimization. I say ‘appears’ because I have noticed a growing number of R libraries completely written in R, rather than being a wrapper to code in C or Fortran; perhaps the efficiency of the runtime system is becoming an important issue.

    Most programs don’t use a lot of cpu resources, this was true back when I was using 8-bit cpus running at 4MHz and is even more true today. I used to sell add-on tools to make code run faster and it was surprising how often developers had no idea how long their code took to run, they just felt it was not fast enough; I was happy to go along with these feelings (if the developers could recite timing data a sale was virtually guaranteed).

plot is an unsung hero in R’s success, taking whatever is thrown at it and often producing something useful (if somewhat workman-like visually).

RUNOFF is the granddaddy of text processing systems such as *roff and LaTeX. RUNOFF will do what you tell it to do (groff is a modern descendant), while LaTeX will listen to what you have to say before deciding what to do. A child of RUNOFF shows that visual elegance can be achieved using simple means (maintainers of R’s plot function take note). Businesses used to buy computers and expensive printers so they could use this language, or one of its immediate descendants.

RPG must be the most widely used proprietary programming language ever.

Is Ruby’s current success all down to the success of one web application framework written in it? In its early years C claim to fame was as the language used to write Unix (I know people who gave this as the reason for choosing to use C). We will have to wait and see if Ruby has a life outside of one framework.

Things to read

“The R Book” by Michael J. Crawley.



December 17th, 2014 No comments

Q is for QCL.

Quantum computers are still so new that creators of programs languages for them are still including the word quantum in their name. I shouldn’t complain because without them today’s post would be about existing languages with the word quick tacked on the front.

Conventional modern computers contain transistors and while a knowledge of quantum mechanics is required for a detailed understand of how these work, the layman’s explanation can be given using classical terms and does not involve any apparently weird behavior (tunnelling can be skipped as a technical detail). Quantum computers make direct use of superposition and entanglement, both of which are at the very weird end of quantum mechanical effects.

Conventional computers work by toggling the state of transistors between 0/1 (a bit), Quantum computers work by performing unitary transforms on Qubits.

Unitary transforms are reversible (by the definition of the transform). This means that feeding the output value from a Quantum computer back through the output will result in the value of the original input appearing at the input. Sounds interesting until you realise that this places a couple of major restriction on the structure of a quantum program:

  • the number of input bits must equal the number of output bits (its ok for some of these bits to be don’t-care bits from the point of view of interpreting them to give an external meaning),
  • variables cannot be overwritten (this would destroy information and is not reversible),
  • variables cannot be copied (there is no unitary transform available to do this).

Developers coding for parallel processors need to stop complaining, their life could be a lot harder. In fact figuring out how to get a quantum computer to do anything useful is so hard that people have taken to using genetic algorithms to obtain useful practical results (it was a search based software engineering workshop where I first learned about the realities of programming a Quantum computer).

These programming constraints actually make one feel relieved that the largest available Quantum computer only contains 128 Qbits that need to be used up.

It is possible to build a transistor-based reversible computer using Toffoli-gates; it has been shown that such a computer would be capable of performing any computation that a Turing machine could perform. An important theoretical result that shows that Quantum computers are at least as powerful as conventional computers.

Building a Quantum computer is very hard. The basic approach is to prepare a superposed state (perhaps using ion traps or microwaves), then apply unitary transformations in a way that takes advantage of quantum parallelism, followed by gathering up the result and making a measurement. Quantum computers are like ASICs (i.e., they need to be configured to perform one kind of calculation) and not general purpose programmable computers.

Quantum parallelism is what gives Quantum computers the potential for having a huge performance advantage over today’s computers. The poster child for this performance advantage is Shor’s algorithm (which finds the prime factors of very large numbers, such as those used in public key encryption).

Quantum computers are very delicate beasts that are destroyed by interaction with the world around them (e.g., being hit by a passing atom lurking within the near perfect vacuum in which they operate, or a stray photon bouncing around the cavity in which the computer is held).

I have a great idea for hackers wanting to stop a government agency cracking their communication keys using a Quantum computer running Shor’s algorithm. What the hackers need to do is convince the agency that some very large prime numbers are being used as keys, the agency will feed these numbers to the Quantum computer which it will be expecting a solution that contains two smaller numbers and the failure to obtain this expected result will lead the control software to think that the Quantum computer has developed a fault and shut it down (this idea has a few gaps that need filling in, not least is generating the very large prime numbers); Hollywood, I am available as a script adviser.

The commercially available Quantum computer gets around the delicate nature problem by accepting that there will be noise in the output and targeting problems whose solution has some degree of immunity to noise.

What about quantum computer languages? A variety have been created (all work by using simulators running on commodity hardware and including support for some very un-unitary transform like operations): QCL is based on a C-like syntax and supports various quantum operations, and of course the function folk know a good bandwagon when they see it.


Pascal, Prolog, Perl, Python and PL/1

December 16th, 2014 No comments

P is for Pascal, Prolog, Perl, Python and PL/1.

Pascal was the love of my life when young (what else would cause anybody to think that a C to Pascal translator was a good idea). The language built up a substantial user base, was widely taught in universities and was widely considered to be a good language for writing robust software, yet by the end of the 1980s its market share had crashed. What happened?

I think Pascal lost out to C because it generates a lot more upfront friction. Stronger type checking, runtime array bounds checking; its just unreasonable to expect developers to deal with this stuff and deliver on time. Discovering faults in code is what customers are for. Besides, any language that forces developers to use vendor extensions because the language does not officially support separate compilation, bitwise operations and other ‘useful’ stuff just does not deserve to be taken seriously.

Languages like Pascal will not become mainstream until managers stop allowing developers to put private short term interests before the longer term reliability and cost interests of groups.

Prolog is a member of the set of languages that are very different from all the other languages. looks so easy to use, when reading examples in books. In practice the Experience of using almost any existing languages (i.e., those based on control flow) means a whole way of doing things needs to be unlearned; declarative languages don’t have control flow, they work by developers expressing goals + relationships and leaving it to the implementation runtime to figure out how to reach the appropriate goals.

Perl shares with Lisp the ability to attract beards and Dungeons&Dragon enthusiasts, but differs in not repelling non-clever people (which is a serious design flaw given the amount of Internet infrastructure based on it). At times it feels like you can reach out and touch the Perl community, why is this so strong in Perl? Perhaps mastering the language syntax results in ascending to an astral plane where it is only possible to communicate with other initiates. Here is a self-documenting group of people just waiting to be studied for social anthropology PhD.

Perl 6 continues to provide a useful reminder that while developers hate language warts from early versions, they hate it even more when the warts are removed and the ability of existing code to continue to work becomes questionable.

Python is the language I use to work with the young generation of developers.

PL/1 was the first corporate language and because computer languages were still very new back then they could not adopt something a couple of kids had hacked together based on existing languages. Instead, there were committee meetings, academic work on a formal definition and magazine profiles by Very Important People.

In an attempt to be user friendly PL/1 was designed to try very hard to implicitly convert values to whatever type was required. Given its built-in support for Cobol business data-types and Fortran engineering data-types (in a bid to attract users from both those languages), plus other data-types that neither language had decided to include (e.g., bit strings), developers regularly encountered a level of surprising behavior that modern languages rarely achieve.

Things to read

Algorithms + Data Structures = Programs by Niklaus Wirth was a revelation in its day and is now basic introductory software engineering. Unless resources are very limited, developers use constructs that encapsulate the details illustrated in this book, such as iterators.