July 24, 2014 Derek Jones 4 comments

IST/5, the British Standards’ committee responsible for programming languages in the UK, has a new(ish) committee secretary and like all people in a new role wants to see a vision of the future; IST/5 members have been emailed asking us what we see happening in the programming language standards’ world over the next 12 months.

The answer is, off course, that the next 12 months in programming language standards is very likely to be the same as the previous 12 months and the previous 12 before that. Programming language standards move slowly, you don’t want existing code broken by new features and it would be a huge waste of resources creating a standard for every popular today/forgotten tomorrow language.

While true the above is probably not a good answer to give within an organization that knows its business intrinsically works this way, but pines for others to see it as doing dynamic, relevant, even trendy things. What could I say that sounded plausible and new? Big data was the obvious bandwagon waiting to be jumped on and there is no standard for R, so I suggested that work on this exciting new language might start in the next 12 months.

I am not proposing that anybody start work on an ISO standard for R, in fact at the moment I think it would be a bad idea; the purpose of suggesting the possibility is to create some believable buzz to suggest to those sitting on the committees above IST/5 that we have our finger on the pulse of world events.

The purpose of a standard is to create agreement around one way of doing things and thus save lots of time/money that would otherwise be wasted on training/tools to handle multiple language dialects. One language for which this worked very well is C, for which there were 100+ incompatible compilers in the early 1980s (it was a nightmare); with the publication of the C Standard users finally had a benchmark that they could require their suppliers to meet (it took 4-5 years for the major suppliers to get there).

R is not suffering from a proliferation of implementations (incompatible or otherwise), there is no problem for an R standard to solve.

Programming language standards do get created for reasons other than being generally useful. The ongoing work on C++ is a good example of consultant driven standards development; consultants who make their living writing and giving seminars about the latest new feature of C++ require a steady stream of new feature to talk about and have an obvious need to keep new versions of the standard rolling down the production line. Feeling that a language is unappreciated is another reason for creating an ISO Standard; the Modula-2 folk told me that once it became an ISO Standard the use of Modula-2 would take off. R folk seem to have a reasonable grip on reality, or have I missed a lurking distorted view of reality that will eventually give people the drive to spend years working their fingers to the bone to create a standard that nobody is really that interested in?

Categories: Uncategorized Tags: C, IST/5, Modula-2, programming language, R

What I know of Dennis Ritchie’s involvement with C

October 13, 2011 Derek Jones No comments

News that Dennis Ritchie died last weekend surfaced today. This private man was involved in many ground breaking developments; I know something about one of the languages he designed, C, so I will write about that. Ritchie has written about the development of C language.

Like many language designers the book he wrote “The C Programming Language” (coauthored with Brian Kernighan in 1978) was the definitive reference for users; universally known as K&R. The rapid growth in C’s popularity led to lots of compilers being written, exposing the multiple ways it was possible to interpret some of the wording in K&R.

In 1983 ANSI set up a committee, X3J11, to create a standard for C. With one exception Ritchie was happy to keep out of the fray of standardization; on only one occasion did he feel strongly enough to step in and express an opinion and the noalias keyword disappeared from the draft (the restrict keyword surfaced in C99 as a different kind of beast).

A major contribution to the success of the C Standard was the publication of an “ANSI Standard” version of K&R (there was a red “ANSI C” stamp on the front cover and the text made use of updated constructs like function prototypes and enumeration types), its second edition in 1988. Fans of the C Standard could, and did, claim that K&R C and ANSI C were the same language (anybody using the original K&R was clearly not keeping up with the times).

Ritchie publicly admitted to making one mistake in the design of C. He thinks that the precedence of the & and | binary operators should have been greater than the == operator. I can see his point, but an experiment I ran a few years ago suggested it is amount of experience using a set of operator precedence rules that is the primary contributor to developer knowledge of the subject.

Some language designers stick with their language, enhancing it over the years (e.g., Stroustrup with C++), while others move on to other languages (e.g., Wirth with Pascal, Modula-2 and Oberon). Ritchie had plenty of other interesting projects to spend his time on and took neither approach. As far as I can tell he made little or no direct contribution to C99. As head of the research department that created Plan 9 he must have had some input to the non-Standard features of their C compiler (e.g., no support for #if and support for unnamed structures).

While the modern C world may not be affected by his passing, his ability to find simple solutions to complicated problems will be a loss to the projects he was currently working on.

Categories: Uncategorized Tags: ANSI, C, compiler, Modula-2, noalias, Oberon, Pascal, Ritchie, Standard

Proving software correct

May 2, 2011 Derek Jones 2 comments

Users want confidence that software is ‘correct’; what constitutes correct depends on who you talk to and can vary between doing what the user expects and behaving according to a specification (which may include behavior that users did not expect or want).

The gold standard for software correctness is that achieved by mathematical proofs, or at least what most people believe is achieved by such proofs, i.e., a statement that is shown through a sequence of steps to be derived from a set of axioms. The sequence of steps used in most real proofs operate at a much higher level than axioms and rely on the reader to fill in the gaps left between each step. Ever since theorems were first stated they sometimes contained faults, i.e., were not correct theorems, and as mathematicians have continued to increase the size and complexity of theorems being ‘proved’ the technical and social issues involved in believing a published proof have grown in complexity.

Software proofs usually operate by translating the source in to some mathematical formalism and using a theorem prover to show that one or more properties are met. Perhaps the most famous use of such a proof that had an outcome different than that predicted is the 1996 Ariane 5 rocket crash; various proofs had been obtained for the Ariane 4 software showing that the value of some variables would never exceed given limits, these proofs involved input values that depended on the performance of the rocket and because Ariane 5 was more powerful than Ariane 4 the proofs were no longer valid (management would have found this out had they recheck the proofs using the larger values). Update: My only knowledge of this work comes from a conversation I recall with somebody working in the formal verification area, I no longer have contact with them and the company they worked for no longer exists; Pascal Cuoq’s comment below suggests they may have overstated the formal nature of the work, I have no means of double checking.

Purveyors of ‘software proof’ systems will tell you about the importance of feeding in the correct input values and will tell you about the known proofs they have managed to verify using their system. The elephant in the room that rarely gets mentioned is the correctness of the program that translates source code into the mathematical formalism used. These translators often handle that subset of the language which is relatively easy to map to the target formalism, the MALPAS C to IL translator is one exception to this (ok, yes my company wrote this translator so the opinion might be a little biased).

The method commonly associated with claims of correctness proof for a translator or compiler is slightly different from that described above for applications. This method involves manually writing some mathematics, using the chosen formalism, that ‘implements’ the translator/compiler. Strangely there are people who think that doing this is sufficient to claim the compiler is ‘verified’ or ‘proved correct’. As any schoolboy knows it is possible to write mathematics that contains mistakes and the writing of a mathematical implementation is just the first step in a process intended to increase confidence in a claim of correctness.

One of the questions that might be asked of a ‘mathematics implementation’ of a compiler is: does it faithfully interpret source code syntax/semantics according to the syntax/semantics specified in the appropriate language document?

Answering this question requires that the language syntax/semantics be specified in some mathematical notation that is amenable to formal analysis. Various researchers have created mathematical models for languages such as Ada, CHILL and C. However, these models are not recognized as being definitive, that status belongs to the corresponding ISO Standard written in English prose. The Modula-2 standard is specified using both English prose and equivalent mathematical notation with both having equal status as the definition of the language (any inconsistency between the two is decided why analyzing what behavior was intended); there were lots of plans to do stuff with this mathematics but the ISO language committee struggled just to produce a tool capable of printing the mathematics.

The developers of the Compcert system refer to it as a formally verified C compiler front-end when the language actually verified is called Clight, which they describe as a subset of the C language. This is very interesting work and I hope they continue to refine it and add support for more C-like constructs. But let’s be clear, the one thing missing from this project is any proof of a connection to the requirements contained in the C Standard.

I don’t know what it is about formal verification but those involved can at the same time be both very particular about the language they use in their mathematics and completely over the top in the claims they make about what their tools do. A speaker from Polyspace at one MISRA C conference claimed his tool could detect 100% of the coding guidelines specified in MISRA C, a surprising achievement for a runtime tool (as it was then) enforcing requirements mainly aimed at source code; I eventually got him to agree that the tool detected 100% of the constructs specified by the small subset of guidelines they had implemented.

I doubt that the Advertising Standard Authority would allow adverts containing the claims made by some formal verification advocates to appear in print or on TV; if soap manufacturers have to follow ASA rules then so should formal verification researchers.

Without a language specification written in a form amenable to mathematical analysis any claims of correctness have to be based on the traditional means of reading English prose very carefully and writing lots of tests to probe every obscure corner of the language specification. This was the approach used for the production of the Model Implementation of C, a system designed to detect all unspecified, implementation defined and undefined uses in C programs (it used a compiler, linker and interpreter). One measure of how well an implementor has studied the standard is how many faults they have discovered in it (some people claim this is a quality of standard issue, but the similar number of defects reported against the Ada and C Standards show that at least for Ada this is not true); here are some from the Model Implementation project.

Performance on independently written tests can be a good indicator of implementation correctness, depending on the quality of the tests. Both the Perennial and PlumHall C validation suites are of high quality, while suites such as the gcc testsuite are rather ad-hoc, have poor coverage and tend to be runtime oriented. The problem with high quality validation suites is that they cost enough money to put them out of reach of many research groups (I suspect another problem is that such groups don’t understand the benefits of using such suites or think they can do just as good a job in a few weeks).

Recently a new formal verification tool for C has appeared that performs all its verification checking at program runtime, i.e., after the user source has been translated to executable form. It is still very early days for kcc (they have yet to chose a name and the command used to invoke the translator is currently being used), they have an initial system up and running and are keen to continue improving it.

I am interested in the system because of what it might evolve into, including:

a means of quickly checking the behavior of obscure bits of code (I get asked all sorts of weird questions and my brain is not always willing to switch to C language lawyer mode),
a means of checking the consistency of the requirements in the C Standard, which will require another tool making use of the formalism built up by kcc,
a tool which would help developers understand which parts of the C Standard they need to look at to understand some construct (the tool currently has a trace mode that needs lots of work).

Categories: Uncategorized Tags: correctness, faults, formal methods, mathematics, Modula-2, static analysis, theorem proving, validation suite

Build an ISO Standard and the world will beat a path to your door

December 16, 2010 Derek Jones No comments

An email I received today, announcing the release of version 1.0 of the GNU Modula-2 compiler, reminded me of some plans I had to write something about a proposal to add some new definitions to the next version of the ISO C Standard.

In the 80s I was heavily involved in the Pascal community and some of the leading members of this community thought that the successor language designed by Niklaus Wirth, Modula-2, ought to be the next big language. Unfortunately for them this view was not widely shared and after much soul searching it was decided that the lack of an ISO standard for the language was responsible for holding back widespread adoption. A Modula-2 ISO Standard was produced and, as they say, the rest is history.

The C proposal involves dividing the existing definition undefined behavior into two subcategories; bounded undefined behavior and critical undefined behavior. The intent is to provide guidance to people involved with software assurance. My long standing involvement with C means that I find the technical discussions interesting; I have to snap myself out of getting too involved in them with the observation that should the proposals be included in the revised C Standard they will probably have the same impact as the publication of the ISO Standard had on Modula-2 usage.

The only way for changes to an existing language standard to have any impact on developer usage is for them to require changes to existing compiler behavior or to specify additional runtime library functionality (e.g., Extensions to the C Library Part I: Bounds-checking interfaces).

Categories: Uncategorized Tags: C, ISO, Modula-2, Standard, undefined behavior

The Shape of Code

Archive

An ISO Standard for R (just kidding)

What I know of Dennis Ritchie’s involvement with C

Proving software correct

Build an ISO Standard and the world will beat a path to your door

Recent Posts

Recent Comments

Archives

Meta