
Uncovering the undefined behaviors

I think that all programming languages contain some constructs that have undefined behavior.

The C Standard explicitly lists various constructs as having undefined behavior. It also specifies that: “Undefined behavior is otherwise indicated in this International Standard by the words ‘undefined behavior’ or by the omission of any explicit definition of behavior.” The second half of that sentence refers to what might be called implicit undefined behavior. Implicit undefined behavior can be subdivided into intentional and unintentional. Intentional undefined behavior applies to constructs that the committee considered and decided (and continues to decide) to say nothing about (e.g., question 19), while unintentional undefined behavior applies to constructs that the committee did not explicitly consider (when discovered, these often end up as defect reports, which are sometimes resolved as intentionally undefined behavior).
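To make the explicit case concrete, here is a minimal sketch (the function name is purely illustrative) of one construct the Standard explicitly lists as undefined: signed integer overflow.

```c
/* Sketch: signed integer overflow is explicitly listed by the C Standard
   as undefined behavior. The function name is illustrative only. */
#include <limits.h>

int increment(int x)
{
    /* When x == INT_MAX the addition overflows a signed int; the Standard
       places no requirements on what happens next. */
    return x + 1;
}
```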

Fans of some languages claim that ‘their’ language does not contain any undefined behaviors.

Ada does not explicitly specify any construct as having undefined behavior, but it does specify that some constructs generate a bounded error; a rose by any other name…

I sometimes bump into language inventors claiming that ‘their’ language is fully specified, i.e., does not contain any undefined behaviors. My first question to them, about the behavior of division involving negative values, invariably requires me to explain that there are two possible ways of doing it, rounding the quotient toward zero or toward negative infinity (ignorance is bliss when fully specifying a language). The invariable answer is that the behavior is whatever the underlying implementation does (which is often written in C). In other words, they have imported all the undefined behaviors of the implementation language.
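A minimal C sketch of the two choices (the flooring adjustment below is my own illustration, not part of any standard): C99 and later require the quotient to round toward zero, while some languages round toward negative infinity; a ‘fully specified’ language that defers to its implementation inherits whichever choice that implementation made.

```c
/* Sketch: the two common definitions of integer division for negative operands. */
#include <stdio.h>

int main(void)
{
    int a = -7, b = 2;

    /* Truncating division (C99 and later): quotient rounds toward zero. */
    printf("truncating: %d / %d = %d, remainder %d\n", a, b, a / b, a % b);
    /* prints: -7 / 2 = -3, remainder -1 */

    /* Flooring division: quotient rounds toward negative infinity
       (the choice made by, for example, Python's // operator). */
    int q = a / b;
    int r = a % b;
    if (r != 0 && ((r < 0) != (b < 0))) { q--; r += b; }
    printf("flooring:   %d / %d = %d, remainder %d\n", a, b, q, r);
    /* prints: -7 / 2 = -4, remainder 1 */

    return 0;
}
```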

Follow-up questions include: what is the order of expression evaluation (e.g., left-to-right, right-to-left, inside out…), what is the order of function argument evaluation (often driven by the direction of stack growth), what is the order of initialization, and other order-related questions that come to mind. Their fully specified language quickly turns out to be a sham.
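For instance, here is a C sketch of how evaluation order leaks through (function names are illustrative): the order in which the arguments of a call are evaluated is unspecified, so a visible side effect can legitimately produce either output.

```c
/* Sketch: unspecified order of evaluation of function arguments in C. */
#include <stdio.h>

static int counter = 0;

static int next(void)
{
    return ++counter;   /* side effect makes the evaluation order visible */
}

static void pair(int first, int second)
{
    printf("pair(%d, %d)\n", first, second);
}

int main(void)
{
    /* Which call to next() runs first is unspecified: the output may be
       "pair(1, 2)" or "pair(2, 1)" depending on the compiler. */
    pair(next(), next());
    return 0;
}
```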

A recent post by John Regehr talks about Gödel’s incompleteness theorem as a source of undefined behavior. My understanding is that the underlying argument is built on non-termination. How is it possible to tell the difference between non-termination and lasting longer than the age of the universe? In itself I don’t think this theorem is a source of undefined behavior; more enlightenment welcome.

  1. March 11, 2017 21:53 | #1

    Quite some time ago I studied, rather informally, program verification and some of the attempts to create tools to assist in the authoring and practical use of provably correct programs.

    One particular effort I learned about then was the Euclid programming language, and more specifically the Concurrent Euclid language and its use to write, and prove correct, a clone of the Unix kernel.

    I’m wondering if you are/were aware of Euclid and its descendants, and if you know whether or not languages such as Euclid, which were specifically designed in attempts to make it possible to prove programs correct, are also free from undefined behaviour (at least in goal and in theory, if not (yet) in any actual implementation).


    Greg A. Woods

  2. March 12, 2017 13:54 | #2

    @Greg A. Woods
    The 1970s and early 80s was a time when designing a language and writing its compiler and operating system in the language was the thing to do in research labs, e.g., C, Euclid, Pascal, Modula-2, Clu, Smalltalk,…

    I never got to read much about Euclid, but see from its Wikipedia entry that it has a strong functional flavour (another feature of many languages designed during this period).

    Many of the restrictions placed on functional languages in this period were driven by the limitations of proof technology at that time (in particular, memory capacity limitations).

    My experience is that these languages contain implicit undefined behavior, in that the language designer missed various corner cases in their specification. As a compiler writer working on many languages I knew a lot about corner cases and could usually find them quickly, given a language spec. The designers’ response was often to bluster that nobody would ever write code that did that sort of thing.

    What does it mean to prove a program correct? All the groups claiming to have done this have actually shown some form of consistency between two documents (often known as a specification and an implementation). The amount of consistency depends on the technology used to do the checking. There was a wonderful case where Inmos trumpeted a proof of the correctness of their floating-point implementation and, when three bugs were found, claimed that the correctness proof still held because the bugs were all in the specification, i.e., the specification did not map to the behavior required by IEEE floating-point.

    The results claimed by formal methods groups ought to be reported to advertising standards authorities, just like soap powder manufacturers.
