How indeterminate is an indeterminate value?

One of the unwritten design aims of the C Standard is that it should be possible to fully implement the C Standard library in conforming C. It turned out that this was not possible in C90; the problem was implementing the memcpy function when the object being copied was an object having a struct type containing one or more padding bytes. The memcpy library function copies the bytes in one object to another object. The padding bytes might be uninitialized (they have an indeterminate value), which means accessing them is undefined behavior (in C90), i.e., use of memcpy for copying structs containing padding results in a non-conforming program.

struct {
        char c; // Occupies 1 byte
        // Possible padding bytes here
        int i;  // A 2/4-byte int sometimes has to be aligned on a 2/4-byte storage boundary
       };
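Where the padding actually falls depends on the implementation. The following is a minimal sketch for inspecting the layout; the tag padded and both helper function names are made up for this illustration, and the numbers returned will differ between compilers and targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag for the struct shown above. */
struct padded {
        char c;
        int i;
       };

/* Bytes of padding between the end of c and the start of i. */
size_t padding_before_i(void)
{
        return offsetof(struct padded, i) - sizeof(char);
}

/* Total padding bytes in the struct (between members plus any at the end). */
size_t padding_total(void)
{
        /* Guard the unsigned subtraction below against wrapping. */
        assert(sizeof(struct padded) >= sizeof(char) + sizeof(int));
        return sizeof(struct padded) - sizeof(char) - sizeof(int);
}
```

On a typical implementation with a 4-byte int aligned on a 4-byte boundary, both functions would be expected to return 3, but nothing in the language requires that.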

Padding bytes could be set to a known value by, for instance, using memset to zero the storage; requiring this usage was thought to be excessive, and a hefty chunk of new wording was added in C99 (some of the issues raised by this problem also cropped up elsewhere, which contributed to the will to do this).
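A sketch of what zeroing the storage might look like, using memset from the Standard library. The tag padded and the helper names padded_clear and nonzero_bytes are assumptions made for this example. Note that later wording in the Standard allows a store to a member to leave padding bytes with unspecified values again, so this only shows the state immediately after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag for the struct shown earlier. */
struct padded {
        char c;
        int i;
       };

/* Give every byte of *p, padding included, a known value. */
void padded_clear(struct padded *p)
{
        assert(p != NULL);
        memset(p, 0, sizeof *p);
}

/* Count nonzero bytes in the object representation, read as unsigned char. */
size_t nonzero_bytes(const struct padded *p)
{
        const unsigned char *b = (const unsigned char *)p;
        size_t n = 0;

        for (size_t k = 0; k < sizeof *p; k++)
                if (b[k] != 0)
                        n++;
        return n;
}
```

Calling padded_clear(&s) before assigning to s.c and s.i is the extra step that programs would have been required to perform.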

One consequence of the new wording is that objects having type unsigned char are special: while their uninitialized value is still indeterminate, the set of possible values excludes a trap representation. They have an unspecified value, which makes accesses unspecified behavior (which conforming programs can contain). The uninitialized value of objects having other types can be a trap representation; it’s the possibility of a value being a trap representation that makes accessing such uninitialized objects undefined behavior.

All well and good, memcpy can now be implemented in conforming C(99) by copying unsigned chars.
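A minimal sketch of such an implementation, copying one unsigned char at a time. The name my_memcpy is made up to avoid clashing with the library function, and the assert is an addition for this example; real implementations are usually far more elaborate (word-sized copies, assembler, and so on).

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte copy through unsigned char, so padding bytes are copied
   without ever being read as a type that could hold a trap representation.
   As with memcpy, the two objects must not overlap. */
void *my_memcpy(void *dest, const void *src, size_t n)
{
        unsigned char *d = dest;
        const unsigned char *s = src;

        assert(dest != NULL && src != NULL);
        while (n-- > 0)
                *d++ = *s++;
        return dest;
}
```

Copying a struct then looks like my_memcpy(&b, &a, sizeof a), whether or not a contains padding.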

Having made it possible for a conforming program to access an uninitialized object (having type unsigned char), questions about its actual value can be asked. Its value is indeterminate, you say; the clue is in the term indeterminate value. OK, what value does the following function return?

unsigned char random(void)
{
        unsigned char x;

        return x ^ x;
}

Exclusive-oring a value with itself always produces zero. An unsigned char takes, say, values 0 to 255; pick any one and you always get zero; case closed. But where does it say that an indeterminate value is always the same value? There is no wording preventing an indeterminate value being different every time it is accessed. The sound of people not breathing could be heard when this was pointed out to WG14 (the C Standard’s committee), followed by furious argument for one side or the other.
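The "pick one and you always get zero" half of the argument is easy to check exhaustively for determinate values. The sketch below does that (the function name is made up for this example); it says nothing about what happens when the operand is indeterminate, which is the case in dispute.

```c
#include <assert.h>
#include <limits.h>

/* Check that v ^ v is zero for every determinate unsigned char value. */
void check_xor_self_is_zero(void)
{
        for (unsigned int v = 0; v <= UCHAR_MAX; v++) {
                unsigned char x = (unsigned char)v;

                /* x is read twice here, but it holds one fixed value. */
                assert((x ^ x) == 0);
        }
}
```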

The following illustrates one situation where the value of padding bytes could change with every access. The volatile qualifier specifies that the value of c could change between two accesses (e.g., it represents the storage layout of some memory-mapped I/O device). Perhaps any padding bytes following it are also effectively volatile-qualified.

struct {
        volatile char c; // A changeable 1 byte
        // Possible padding bytes may be volatile
        int i;  // No volatility here
       };

The local object x, above, is not associated with a volatile-qualified object. But, so what? Another unwritten design aim of the C Standard is to keep the wording simple, so edge cases are not called out and the behavior intended to handle padding bytes gets applied to local unsigned chars.

A compiler could decide that calls to random always return zero, based on the assumption that while indeterminate values may not be known, they are not time varying.

  1. June 18th, 2017 at 18:44 | #1

    This is a tangent from your main point, but regarding this:

    “One of the unwritten design aims of the C Standard is that it should be possible to fully implement the C Standard library in conforming C.”

    Am I correct to assume it’s not “cheating” to use a system call, e.g., to implement fopen(3) by calling open(2) or to implement malloc(3) by calling sbrk(2)? (Otherwise I don’t understand what you mean by this unwritten design aim.)

  2. June 18th, 2017 at 19:34 | #2

    I’m puzzled by this:
    “A compiler could decide that calls to random always return zero, based on the assumption that while indeterminate values may not be known, they are not time varying.”

    What makes the returned value of “random” indeterminate?
    Why not decide getchar always returns the same value then?

  3. June 18th, 2017 at 20:01 | #3

    @Aaron Brown
    Yes, some functions do have to call out to the OS.

    The many *nix-oriented folk on the committee would respond that this was also written in C (overlooking the fact that chunks of assembler are needed).

    The elephant in the room in discussions of this design aim is setjmp/longjmp. Nobody ever pretends their implementation of these functions is anything other than pure hackery.

  4. June 18th, 2017 at 20:08 | #4

    @victor yodaiken
    The sentence you quoted is referring to the value of x being indeterminate.

    As to the status of random's return value…

    If the value of x is not time varying, a single value is returned (i.e., 0). Unspecified requires more than one value, so this value is neither unspecified nor indeterminate.

    If the value of x is time varying, the value is indeterminate (but not a trap representation), so the program behavior remains unspecified (no big deal since any non-trivial program contains lots of unspecified behavior, most of it harmless).

  5. June 18th, 2017 at 22:15 | #5

    @Derek Jones
    My bad, I thought you were referring to the libc “random” and didn’t remember the example function you had above was called “random”. Makes a lot more sense. However, the exception for unsigned char is really a hack and indicates something is deeply wrong with the whole concept of ub. A loop copying a structure using char pointers can apparently be rewritten arbitrarily without warning by the compiler, while one using uchar pointers works!

  6. June 18th, 2017 at 23:19 | #6

    @victor yodaiken
    The issue is not how the language defines this-or-that construct, it’s all about the incentive structure of open source compilers.

  7. June 19th, 2017 at 01:30 | #7

    @Derek Jones
    thanks for the link. Interesting to see that 3 years later WG 14 seems to be utterly ignoring all of this. I think at some point the corporate funders of gcc and llvm will get annoyed.

  8. June 29th, 2017 at 20:14 | #8

    Along with what you’ve written about undefined behaviour over the years Derek, I really deeply appreciate Victor’s comment suggesting that the need for an exception in the standard for “unsigned char” is a clear indication that something is very deeply wrong with the whole concept of undefined behaviour. I think that hits the nail all the way home in one blow.

    I think that even with improved tools, and even with those tools directly integrated into compilers, to help detect abuses of undefined behaviour, there will be an ever increasing allergy to C in the user community, even (or especially) in long-time C users such as myself. Perhaps this is what both academics and WG14 really want in the first place.

    If I program in C, I need to defend against the compiler maintainers.
    If I program in Go, the language maintainers defend me from my mistakes.

    Phil Pennock
    https://bridge.grumpy-troll.org/2017/04/golang-ssh-redux/

  9. June 29th, 2017 at 20:16 | #9

    (the 3rd paragraph should be in quote marks — it is a quote from Phil, original is in the linked post)
