
UNCOL and UML

U is for UNCOL and UML.

UNCOL is the compiler intermediate language that gets named when people need a historical starting point to refer to. This kind of intermediate language contains all the information needed for a compiler front-end to communicate a representation of the original source code to an independent compiler back-end that generates machine code. (The intermediate languages used for internal communication within a compiler are often just tree data structures that also make use of information held in other data structures, i.e., they are not points where a clean cut can be made to insert independent, self-contained modules.) The rationale for a common intermediate language is saving time and money by reducing the number of compilers that need to be created from N_L * N_M to N_L + N_M components (one front-end per language plus one back-end per machine), where N_L is the number of languages and N_M the number of target machines.
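To put numbers on the N_L * N_M versus N_L + N_M argument, here is a minimal sketch. The function name `compilers_needed` and the sample language/machine counts are made up for illustration; they do not come from any real project.

```python
def compilers_needed(n_languages: int, n_machines: int) -> tuple[int, int]:
    """Return (direct, via_il).

    direct: one complete compiler per (language, machine) pair.
    via_il: one front-end per language plus one back-end per machine,
            all meeting at a common intermediate language.
    """
    direct = n_languages * n_machines
    via_il = n_languages + n_machines
    return direct, via_il


# Hypothetical language and machine counts, chosen only to show the trend.
for n_l, n_m in [(2, 2), (3, 4), (10, 20)]:
    direct, via_il = compilers_needed(n_l, n_m)
    print(f"{n_l} languages, {n_m} machines: "
          f"{direct} compilers vs {via_il} front-ends plus back-ends")
```

With two languages and two machines the two approaches cost the same (4 either way); the saving only shows up, and then grows quickly, once there are more of either.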

Generating high quality code requires having access to as much information as possible about the original source, which means the intermediate language split-point needs to occur as soon after source language specific processing as possible (otherwise all kinds of subtle target machine architecture assumptions get embedded in the structure of the intermediate code, which can make life difficult when the target makes different assumptions). ANDF is one of the few implementations that makes the split-point early (potential customers were not happy that it would be relatively easy to reverse engineer the original source).

Traditionally many more back-ends got written than front-ends, and the decision on where to make the intermediate language split-point was driven by the desire to reduce the time and effort needed to write a back-end, i.e., it was as close to actually pumping out machine code as possible. The quality of the generated code is not as good as it might be, but a new compiler can be shipped much more quickly and cheaply. Almost every retargetable compiler has made this design choice.

These days, with more new languages appearing than new processors, there would appear to be advantages in shifting the intermediate language split-point closer to the front-end. In practice compiler vendors get most of their income from back-end work (e.g., vendors wanting support for their new processor) and income from new front-ends is very rare, i.e., there is no incentive to change the status quo.

UML is not regarded as a serious language by many developers. Is this because programs look like pretty pictures, something that managers might use, or because UML is used for very non-developer problems like business processes? I don’t know. UML seems to be growing in popularity and to have an increasing number of support tools. Perhaps one day I will use it for real and be able to say something of interest.

Things to read

A Code Generation Interface for ANSI C by C. W. Fraser and D. R. Hanson.

Code selection through object code optimization by J. W. Davidson and C. W. Fraser.
