Archive for September, 2011

Does the UK need the PL/1 Standard?

September 25, 2011

Like everything else, language standards are born and eventually die. IST/5, the UK programming language committee, is considering whether the British Standard for PL/1 should be withdrawn (there are two standards: ISO 6160:1979, which has been reconfirmed multiple times since 1979, most recently in 2008, and a standardized subset, ISO 6522:1992, also last confirmed in 2008).

A language standard is born through the efforts of a group of enthusiastic people. A language standard dies because there is no enthusiast (a group of one is often sufficient) left to sing its praises (or at least willing to be a name on a list saying, every five years, that the existing document should be reconfirmed).

It is 20 years since IST/5 last had a member responsible for PL/1, but who is to say that nobody in the UK is interested in maintaining the PL/1 standard? Unlike many other programming language ISO Standards, there was never an ISO SC22 committee responsible for PL/1. All of the work was done by members of the US committee responsible for programming languages, PL22 (up until a few years ago this was ANSI committee X3). A UK person could have paid his dues and been involved in the US-based work; I don’t have access to a list of committee meeting attendees and so cannot say for sure that there was no UK involvement.

A member of IST/5, David Muxworthy, has been trying to find somebody in the UK with an interest in maintaining the PL/1 standard. A post to the newsgroup comp.lang.pl1 eventually drew a response from a PL/1 developer who said he would not be affected if the British Standard was withdrawn.

GNU compiler development is often a useful source of information. In this case the PL/1 web page is dated 2007.

In 2008 John Klensin, the ISO PL/1 project editor, wrote: “No activities or requests for additions or clarifications during the last year or, indeed, the last decade. Both ISO 6160 and the underlying US national document, ANS X3.53-1976 (now ANSI/INCITS-53/1976), have been reaffirmed multiple times. The US Standard has been stabilized and the corresponding technical activity was eliminated earlier this year”.

It looks like the British Standard for PL/1 is not going to live past the date of its next formal review in 2013. Thirty-four years would then be the time span, from publication of the last standard containing new material to formal withdrawal of all standards, for other languages to outlive. I wonder if any current member of either the C or C++ committee will live to see this happen to their work?

Automatically improving code

September 19, 2011

Compared to 20 or 30 years ago we know a lot more about the properties of algorithms, and better ways of doing things often exist (e.g., more accurate, faster, or more reliable). The problem with this knowledge is that it takes the form of lots and lots of small specific details, not the kind of thing that developers are likely to be interested in, or good at, remembering. Rather than involve developers in the decision-making process, perhaps the compiler could figure out when to substitute something better for what had actually been written.

While developers are likely to be very happy to see what they have written behaving as accurately and reliably as they had expected (ignorance is bliss), there is always the possibility that the ‘less better’ behavior of what they had actually written had really been intended. The following examples illustrate two relatively low-level ‘improvement’ transformations:

  • this case is probably a long-standing fault in many binary search and merge sort functions; the relevant block of developer-written code goes something like the following:
    while (low <= high)
       {
       int mid = (low + high) / 2;
       int midVal = data[mid];
     
       if (midVal < key)
          low = mid + 1;
       else if (midVal > key)
          high = mid - 1;
       else
          return mid;
       }

    The fault is in the expression (low + high) / 2, which overflows to a negative value (and hence produces a negative index) when the number of items being searched or sorted is large enough. Alternatives that don’t overflow, and that a compiler might transform the code to, include: low + ((high - low) / 2) and (low + high) >>> 1 (the latter using Java’s unsigned right shift); a short sketch demonstrating the overflow appears after this list.

  • the second involves summing a sequence of floating-point numbers. The typical implementation is a simple loop such as the following:
    sum=0.0;
    for i=1 to array_len
       sum += array_of_double[i];

    which for large arrays can result in sum losing a great deal of accuracy. The Kahan summation algorithm tries to take account of the accuracy lost in one iteration of the loop by compensating for it on the next iteration. If floating-point numbers were represented to infinite precision, the following loop would simplify to the one above:

    sum=0.0;
    c=0.0;
    for i = 1 to array_len
       {
       y = array_of_double[i] - c; // try to adjust for the accuracy lost on the previous iteration
       t = sum + y;
       c = (t - sum) - y; // recover the low-order bits lost when y was added to sum
       sum = t;
       }

    In this case the additional accuracy is bought at the price of a decrease in performance; a sketch comparing the two loops appears after this list.
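
To make the first transformation concrete, here is a minimal C sketch of my own (the bounds of 1.5 and 1.6 billion are made-up values, and a 32-bit int is assumed) showing why the as-written midpoint calculation goes wrong and what the overflow-free rewrites produce:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
       {
       /* hypothetical bounds, chosen so that low+high exceeds INT_MAX on a 32-bit int */
       int low  = 1500000000;
       int high = 1600000000;

       /* the naive (low + high) / 2 would overflow here (undefined behavior in C, typically
          wrapping to a negative index); computing the sum in 64 bits shows by how much */
       printf("low+high = %lld, INT_MAX = %d\n", (long long)low + high, INT_MAX);

       /* overflow-free alternatives a compiler might substitute */
       int mid1 = low + (high - low) / 2;
       unsigned mid2 = ((unsigned)low + (unsigned)high) >> 1; /* C analogue of the Java >>> rewrite */

       printf("mid1 = %d, mid2 = %u\n", mid1, mid2);
       return 0;
       }

Both rewrites print 1550000000, the midpoint that the as-written expression fails to deliver.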
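
And to see what the extra work in the second transformation buys, here is a similar sketch of my own (the 1.0e-16 terms and the ten million iteration count are made-up values chosen to make the rounding visible) comparing the simple loop with the compensated one:

    #include <stdio.h>

    #define N 10000000

    int main(void)
       {
       double big   = 1.0;
       double small = 1.0e-16; /* individually too small to register against 1.0 */

       /* simple summation: every small term is rounded away */
       double naive = big;
       for (long i = 0; i < N; i++)
          naive += small;

       /* Kahan (compensated) summation: the lost low-order bits are carried forward */
       double sum = big, c = 0.0;
       for (long i = 0; i < N; i++)
          {
          double y = small - c;
          double t = sum + y;
          c = (t - sum) - y;
          sum = t;
          }

       /* the exact total is 1.0 + N*1.0e-16 = 1.000000001 */
       printf("naive = %.12f\nkahan = %.12f\n", naive, sum);
       return 0;
       }

With IEEE 754 doubles the simple loop prints 1.000000000000 while the compensated loop prints 1.000000001000; note that aggressive floating-point optimization flags can rearrange the arithmetic and defeat the compensation.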

Compiler maintainers are just like other workers in that they want to carry on working at what they are doing. This means they need to keep finding ways of improving their product, or at least improving it from the point of view of those willing to pay for their services.

Many low-level transformations, such as the two examples above, would not be that hard to implement, and some developers would regard them as useful. In some cases the behavior of the code as written would be required and its transformed behavior would be surprising to the author, while in other cases the transformed behavior is what the developer would prefer if they were aware of it. Doesn’t it make sense to perform the transformations in those cases where the as-written behavior is least likely to be wanted?

Compilers already do things that are surprising to developers (often because the developer does not fully understand the language, and languages continue to grow in complexity). Creating the potential for more surprises is not that big a deal in the overall scheme of things.

C compiler validation is 21 today!

September 1, 2011

Today, 1 September 2011, is the 21st anniversary of the first formally validated C compilers. The three ‘equal first’ validated compilers were the Model Implementation C Checker from Knowledge Software, Topspeed C from JPI (run by the people who created Turbo Pascal) and the INMOS C compiler (derived from the Norcroft C compiler written by Alan Mycroft and others, the author of the longest response document seen during the review of the C89 draft standard).

Back in the day, the British Standards Institution testing group run by John Souter was the world leader in compiler validation and was very proactive in adding support for a new language. NIST, the equivalent US body, did not offer such a service until a few years later. Those companies in a position to have their compilers validated (i.e., the compiler passed the validation suite) were pressing BSI to be first; the ‘who is first’ issue was resolved by giving all certificates the same date (the actual validation, which consisted of a person from BSI, Neil Martin, now Director of Test in the Winterop Team at Microsoft, turning up to ‘witness’ the compiler passing the tests, had happened several weeks earlier).

Testing C compilers was different from other language compilers in that sufficient demand existed to support commercial production and maintenance of test suites (the production of validation suites for previous language compilers had been government funded). After a review of the available test suites BSI chose to use the Plum Hall suite; after a similar review NIST chose to use the Perennial suite (I got involved in trying to figure out for NIST how well this suite covered the requirements contained in the C Standard).

For a while C compiler validation was big business (as in big fish, very small pond). But the compiler validation market is dependent on there being lots of compilers, which requires market fragmentation and, to a lesser extent, lots of different OSs and hardware platforms (each needing a separate validation). The 1990s saw market consolidation, gcc becoming good enough for commercial use, and a shift of developer mind share to C++. Dwindling revenue resulted in BSI’s compiler validation group being shut down after a few years, with NIST’s following in 1998.

Is compiler validation relevant today? When the first C Standard was published, a lot of compilers in common use had some significant behavioural differences compared to what the Standard specified. Over time these compilers have either disappeared or been upgraded (a potential customer once asked me what benefits I saw in them licensing the Knowledge Software front end; the reply to one of my responses, “you can tell your customers that the compiler is standards compliant”, was that this was not a benefit, as they had been claiming compliance for years). Improvements in Intel’s x86 processors also had a hand in improving conformance to the Standard; the various memory models used by the x86 processor were a huge headache for compiler writers, whose products often behaved very differently under different memory models; the arrival of the Pentium, with its flat 32-bit address space, meant this issue disappeared over time.

These days I suspect that the major compilers targeting platforms where portability is expected (portability is often not a big expectation in the embedded world) are sufficiently compatible that developers are willing to overlook small differences with the Standard. Differences in third party libraries, GUIs and other frameworks have been the big headache for many years now.

Would the ‘platform portability’ compilers, that’s probably gcc, Microsoft, products using EDG’s front end, and perhaps llvm in the coming years, pass the latest version of the Plum Hall and Perennial suites?

  • The gcc team do not have access to either company’s suite. The gcc regression tests are a poor substitute for a proper compiler validation suite (even though they cost many thousands of dollars, commercial compiler writers often buy both companies’ products because they are good value for money as a testing resource; the Fortran 78 validation suite source gives some idea of how much work is actually involved). I would expect gcc to fail some of the tests but have no idea how many, or how serious, the failures would be.
  • Microsoft have said they don’t have plans to support C99 (it took a lot of prodding to get them interested in formally validating against C90).
  • I think the llvm team are in the same position as gcc, but perhaps somebody at Apple has access to one or more of the commercial suites (I don’t know).
  • EDG are into standards conformance and I would expect them to pass both suites.

The certificate is printed on high quality, slightly yellow paper; the template wording is in a subdued gray ink while the customer information is in a very bold black ink. I don’t know whether this is to make life difficult for counterfeiters, but I could not get any half-decent photographs and the color scanner had to be switched to black and white.

Validation was good for one year and I saw no worthwhile benefit in paying BSI £5,000 to renew for another year. Few people knew about the one-year rule and I did not enlighten them. In the Ada compiler market the one-year rule was a major problem, but let’s leave that for another time.

Model Implementation C validation certificate.