Archive

Posts Tagged ‘Standard’

Cobol 2014, perhaps the definitive final version of the language…

November 6th, 2014 No comments

Look what arrived in the post this morning, a complementary copy of the new Cobol Standard (the CD on top of a paper copy of the 1985 standard).

Photo of Cobol 2014 and 1985 Standards

In the good old days, before the Internet, members of IST/5 received a complementary copy of every new language standard in comforting dead tree form (a standard does not feel like a standard until it is weighed in the hand; pdfs are so lightweight); these days we get complementary access to pdfs. I suspect that this is not a change of policy at British Standards, but more likely an excessive print run that they need to dispose of to free up some shelf space. But it was nice of them to think of us workers rather than binning the CDs (my only contribution to Cobol 2014 was to agree with whatever the convener of the committee proposed with regard to Cobol).

So what does the new 955 page standard have to say for itself?

“COBOL began as a business programming language, but its present use has spread well beyond that to a general purpose programming language. Significant enhancements in this International Standard include:

— Dynamic-capacity tables

— Dynamic-length elementary items

— Enhanced locale support in functions

— Function pointers

— Increased size limit on alphanumeric, boolean, and national literals

— Parametric polymorphism (also known as method overloading)

— Structured constants

— Support for industry-standard arithmetic rules

— Support for industry-standard date and time formats

— Support for industry-standard floating-point formats

— Support for multiple rounding options”

I guess those working with Cobol will find these useful, but I don’t see them being enough to attract new users from other languages.

I have heard tentative suggestions of the next revision appearing in the 2020′s, but with membership of the Cobol committee dying out (literally in some cases and through retirement in others) perhaps this 2014 publication is the definitive final version of Cobol.

Tags: , ,

Reality in the world of programming language standards

July 16th, 2014 No comments

I see a lot of steam being vented about the standards’ process as applied to programming languages and software related topics. Knowing something about how the process works might help people live calmer lives, at least once they have calmed down after reading this article. What I have to say applies to programming language standards because these are what I have been involved with, as a member and convener of various UK and international committtees, for 25+ years.

  • ISO and your national standards’ body don’t care about the standard you are talking about.

    These organizations are monopolies who are required to demonstrate that documented procedures are followed by all concerned. Can you think of any organizational structure that would create less incentives for those on the inside to listen to those on the outside?

    Yes, these organizations do sell standards but the sales model is all about the long tail and no peak, to speak of, of best sellers. The real business model for running a standards’ organization is to either charge members a fee (your country pays membership dues for each Standards Committee it wants a say in; if your country has not paid to be a member of ISO JTC 1/SC22 you have no say in programming language standards. ANSI in the US charges people for the right to volunteer their time to attend meetings to work on a standard) and/or rely on government subsidy.

    Not being cared about is actually a luxury that people who work in programming language standards should aspire to. The bureaucrats who work in standards hate us; here in the UK there has been at least one attempt to kill off work on programming language standards and I have heard of similar experiences in other countries. The problem is that the standards we produce don’t fit the mold that works for most other standards; programming language standards contain an order of magnitude more pages than the average standard (until recently there was a print run of new standards which then had to be stored until sold and the volume occupied by programming language standards was of note {so I’m reliably informed}), take longer to produce (i.e., more work for the bureaucrats) and all this cost is not justified by the sales figures (which are confidential and last time I saw them only just required me to take my shoes and socks off to count).

  • Standards are created by the volunteers who regularly turn up at meetings.

    It is only the enthusiasm of these volunteers that makes the process work. If you don’t turn up at meetings then what you think does not count (not quite true, something you write might influence the thinking of one of the worker bees who attends meetings resulting in wording in the standard).

    If you really are interested in a standard then become an active member of the committee responsible for it, at least the national one and if you have the time the international one

  • Committee documents can be made public.

    There are no rules preventing a standards committee putting its documents on a website for Joe public to download. The issue is finding somebody willing to do the work of hosting the website (the programming language world is lucky to have Keld Simonsen) and a willingness of committee members to be open about all their documents.

    Looking in from the outside it seems to me that many non-programming language committees want to maintain an aura of mystic and privileged access.

Single-quote as a digit separator soon to be in C++

September 30th, 2013 4 comments

At the C++ Standard’s meeting in Chicago last week agreement was finally reached on what somebody in the language standards world referred to as one of the longest bike-shed controversies; the C++14 draft that goes out for voting real-soon-now will include support for single-quotation-mark as a digit separator. Assuming the draft makes it through ISO voting you could soon be writing (Compiler support assumed) 32'767 and 0.000'001 and even 1'2'3'4'5'6'7'8'9 if you so fancied, in your conforming C++ programs.

Why use single-quote? Wouldn’t underscore have been better? This issue has been on the go since 2007 and if you feel really strongly about it the next bike-shed C++ Standard’s meeting is in Issaquah, WA at the start of next year.

Changing the lexical grammar of a language is fraught with danger; will there be a change in the behavior of existing code? If the answer is Yes, then the next question is how many people will be affected and how badly? Let’s investigate; here are the lexical details of the proposed change:

pp-number:
    digit
    . digit
    pp-number digit
    pp-number ' digit
    pp-number ' nondigit
    pp-number identifier-nondigit
    pp-number e sign
    pp-number E sign
    pp-number .

Ideally the change of behavior should cause the compiler to generate a diagnostic, when code containing it is encountered, so the developer gets to see the problem and do something about it. The following conforming C++ code will upset a C++14 compiler (when I write C++ I mean the C++ Standard as it exists in 2013, i.e., what was called C++11 before it was ratified):

#define M(x) #x   // stringize the macro argument
 
char *p=M(1'2,3'4);

At the moment the call to the macro M contains one argument, the sequence of three tokens {1}, {'2,3'} and {4} (the usual convention is to bracket the characters making up one token with matching curly braces).

In C++14 the call to M will contain the two arguments {1'2} and {3,4}. conforming compiler is required to complain when the number of arguments to a macro invocation don’t match the definition…. Unless the macro is defined to accept a variable number of arguments:

#define M(x, ...) __VA_ARGS__
 
          int x[2] = { M(1'2,3'4) };
// C++11: int x[2] = {};
// C++14: int x[2] = { 3'4 };

This is the worst kind of change in behavior, known as a silent change, the existing code compiles without complaint but has different behavior.

How much existing code contains either of these constructs? I suspect very very little human written code, maybe even none. This is the sort of stuff that is more likely to be produced by automatic code generators. But how much more likely? I have no idea.

How much benefit does the new feature provide? It certainly looks useful, but coming up with a number for the benefit is hard. I guess it has the potential to shave a fraction of a second off of the attention a developer has to pay when reading code, after they have invested in learning about the construct (which is lots of seconds). Multiplied over many developers and not that many instances (the majority of numeric literals contain a single digit), we could be talking a man year or two per year of worldwide development effort?

All of the examples I have seen require the ‘assistance’ of macros, here is another (courtesy of Jeff Snyer):

#define M(x) A ## x
#define A0xb
 
int operator "" _de(char);
int x = M(0xb'c'_de);

Are there any examples of a silent change that don’t involve the preprocessor?

Will programming languages now have to follow ISO fast-track rules?

February 4th, 2013 No comments

A while back I wrote about how updated versions of ECMAscript (i.e., the Standard for Javascript) had twice been fast tracked to replace an existing ISO Standard, however the ISO rules require that once a document becomes one of its standards all future work be done using the ISO process (i.e., you only supposed to get the one original fast track and then you have to get at least half a dozen countries to say they will actively participate in ongoing work). Thirteen years after I asked why it was being allowed to happen (as I recall I only raised the issue because I thought I had misunderstood the rules, not because I had a burning desire to enforce them) the issue has suddenly sprung to life (we are talking Standard’s world ‘sudden’ here), with a question being raised at the last SC22 meeting and a more detailed one being prepared by BSI for the next meeting (they occur once per year).

The Elephant in the room here is ISO/IEC 29500:2008, not a programming language but Microsoft’s Office Open XML; there was quite a bit of fuss when this was fast tracked.

If the ISO rules on one-time only use of the fast track process was limited to programming languages I imagine the bureaucrats in Geneva would probably never get to hear about it (SC22 would probably conclude that there was not enough interest in the various documents outside of the submitting country to form an active ISO working group; so leave well alone).

ISO sells over 19,000 standards and has better things to do than spend time on the goings on in an unfashionable part of the galaxy, unless, that is, it has the potential to generate lots of fuss that undermines credibility.

Will Microsoft try to fast track an updated version of ISO 29500? I don’t even know if they are updating it. The possibility that ISO 29500 might be updated and submitted for fast track will make it hard for SC22 to agree to any future fast-track updates to existing ISO Standards it is responsible for.

The following is a list of documents that have been fast tracked to become an ISO Standard:

ECMAScript:
ECMA-262 (1st edn) = ISO 16262:1998
ECMA-262 (3rd edn) = ISO 16262:1999
ECMA-262 (5th edn) = ISO 16262:2011

C#:
ECMA-334 (2nd edn) = ISO 23270:2003
ECMA-334 (4th edn) = ISO 23270:2006

CLI:
ECMA-335 (2nd edn) = ISO 23271:2003
ECMA-335 (6th edn) = ISO 23271:2012

ECMA standards fast-tracked to ISO and not yet revised:
ECMA-149 PCTE part 1 = ISO 13719-1:1998
ECMA-158 PCTE part 2 = ISO 13719-2:1998
ECMA-162 PCTE part 3 = ISO 13719-3:1998
ECMA-230 PCTE IDL binding = ISO 13719-4:1998
ECMA-367 EIFFEL = ISO 25436:2006
ECMA-372 C++/CLI -> DIS 26926; failed DIS ballot and project cancelled

Replaced rather than revised under JTC1 rules:
CHILL (from CCITT): ISO standards 9496:1989, 9496:1995, 9496:1998, 9496:2003
MUMPS/M (from Mass Gen Hospital/ANSI): ISO standards 11756:1992, 11756:1999

Non-ECMA documents fast-tracked through ISO and not yet revised:
FORTH (from FORTH Inc): ISO 15145:1997
JEFF (from J consortium): ISO 20970:2002
Ruby (from Japanese Industrial Standards Committee): ISO/IEC 30170:2012

Why does Coverity restrict who can see its tool output?

July 20th, 2012 2 comments

Coverity generate a lot of publicity from their contract (started under a contract with the US Department of Homeland Security, don’t know if things have changed) to scan large quantities of open source software with their static analysis tools and a while back I decided to have a look at the warning messages they produce. I was surprised to find that access to the output required singing a nondisclosure agreement, this has subsequently been changed to agreeing to a click-through license for the basic features and signing a NDA for access to advanced features. Were Coverity limiting access because they did not want competitors learning about new suspicious constructs to flag or because they did not want potential customers seeing what a poor job their tool did?

The claim that access “… for most projects is permitted only to members of the scanned project, partially in order to ensure that potential security issues may be resolved before the general public sees them.” does not really hold much water. Anybody interested in finding security problems in code could probably find hacked versions of various commercial static analysis tools to download. The SAMATE – Software Assurance Metrics And Tool Evaluation project runs yearly evaluations of static analysis tools and makes the results publicly available.

A recent blog post by Andy Chou, Coverity’s CTO, added weight to the argument that their Scan tool is rather run of the mill. The post discussed a new check that had recently been added flagging occurrences of memcmp, and related standard library functions, that were being tested for equality (or inequality) with specific integer constants (these functions are defined to return a negative or positive value or 0, and while many implementations always return the negative value -1 and the positive value 1 a developer should always test for the property of being negative/positive and not specific values that have that property). Standards define library functions to have a wide variety of different properties and tools that check for correct application of these properties have been available for over 15 years.

My experience of developer response, when told that some library function is required to return a negative value and some implementation might not return -1, is that their regard any implementation not returning -1 as being ‘faulty’ since all implementations in their experience return -1. I imagine that library implementors are aware of this expectation and try to meet it. However, optimizing compilers often try to automatically inline calls to memcpy and related copy/compare functions and will want to take advantage of the freedom to return any negative/positive value if it means not having to perform a branch test (a big performance killer on most modern processors).

A free pdf that is not the C++ Standard

April 14th, 2012 No comments

One of the most annoying things about working on programming language standards is to see the exorbitant price ISO charge for the final published document created by experts toiling away for free over many years. This practice is unlikely to change.

Once a document has been through a successfully ballot (i.e., accepted for publication as a Standard) ISO has very strict rules about what changes can be made to it prior to actual publication (only typos of the most very innocuous kind can be fixed). Of course programming language committees live on after the publication of a Standard by ISO and it is very useful for them if the document they start deliberating on is one that has had all the typos corrected, not just the really innocuous ones.

The list of typos in the 2011 C++ Standard have been fixed in the freely down loadable committee document N3337 that is not the official C++ Standard, however uncannily similar it may look (no ISO gobbledygook at the front for instance). The equivalent committee document for C is not yet available on the WG14 site.

If you really do need a copy of the C++ Standard the 2011 (i.e., latest) document in pdf form is available for $30 from ANSI; the 2011 C Standard is also available for the same price. Don’t worry about the ANSI version being dated 2012 rather than 2011; National Standards bodies must sell the ISO version at ISO prices but are allowed to publish localized versions for which they can set their own price (so you pay less and get various US specific acronyms and verbiage printed on the front cover).

What I know of Dennis Ritchie’s involvement with C

October 13th, 2011 No comments

News that Dennis Ritchie died last weekend surfaced today. This private man was involved in many ground breaking developments; I know something about one of the languages he designed, C, so I will write about that. Ritchie has written about the development of C language.

Like many language designers the book he wrote “The C Programming Language” (coauthored with Brian Kernighan in 1978) was the definitive reference for users; universally known as K&R. The rapid growth in C’s popularity led to lots of compilers being written, exposing the multiple ways it was possible to interpret some of the wording in K&R.

In 1983 ANSI set up a committee, X3J11, to create a standard for C. With one exception Ritchie was happy to keep out of the fray of standardization; on only one occasion did he feel strongly enough to step in and express an opinion and the noalias keyword disappeared from the draft (the restrict keyword surfaced in C99 as a different kind of beast).

A major contribution to the success of the C Standard was the publication of an “ANSI Standard” version of K&R (there was a red “ANSI C” stamp on the front cover and the text made use of updated constructs like function prototypes and enumeration types), its second edition in 1988. Fans of the C Standard could, and did, claim that K&R C and ANSI C were the same language (anybody using the original K&R was clearly not keeping up with the times).

Ritchie publicly admitted to making one mistake in the design of C. He thinks that the precedence of the & and | binary operators should have been greater than the == operator. I can see his point, but an experiment I ran a few years ago suggested it is amount of experience using a set of operator precedence rules that is the primary contributor to developer knowledge of the subject.

Some language designers stick with their language, enhancing it over the years (e.g., Stroustrup with C++), while others move on to other languages (e.g., Wirth with Pascal, Modula-2 and Oberon). Ritchie had plenty of other interesting projects to spend his time on and took neither approach. As far as I can tell he made little or no direct contribution to C99. As head of the research department that created Plan 9 he must have had some input to the non-Standard features of their C compiler (e.g., no support for #if and support for unnamed structures).

While the modern C world may not be affected by his passing, his ability to find simple solutions to complicated problems will be a loss to the projects he was currently working on.

Build an ISO Standard and the world will beat a path to your door

December 16th, 2010 No comments

An email I received today, announcing the release of version 1.0 of the GNU Modula-2 compiler, reminded me of some plans I had to write something about a proposal to add some new definitions to the next version of the ISO C Standard.

In the 80s I was heavily involved in the Pascal community and some of the leading members of this community thought that the successor language designed by Niklaus Wirth, Modula-2, ought to be the next big language. Unfortunately for them this view was not widely shared and after much soul searching it was decided that the lack of an ISO standard for the language was responsible for holding back widespread adoption. A Modula-2 ISO Standard was produced and, as they say, the rest is history.

The C proposal involves dividing the existing definition undefined behavior into two subcategories; bounded undefined behavior and critical undefined behavior. The intent is to provide guidance to people involved with software assurance. My long standing involvement with C means that I find the technical discussions interesting; I have to snap myself out of getting too involved in them with the observation that should the proposals be included in the revised C Standard they will probably have the same impact as the publication of the ISO Standard had on Modula-2 usage.

The only way for changes to an existing language standard to have any impact on developer usage is for them to require changes to existing compiler behavior or to specify additional runtime library functionality (e.g., Extensions to the C Library Part I: Bounds-checking interfaces).

A fault in the C Standard or existing compilers?

February 24th, 2009 No comments

Software is not the only entity that can contain faults. The requirements listed in a specification are usually considered to be correct, almost by definition. Of course the users of software implementing a specification may be unhappy with the behavior specified and wish that some alternative behavior occurred. A cut and dried fault occurs when two requirements conflict with each other.

The C Standard can be read as a specification for how C compilers should behave. Despite over 80 man years of effort and the continued scrutiny of developers over 20 years, faults continue to be uncovered. The latest potential fault (it is possible that the fault actually occurs in many existing compilers rather than the C Standard) was brought to my attention by Al Viro, one of the Sparse developers.

The issue involved the following code (which I believe the standard considers to be strictly conforming, but all the compilers I have tried disagree):

int (*f(int x))[sizeof x];  // A prototype declaration
 
int (*g(int y))[sizeof y]  // A function definition
{
return 0;
}

These function declarations are unusual in that their return type is a pointer to an array of integers, a type rarely encountered in this context (the original question involved a return type of pointer to function returning … and was more complicated).

The specific issue was the scope of the parameter (i.e., x and y), is the declaration still in scope at the point that the second occurrence of the identifier is encountered?

As a principle I think that the behavior, whatever it turns out to be, should be the same in both cases (neither the C standard or its rationale state such a principle).

Taking the function prototype case first:

The scope of the parameter x “… terminates at the end of the function declarator.” (sentence 409).

and does function prototype scope include the return type (the syntax calls the particular construct a declarator and there are at least two of them, one nested inside the other, in a function prototype declaration)?

Sentence 1592 says Yes, but sentence 279 and 1845 say No.

None of these references are normative references (standardize for definitive).

Moving on to the function definition case:

Where does the scope of the parameter x begin (sentence 418)?
… scope that begins just after the completion of its declarator.

and where does the scope end (sentence 408)?
… which terminates at the end of the associated block.

and what happens between the beginning and ending of the scope (sentence 412)?
Within the inner scope, the identifier designates the entity declared in the inner scope;

This looks very straight forward, there are no ‘gaps’ in the scope of the parameter definition appearing in a function definition. Consistency with the corresponding function prototype case requires that function declarator be interpreted to include the return type.

There is a related discussion involving a previous Defect Report 345 submitted a while ago.

The problem is that many existing compilers do not treat parameter scope in this way. They operate as-if there was a ‘gap’ in the parameter scope of function definition (probably because the code implementing this functionality is shared with that implementing function prototypes, which have been interpreted to not include the return type).

What happens next? Probably lots of discussion on the C Standard email reflector. Possible outcomes include somebody finding wording that requires a ‘gap’ in the scope of parameters in function definitions, it agreed that such a gap ought to be specified by the standard (because this is how existing code behaves because this is how compilers operate), or that the standard is correct as is and any compiler that behaves differently needs to be fixed.

C++ goes for too big to fail

December 8th, 2008 No comments

If you believe the Whorfian hypothesis that language effects thought, even in one of its weaker forms, then major changes to a programming language will effect the shape of the code its users write.

I was at the first International C++ Standard meeting in London during 1991 and coming from a C Standard background I could not believe the number of new constructs being invented (the C committee had a stated principle that a construct be supported by at least one implementation before it be considered for inclusion in the standard; ok, this was not always followed to the letter). The C++ committee members continued to design away, putting in a huge amount of effort, and the document was ratified before the end of the century.

The standard is currently undergoing a major revision and the amount of language design going on puts the original committee to shame. With over 1,300 pages in the latest draft nobodies favorite construct is omitted. The UK C++ panel has over 10 people actively working on producing comments and may produce over 1,000 on the latest draft.

With so many people committed to the approach being taken in the development of the revised C++ Standard its current direction is very unlikely to change. The fact that most ‘real world’ developers only understand a fraction of what is contained in the existing standard has not stopped it being very widely used and generally considered as a ‘success’. What is the big deal over a doubling of the number of pages in a language definition, the majority of developers will continue to use the small subset that they each individually have used for years.

The large number of syntactic ambiguities make it is very difficult to parse C++ (semantic information is required to resolve the ambiguities and the code to do this is an at least an order of magnitude bigger than the lexer+parser). This difficulty is why there are so few source code analysis tools available for C++, compared to C and Java which are much much easier to parse. The difficulty of producing tools means that researchers rarely analyse C++ code and only reasonably well funded efforts are capably of producing worthwhile static analysis tools.

Like many of the active committee members I have mixed feelings about this feature bloat. Yes it is bad, but it will keep us all actively employed on interesting projects for many years to come. As the current financial crisis has shown, one of the advantages of being big and not understood is that you might get to being too big to fail.

Tags: , , ,