Archive

Posts Tagged ‘formal analysis’

The POPL 2015 papers involving C

November 4, 2014 No comments

SIGPLAN (the ACM Special Interest Group on Programming LANguages) has just made available many of the papers that have been accepted for their 2015 POPL conference (Principles of Programming Languages). Good for them. I wish more conferences would do this.

There are three papers involving C, so obviously I have read those first. Two papers are heavy on the mathematics and one not quiet so heavy:

  • Sound Modular Verification of C Code Executing in an Unverified Context: Describes a tool that takes C source annotated with separation logic and translates it to C source containing runtime checks; it is intended that these runtime checks verify the conditions expressed in the separation logic. Why does the developer add the interface checks in separation logic and then translate, rather than adding them in C in the first place? This was question was not addressed
  • Common compiler optimisations are invalid in the C11 memory model and what we can do about it: This sounds like bad news, but the introduction mentions specialist optimizations that are common in that specialist area. There follows 11 pages of mathematics + another five pages in an appendix. Page 12 tells us what it is all about. Some requirements in C11 would be muck up the nice mathematics should CompCert, which currently supports C90, be upgraded to C11. In other words, a compiler implementor is complaining that wording in the standard is making their life difficult (hey, join the queue) and has published a paper about it.
  • Formal verification of a C static analyzer: An interesting piece of work spoiled by claims that a soap powder manufacturer would not be able to get away with. Verasco, the static analysis tool described, does its checking on an intermediate language that is two-steps removed from the original C source. Using the authors’ logic I could bolt on one of the existing Fortran-to-C translators and claim to have a formally-verified Fortran static analyzer, with C being just an intermediate language in the chain. The problem with analyzing an intermediate language is that the transformations that have occurred along the way have changed the semantics of the original code, so the results of any analysis could be different than if applied to the original source. An example from the paper, the code:
    z = f(x) + 2 * g(y)

    is transformed to:

    t1 = f(x); t2 = g(y); z = t1 + 2 * t2;

    The implementation thus selects one of the two possible evaluation orders for the functions f and g. It is possible that calling f before g will result in behavior that is different from calling g before f (no undefined behavior occurs because there is a sequence point before a function returns, using pre-C11 terminology).

    So Verasco is only checking one of the two possible execution paths in this code. Not a particularly sound proof.

    C-semantics is the C formal methods tool that stands head and shoulders above anything else that currently exists (a fun Fibonacci example). It is actually based on the C source and does significantly more checking than verasco, but is not mentioned in the “Related work” section of the paper.

Some of the other POPL papers look a lot more interesting and potentially useful.

What I changed my mind about in 2008

January 4, 2009 No comments

A few years ago The Edge asked people to write about what important issue(s) they had recently changed their mind about. This is an interesting question and something people ought to ask themselves every now and again. So what did I change my mind about in 2008?

1. Formal verification of nontrivial C programs is a very long way off. A whole host of interesting projects (e.g., Caduceus, Comcert and Frame-C) going on in France has finally convinced me that things are a lot closer than I once thought. This does not mean that I think developers/managers will be willing to use them, only that they exist.

2. Automatically extracting useful information from source code identifier names is still a long way off. Yes, I am a great believer in the significance of information contained in identifier names. Perhaps because I have studied the issues in detail I know too much about the problems and have been put off attacking them. A number of researchers (e.g., Emily Hill, David Shepherd, Adrian Marcus, Lin Tan and a previously blogged about project) have simply gone ahead and managed to extract (with varying amount of human intervention) surprising amounts of useful from identifier names.

3. Theoretical analysis of non-trivial floating-point oriented programs is still a long way off. Daumas and Lester used the Doobs-Kolmogorov Inequality (I had to look it up) to deduce the probability that the rounding error in some number of floating-point operations, within a program, will exceed some bound. They also integrated the ideas into NASA’s PVS system.

You can probably spot the pattern here, I thought something would not happen for years and somebody went off and did it (or at least made an impressive first step along the road). Perhaps 2008 was not a good year for really major changes of mind, or perhaps an earlier in the year change of mind has so ingrained itself in my mind that I can no longer recall thinking otherwise.