
The Met Office ‘climategate’ Perl code

December 25th, 2009

In response to the Climategate goings-on, the UK Meteorological Office has released a subset of its land surface climate station records, along with some code to process them. The code consists of 397 lines of Perl (station_gridder.perl and make_global_average_ts_ascii.perl).

At various times I have been asked to suggest which part of an application’s or product’s source code should be made available to a third party. The third party may have wanted to evaluate its quality or get a feel for its complexity, or may simply have felt that they ought at least to be able to say they had seen some code. In these situations there is always a trade-off between impressing the customer (e.g., well structured code containing lots of comments) and not revealing too much (e.g., impenetrable code with no comments).

Have the Met Office released the code they have used over a period of time, or have they released newly written code?

The source does not have the characteristics often seen in well-worn, ‘old’ code. There is no revision history (this may be due to poor programming practices, or the history may have been stripped off prior to release; I discuss pretty printing below), the visual layout is generally consistent (this may be because the same small group of people have worked on it over time), there are no obvious hacks used to get around earlier design decisions that have since changed, and, unscientifically, it just feels to me like newly written code.

Was the code originally written in another language (e.g., Fortran), perhaps as part of a larger program, and then rewritten in Perl?

The code does not have a Fortran ‘accent’. It was written by people who are fluent in Perl; perhaps they do not know Fortran very well and were given time to craft something presentable, hence no Fortran accent.

Why have I been referring to the code authors, plural, when writing 397 lines is well within the capabilities of a competent developer working for a day (I bet the authors spent longer in meetings about this code than actually writing it)? Developers tend to have very fixed habits when it comes to bracketing statements with curly braces: some always put the opening brace at the end of the line and others always put it on a new line. The Met Office code contains both usages, sometimes within the same subroutine. Also, the use of whitespace around punctuators and operators does not follow a consistent pattern, which for me rules out the use of an automated pretty printer and suggests more than one person doing the editing. And why are some variable names capitalized and others not (the names in subroutine read_station are all lower case, while the names in the surrounding subroutines are mostly upper case)? More than one author is the simplest answer.
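The brace and whitespace argument above is essentially a count of habits, and counting is easy to automate. What follows is a minimal sketch, not how the released files were actually examined (the post does not say any tool was used). It is written in Python, and the function names `brace_styles` and `assignment_spacing` are hypothetical. It tallies opening-brace placement per line and checks whether a lone assignment `=` has a space on both sides. The heuristics are deliberately crude and assume no braces or `=` signs hide inside strings, comments, regexes, heredocs or POD.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only: crude, line/character based,
# and blind to strings, comments, regexes, heredocs and POD.

def brace_styles(source: str) -> Counter:
    """Tally where opening braces are placed in C-like source such as Perl.

    same_line: the line ends with '{' after other code, e.g. 'if ($x) {'
    own_line:  the line holds nothing but '{'
    """
    counts = Counter()
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "{":
            counts["own_line"] += 1
        elif stripped.endswith("{"):
            counts["same_line"] += 1
    return counts

# A lone '=' (assignment): not part of ==, !=, <=, >=, =~, =>, +=, .=, ||= etc.
LONE_EQUALS = re.compile(r"(?<![=!<>~+\-*/.|&%^])=(?![=~>])")

def assignment_spacing(source: str) -> Counter:
    """Tally whether each lone '=' has a space on both sides or not."""
    counts = Counter()
    for m in LONE_EQUALS.finditer(source):
        before = source[m.start() - 1] if m.start() > 0 else ""
        after = source[m.end()] if m.end() < len(source) else ""
        counts["spaced" if before == " " and after == " " else "unspaced"] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Pass the Perl files to inspect as command-line arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        print(path, dict(brace_styles(text)), dict(assignment_spacing(text)))
```

Run over the two released files, non-zero counts in both brace categories and both spacing categories would be consistent with (though hardly proof of) more than one person editing the code and no pretty printer being applied afterwards.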

One Perl usage caught my eye: the unless construct, which is rarely used and often recommended against. Without a lot more code being available for analysis there are no obvious conclusions to draw from this usage (apart from it being an indicator of somebody who knows Perl well; most mainstream languages do not support this construct, so developers have to use a ‘positive’ construct containing a negated condition rather than a ‘negative’ construct containing a positive condition).

  1. Peter
    December 25th, 2009 at 15:29 | #1

    Given that it was written in perl, it should have been rewritten as a one-liner.

  2. December 25th, 2009 at 17:29 | #2

    The people who wrote these scripts may be “fluent in Perl” (whatever that means), but they are certainly no masters of the language, nor are they plugged into the general Perl community. There is no usage of CPAN modules (many of which would have made this code more comprehensible). There are several subtle bugs in the code and the code is unnecessarily imperative. While it may not be “Fortran-accented” (and I agree), it is definitely “C-accented”. (I would say “Java-accented”, but someone will argue that there aren’t any objects.)

    If I had to guess, this code was written by 3-5 developers who are conversant in several languages and prefer Java or C++ over most other languages (likely they have a large Java or .Net application to maintain and this is a side project). Someone decided that since this was manipulating text, Perl would be the best choice and everyone else just went along. The code has undergone at least 3 internal bugfix/upgrade cycles and possibly as many as 6+.

    And, finally, this code isn’t being released to the OSS community – it’s being released as a political maneuver.

    (Oh – Damian’s anti-unless screed is almost completely disregarded by the Perl community at large, at least as judged by CPAN authors.)

  3. December 25th, 2009 at 22:24 | #3

    I think the code is comprehensible as written, and somebody who knows a language other than Perl could probably figure out what is going on. CPAN does contain lots of useful modules, but when writing relatively small amounts of code, unless the developer happens to be familiar with a module that solves the problem at hand, it is generally less effort to write the required code. Not using CPAN modules also makes life simpler for non-Perl users, who will not know about CPAN (here is one).

    The code does have a strong imperative feel to it. I don’t see this as being necessary or unnecessary, but then I am always less than impressed by the cryptic code much beloved of Perl aficionados. I see no C or Java specific ‘accent’, but then 400 lines of code is the briefest of conversations.

    Unless you have inside information, the numbers in your second paragraph might just as well have been delivered by Santa Claus.

    The release of the data is obviously driven by a public relations agenda, and the code might just have been tacked on as an afterthought. Perhaps a couple of Met Office web site developers had some spare time on their hands and decided to put something together.
