### Archive

Archive for June, 2009

## Searching for the source line implementing 3n+1

I have been doing some research on the variety of ways that different developers write code to implement the same specification and have been lucky enough to obtain the source code of approximately 6,000 implementations of a problem based on the 3n+1 algorithm. At some point this algorithm requires multiplying a value by three and adding one, e.g., `n=3*n+1;`.

While I expected some variation in the coding of many parts of the algorithm I did not expect to see much variation in the 3n+1 part, perhaps somebody might write `n=n*3+1;`. I was in for a surprise, the following are some of the different implementations I have seen so far:

n = n + n + n + 1 ;
n += n + n + 1;
n = (n << 1) + n + 1; n += (n << 1) + 1; n *= 3; n++; t = (n << 1) ; n = t + n + 1; n = (n << 2) - n + 1; I was already manually annotating the source and it was easy for me to locate the line implementing 3n+1 to annotate it. But what if I wanted to automate the search for the line of code containing this calculation, what tool could I use? Would I have to write down every possible ways in which 3n+1 could be implemented, with/without parenthesis and all possible orderings of operands? I am not aware of any automatic tool that could be told to locate expressions that calculated 3n+1. What is needed is abstract interpretation over short sequences of statements.

I mentioned this search problem over drinks after a talk I gave at the Oxford branch of the ACCU last week and somebody (Huw ???) suggested that perhaps the code generated by gcc would be the same no matter how 3n+1 was implemented. I could see lots of reasons why this would not be the case, but the idea was interesting and worth investigation.

At the default optimization level the generated x86 code is different for different implemenetations, but optimizing at the “-O 3″ level results in all but one of the above expressions generating the same evaluation code:

` leal 1(%rax,%rax,2), %eax`

The exception is `(n << 2) - n + 1` which results in shift/subtract/add. Perhaps I should report this as a bug in gcc :-)

I was surprised that gcc exhibited this characteristic and I plan to carry out more tests to trace out the envelope of this apparent "same generated code for equivalent expressions" behavior of gcc.

Tags:

## Will language choice converge to a few?

Will the number of commonly used programming languages converge to a few that remain commonly used for ever, will there be many relatively common languages in use, or will the (relatively) commonly used languages change over time?

There are plenty of advantages to having one programming language that everybody uses for ever. English+local dialects seems to be heading towards becoming the World’s one native language, but the programming language world seems to be moving in the direction of diversification and perhaps even experiencing changing popularity of those in common use.

What are the forces that drive programing language usage?

Existing code. If a company wants to maintain and update its software products it needs to hire people to use the language they are written in. This is a force that maintains the status quo.

Existing programmer skills. When given the task of writing new software where language usage is not specified, developers are likely to pick a language they already know. In the case of group development the choice is made by group leaders. This is a force that maintains the status quo.

Fashion. Every field has fashions and programming language usage is no exception. Using a particular language can be seen as sexy, leading-edge, innovative, the next big-thing, etc. Given the opportunity some developers will chose to learn and write code in this language.

Desire to learn a new language. Some developers like to learn new things and this includes programming languages. Given the opportunity such developers will sometimes chose to learn and write code in a language they find interesting.

The cost of creating and implementing a new language continues to be within the reach of one individual who is willing to invest the considerable effort required. Hundreds, if not thousands of new languages have been created every year almost since computers were first invented. The only change here over the last 40 years is probably an increase in the number of new languages.

What has changed in the last 15 years is ease of transmission (e.g., a ubiquitous computing platform and the Internet) and the growth of the fashion industry (e.g., book publishers).

Computing is bathed in newness. New products, new chips, new gadgets, new software, new features, new and improved, the latest. What self respecting developer would want to be caught dead using a language invented before they were born?

Publishers need a continuous stream of new subjects that will drive customers to buy books. What better subject than a hot new programming language?

At the moment we seem to be living through a period of programming language usage divergence. Will this evolutionary trend continue or are we currently in the Cambrian explosion period of software engineering evolution?

What are the forces acting against the use of new languages? At the moment the only significant forces acting against the use of new languages are existing source and existing developer expertise. There are weaker forces, for instance, the worry that in the years to come it will be difficult to find developers to maintain existing software written in what has become an obscure language, but most software has a short lifetime and in many application domains this is not an issue. Whether the fashion for newness will eventually diminish enough to significantly slow the take-up of new languages remains to be seen.