September 19, 2011 Derek Jones 3 comments

Compared to 20 or 30 years ago we know a lot more about the properties of algorithms and better ways of doing things often exist (e.g., more accurate, faster, more reliable, etc). The problem with this knowledge is that it takes the form of lots and lots of small specific details, not the kind of thing that developers are likely to be interested in, or good at, remembering. Rather than involve developers in the decision-making process, perhaps the compiler could figure out when to substitute something better for what had actually been written.

While developers are likely to be very happy to see what they have written behaving as accurately and reliably as they had expected (ignorance is bliss), there is always the possibility that the ‘less better’ behavior of what they had actually written had really been intended. The following examples illustrate two relatively low level ‘improvement’ transformations:

this case is probably a long-standing fault in many binary search and merge sort functions; the relevant block of developer written code goes something like the following:

while (low <= high)
   {
   int mid = (low + high) / 2;
   int midVal = data[mid];
 
   if (midVal < key)
      low = mid + 1
   else if (midVal > key)
      high = mid - 1;
   else
      return mid;
   }

The fault is in the expression (low + high) / 2 which overflows to a negative value, and returns a negative value, if the number of items being sorted is large enough. Alternatives that don’t overflow, and that a compiler might transform the code to, include: low + ((high - low) / 2) and (low + high) >>> 1.

the second involves summing a sequence of floating-point numbers. The typical implementation is a simple loop such as the following:

sum=0.0;
for i=1 to array_len
   sum += array_of_double[i];

which for large arrays can result in sum losing a great deal of accuracy. The Kahan summation algorithm tries to take account of accuracy lost in one iteration of the loop by compensating on the next iteration. If floating-point numbers were represented to infinite precision the following loop could be simplified to the one above:

sum=0.0;
c=0.0;
 for i = 1 to array_len
   {
   y = array_of_double[i] - c; // try to adjust for previous lost accuracy
   t = sum + y;
   c = (t - sum) - y; //  try and gets some information on lost accuracy
   sum = t;
   }

In this case the additional accuracy is bought at the price of a decrease in performance.

Compiler maintainers are just like other workers in that they want to carry on working at what they are doing. This means they need to keep finding ways of improving their product, or at least improving it from the point of view of those willing to pay for their services.

Many low level transformations such as the above two examples would be not be that hard to implement, and some developers would regard them as useful. In some cases the behavior of the code as written would be required, and its transformed behavior would be surprising to the author, while in other cases the transformed behavior is what the developer would prefer if they were aware of it. Doesn’t it make sense to perform the transformations in those cases where the as-written behavior is least likely to be wanted?

Compilers already do things that are surprising to developers (often because the developer does not fully understand the language, many of which continue to grow in complexity). Creating the potential for more surprises is not that big a deal in the overall scheme of things.

Categories: Uncategorized Tags: accuracy, algorithms, compiler, faster, faults, floating-point, kahan summation, surprises, the future, transformation

The Shape of Code

Archive

Automatically improving code

Recent Posts

Recent Comments

Archives

Meta