Home > Uncategorized > Exhuastive superoptimization

Exhuastive superoptimization

Optimizing compilers include a list of common code patterns and equivalent, faster executing, patterns that they can be mapped to (traditionally the role of peephole optimization, which in many early compilers was the only optimization).

The problem with this human written table approach is that it is fixed in advance and depends on the person adding the entries knowing lots of very obscure stuff.

An alternative approach is the use of a superoptimizer, these take a code sequence and find the functionally equivalent sequence that executes the fastest. However, superoptimizing interesting sequences often take a very long time and so use tends to be restricted to helping figure out, during compiler or library development, what to do for sequences that are known to occur a lot and be time critical.

A new paper, Optgen: A Generator for Local Optimizations, does not restrict itself to sequences deemed to be common or interesting, it finds the optimal sequence for all sequences by iterating through all permutations of operands and operators. Obviously this will take awhile and the results in the paper cover expressions containing two unknown operands + known constants, and expressions containing three and four operands that are not constants; the operators searched were: unary `~` and `-` and binary `+`, `&`, `|`, `-` and `~`.

The following 63 optimization patterns involving two variables (`x` and `y`)+constants (`c0`, `c1`, `c2`) were found, with the right column listing whether LLVM, GCC or the Intel compiler supported the optimization or not.

``` optional-precondition => before -> after L G I -~x -> x + 1 + + - -(x & 0x80000000) -> x & 0x80000000 - + - ~-x -> x - 1 + + - x +~x -> 0xFFFFFFFF + + - x + (x & 0x80000000) -> x & 0x7FFFFFFF - - - (x | 0x80000000) + 0x80000000 -> x & 0x7FFFFFFF + - - (x & 0x7FFFFFFF) + (x & 0x7FFFFFFF) -> x + x + + - (x & 0x80000000) + (x & 0x80000000) -> 0 + + - (x | 0x7FFFFFFF) + (x | 0x7FFFFFFF) -> 0xFFFFFFFE + + - (x | 0x80000000) + (x | 0x80000000) -> x + x + + - x & (x + 0x80000000) -> x & 0x7FFFFFFF + - - x & (x | y) -> x + + - x & (0x7FFFFFFF - x) -> x & 0x80000000 - - - -x & 1 -> x & 1 - + - (x + x)& 1 -> 0 + + - is_power_of_2(c1) && c0 & (2 * c1 - 1) == c1 - 1 => (c0 - x) & c1 -> x & c1 - - - x | (x + 0x80000000) -> x | 0x80000000 + - - x | (x & y) -> x + + - x | (0x7FFFFFFF - x) -> x | 0x7FFFFFFF - - - x | (x^y) -> x | y + - - ((c0 | -c0) &~c1) == 0 ) (x + c0) | c1!x | c1 + - + is_power_of_2(~c1) && c0 & (2 *~c1 - 1) ==~c1 - 1 => (c0 - x) | c1 -> x | c1 - - - -x | 0xFFFFFFFE -> x | 0xFFFFFFFE - - - (x + x) | 0xFFFFFFFE -> 0xFFFFFFFE + + - 0 - (x & 0x80000000) -> x & 0x80000000 - + - 0x7FFFFFFF - (x & 0x80000000) -> x | 0x7FFFFFFF - - - 0x7FFFFFFF - (x | 0x7FFFFFFF) -> x & 0x80000000 - - - 0xFFFFFFFE - (x | 0x7FFFFFFF) -> x | 0x7FFFFFFF - - - (x & 0x7FFFFFFF) - x -> x & 0x80000000 - - - x^(x + 0x80000000) -> 0x80000000 + - - x^(0x7FFFFFFF - x) -> 0x7FFFFFFF - - - (x + 0x7FFFFFFF)^0x7FFFFFFF -> -x - - - (x + 0x80000000)^0x7FFFFFFF -> ~x + + - -x^0x80000000 -> 0x80000000 - x - - - (0x7FFFFFFF - x)^0x80000000 -> ~x - + - (0x80000000 - x)^0x80000000 -> -x - + - (x + 0xFFFFFFFF)^0xFFFFFFFF -> -x + + - (x + 0x80000000)^0x80000000 -> x + + - (0x7FFFFFFF - x)^0x7FFFFFFF -> x - - - x - (x & c) -> x &~c + + - x^(x & c) -> x &~c + + - ~x + c -> (c - 1) - x + + - ~(x + c) -> ~c - x + - - -(x + c) -> -c - x + + - c -~x -> x + (c + 1) + + - ~x^c -> x^~c + + - ~x - c -> ~c - x + + - -x^0x7FFFFFFF -> x + 0x7FFFFFFF - - - -x^0xFFFFFFFF -> x - 1 + + - x & (x^c) -> x &~c + + - -x - c -> -c - x + + - (x | c) - c -> x &~c - - - (x | c)^c -> x &~c + + - ~(c - x) -> x +~c + - - ~(x^c) -> x^~c + + - ~c0 == c1 => (x & c0)^c1 -> x | c1 + + - -c0 == c1 => (x | c0) + c1 -> x &~c1 - - - (x^c) + 0x80000000 -> x^(c + 0x80000000) + + - ((c0 | -c0) & c1) == 0 => (x^c0) & c1!x & c1 + + - (c0 &~c1) == 0 => (x^c0) | c1!x | c1 + - - (x^c) - 0x80000000 -> x^(c + 0x80000000) + + - 0x7FFFFFFF - (x^c) -> x^(0x7FFFFFFF - c) - - - 0xFFFFFFFF - (x^c) -> x^(0xFFFFFFFF - c) + + -```

The optional-precondition are hand-written rules that constants must obey for a particular expression mapping to apply, e.g., `(c0 &~c1) == 0`.

Hand written rules using bit-vector (for each bit of a constant the value one, zero or don’t know) analysis of constants is used to prune the search space (otherwise the search of the two operand+constants case would still be running).

LLVM covers 40 of the 63 cases, GCC 36 and Intel just 1 (wow). I suspect that many of the unsupported patterns are rare, but now they are known it is worth implementing them just to be able to say that this particular set of possibilities is completely covered.