Unreliable cpus and memory: The end result of Moore’s law?

Where is the evolution of commodity cpu and memory chips going to take its customers? I think the answer is cheap and unreliable products (just like many household appliances are priced low and have a short expected lifetime).

We have had the manufacturer-customer win-win phase of Moore’s law and I think we are now entering the win-lose phase.

The reason chip manufacturers, such as Intel, invest so heavily in continually shrinking dies is the same reason all companies invest: they expect to get a good return on their investment. The cost of processing the wafer from which individual chips are cut is more or less constant; reducing the size of a chip enables more to be fitted on the same wafer, giving more product to sell for more or less the same wafer processing cost.
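The wafer arithmetic can be sketched in a few lines. This is only a rough illustration: the 300 mm wafer diameter, the die areas and the wafer processing cost are all made-up numbers, and the calculation ignores edge loss and defective dies.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Rough upper bound: wafer area divided by die area (no edge loss, no defects)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // die_area_mm2)


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Processing cost attributed to one die, assuming a fixed cost per wafer."""
    return wafer_cost / dies_per_wafer(wafer_diameter_mm, die_area_mm2)


if __name__ == "__main__":
    WAFER_DIAMETER_MM = 300.0  # assumption
    WAFER_COST = 5000.0        # hypothetical processing cost, arbitrary currency

    for die_area in (200.0, 100.0):  # hypothetical die areas before/after a shrink
        count = dies_per_wafer(WAFER_DIAMETER_MM, die_area)
        cost = cost_per_die(WAFER_COST, WAFER_DIAMETER_MM, die_area)
        print(f"die area {die_area:6.1f} mm^2: {count:4d} dies, {cost:6.2f} per die")
```

Under these assumptions, halving the die area roughly doubles the number of dies per wafer and so roughly halves the processing cost per die.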

The fact that dies with smaller feature sizes have reduced power consumption and can run at faster clock speeds (up until around 10 years ago) is a secondary benefit to manufacturers (it created a reason for customers to replace what they already owned with a newer product); chip manufacturers would still have gone down the die shrink path if these secondary benefits had not existed, but perhaps at a slower rate. Customers saw, or were marketed, this shrinkage story as one of product improvement for their benefit rather than as one of unit cost reduction for Intel’s benefit (Intel is the end-customer facing company that pumped billions into marketing).

Until recently both manufacturer and customer have benefited from die shrinks through faster cpus/lower power consumption and lower unit cost.

A problem that was rarely encountered outside of science fiction a few decades ago is now regularly encountered by all owners of modern computers: cosmic rays (plus more local sources of ‘rays’) altering the behavior of running programs (4 GB of RAM is likely to experience a single bit-flip once every 33 hours of operation). As die shrinks continue this problem will get worse. Another problem with ever smaller transistors is their decreasing mean time to failure (very technical details); we have seen expected chip lifetimes drop from 10 years to 7, and now to less than that and still decreasing.
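To get a feel for what the 4 GB / 33 hours figure implies for other machines, here is a back-of-the-envelope sketch. It assumes the number of bit-flips scales linearly with both the amount of RAM and the hours of operation, which is a simplification of mine; the function names are made up for illustration.

```python
def expected_bit_flips(ram_gb: float, hours: float,
                       ref_gb: float = 4.0, ref_hours: float = 33.0) -> float:
    """Expected bit-flips, scaled linearly from one flip per ref_gb per ref_hours."""
    return (ram_gb / ref_gb) * (hours / ref_hours)


def hours_between_flips(ram_gb: float,
                        ref_gb: float = 4.0, ref_hours: float = 33.0) -> float:
    """Average hours of operation per bit-flip under the same linear assumption."""
    return ref_hours * ref_gb / ram_gb


if __name__ == "__main__":
    for ram_gb in (4, 16, 64):
        print(f"{ram_gb:3d} GB: one flip roughly every "
              f"{hours_between_flips(ram_gb):.2f} hours, "
              f"{expected_bit_flips(ram_gb, 24 * 365):.0f} flips per year of uptime")
```

The reference figure is the one quoted above; actual rates depend on the memory technology, altitude and shielding, none of which this sketch models.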

Decreasing chip lifetimes are actually good for the manufacturer; they create a reason for customers to buy a new product. Buying a new computer every 2-3 years has been accepted practice for many years (because the new ones were much better). Are we, the customer, in danger of being led to continue with this ‘accepted practice’ (because computer reliability is poor)?

Surely it is to the customer’s advantage to not buy devices that contain chips with even smaller features? Is it only the manufacturer that will obtain a worthwhile benefit from future die shrinks?

  1. Tel
    December 15th, 2013 at 10:23 | #1

    The industry is already slightly ahead of you there. If you buy an Intel i7 chip, it’s fast but officially it’s a desktop chip. If you buy Xeon, it’s a server chip. What is the difference? Many people ask this.

    Well, the Xeon supports ECC memory, which stores one extra bit per byte and assembles a Hamming code for bit error correction. The Xeon has the necessary circuitry enabled to reconstruct the Hamming code and thus survive the inevitable occasional bit flip that will occur.

    Your typical satellite TV operator uses LDPC which is a similar concept to a Hamming code and does the same thing, fixing up the inevitable noise that gets into the communications line.
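For readers who have not met Hamming codes, the comment above can be illustrated with a toy example. This is a textbook Hamming(7,4) code on 4 data bits, chosen because it is short; it is not the layout real ECC modules use (they typically apply a wider code across a whole memory word), and the function names are made up for illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means no single-bit error was detected
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the offending bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1                   # simulate a cosmic-ray bit flip in "memory"
    recovered, position = hamming74_decode(stored)
    print(recovered == word, position)
```

The three parity bits are recomputed on read; together they spell out the position of a single flipped bit, which is then flipped back. Two flips in the same codeword would be miscorrected by this toy version.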

  2. December 15th, 2013 at 12:12 | #2

    @Tel
    Yes, ECC memory is becoming necessary in a lot more applications. This is great for manufacturers, who get to sell high-end kit to solve problems present in the cheaper stuff. But isn’t the real solution to build chips that are less susceptible to bit flips (i.e., have larger die sizes)?

    Arithmetic coding of data created during program execution is another solution (talk about using this to make car engine management software more reliable)
