My 2cents on the design of the contract language in Ethereum

January 12th, 2015 No comments

My previous post on Ethereum contracts got me thinking about how Ethereum should be going about creating a language and virtual machine for the contracts aspect of their cryptocurrency.

I would base the contract development gui on Scratch. Contract development will involve business people and having been extensively field tested on children Scratch stands some chance of being usable for business types.

For the language itself I would find a language already being used for contract related programming and simply adopt it.

At the moment the internal specification of contracts is visible for all to see. I expect a lot of people will be unwilling to share contract information with anybody outside of those they are dealing with. The Ethereum Virtual Machine needs to include opcodes that perform Homomorphic encryption, i.e., operations on encrypted values producing a result that is the encrypted version of the result from performing the same operation on the unencrypted values. Homomorphic encryption operations would allow writers of contracts that keep sensitive numeric values secret, they get to decide who gets a copy of thee dencryption keys needed to see the plain-text result.

The only way I see of overcoming the denial-of-service attack outlined in my previous post is to require that the miner who receives payment for executing a contract prove that they did the work of executing the contract by including program values existing at various randomly chosen points of contract execution.

Ethereum contracts, is the design viable?

January 11th, 2015 4 comments

I went to a workshop run by Ethereum today, to learn about cryptocurrency/virtual currency from a developer’s point of view. Ethereum can be thought of as Bitcoin+contracts; they are also doing various things to address some of the design problems experienced by Bitcoin. This article is about the contract component of Ethereum, which is being promoted as its unique selling point.

The Ethereum people talk so much about contacts that I thought once currency (known as ether, a finney is a thousandth of an ether, with other names going down to 10^{-18}) had been mined it was not considered legal tender until it was associated with the execution of a contract. In fact people will be free to mine as many ethers as they like without having any involvement with the contract side of things.

Ethereum expect the computational cost of mining to be significantly greater than the cost of executing contracts.

A contract is a self-contained piece of code created by a client and executed by a miner (the client pays the miner for this service). The result of executing the contract is added to the data associated with a block of mined currency (only one contract per block).

How confident can the client, and interested third parties, be that the miner paid to execute the contract is telling the truth and didn’t make up the data included in the mined currency block?

The solution adopted by Ethereum is to require all miners involved in the execution of contracts to execute all contracts that have been associated with mined currency, so that the result posted by the miner who got paid to execute each contract can be validated (51% agreement is required).

For this design to work, the cost of executing contracts to verify the data produced by other miners has to be small relative to paid income received from executing contracts.

If somebody is in the ether mining business, executing contracts has the potential to provide additional income for a small increase in effort, i.e., the currency has been mined why not get paid to execute contracts, adding the resulting data to blocks that have been mined?

Ethereum has specified the relative cost of operations executed by the Ethereum Virtual Machine. The Client specifies a value used to multiply these costs to produce an actual cost per operation that are willing to pay.

Requiring the community of contract executors to execute all contracts creates a possible vulnerability that can be exploited.

To profit from executing contracts the following condition must hold:

N_c A_o E_c < A_o P_o

where: N_c is the ratio of unpaid to paid contracts, A_o is the average number of operations performed per contract, E_c is the average execution cost per operation and P_o is the average amount clients pay per operation.

This simplifies to:

N_c {E_c/P_o} < 1

What would be a reasonable estimate for N_c, the ratio of unpaid to paid contracts? Would 1,000 miners offering contract execution services be a reasonable number? If we assume that contracts are evenly distributed among everybody involved in contract execution, then there are disincentives of scale; every new market entrant reduces income and increases costs for existing members.

What if businessman Bob wants to corner the contract market and decides to drive up the cost of executing contracts for all miners except himself? To do this he enlists the help of accomplice Alice as a client offering contracts at extremely poorly paid rates, which Bob is happy to accept and be paid for appearing to execute them (Alice has told him the result of executing the contracts); these contracts are expensive to execute, but Bob and Alice know number theory and don’t need a computer to figure out the results.

So Bob, Alice and all their university chums flood the Ethereum currency world with expensive to compute contracts. What conditions need to hold for them to continue profiting from executing contracts?

Lets assume that expensive contracts dominate, then the profitability condition becomes:

M_n N_c M_a A_o M_e E_c < A_o P_o

where: M_e is the increase in the number of contracts, M_a the increase in the average number of operations performed per contract and M_e the increase in average execution cost per operation.

This simplifies to:

M_n M_a M_e N_c {E_c/P_o} < 1

What values do we assign to the three multipliers: M_n M_a M_e? Lets say M_n=5,  M_a=10, M_e=3, which tells us that the price paid has to increase by a factor of 150 for those involved in the market to maintain the same profit level.

Given that Ethereum are making such a big fuss about contracts I had expected the language being used to express contracts to be tailored to that task. No such luck. Serpent is superficially Python-like, with the latest release moving in a Perl-ish direction, not in themselves a problem. The problem is that the languages is not business oriented, let alone contract oriented, and is really just a collection of features bolted together (the self absorbed use case for the new float type: “… elliptic curve signature pubkey recovery code…”, says it all). Another Ethereum language Mutan is claimed to be C-like; well, it does use curly brackets rather than indentation to denote scope.

If Ethereum does fly, then there is an opportunity for somebody to add-on a domain specific language for contracts, one that has the kind of built in checks that anybody involved with contracting will want to use t prevent expensive mistakes being made.

Users of Open source software are either the product or are irrelevant

January 9th, 2015 1 comment

Users of software often believe that the people who produced it are interested in the views and opinions of their users.

In a commercial environment there are producers who are interested in what users have to say, because of the financial incentive, and some Open source software exists within this framework, e.g., companies selling support services for the code they have Open sourced.

Without a commercial framework why would any producer of Open source be interested in the views and opinions of users? Answers include:

  • users are the product: I have met developers who are strongly motivated by the pleasure they get from being at the center of a community, i.e., the community of people using the software that they are actively involved in creating,
  • users are irrelevant: other developers are more strongly motivated by the pleasure of building things and in particular building using what are considered to be the right tools and techniques, i.e., the good-enough approach is not considered to be good-enough.

How do these Open source dynamics play out in the evolution of software systems?

It is the business types who are user oriented or rather source of money oriented. They are the people who stop developers from upsetting customers by introducing incompatibilities with previous releases, in fact a lot of the time its the business types pushing developers to concentrate on fixing problems and not wasting time doing more interesting stuff.

Keeping existing customers happy often means that products change slowly and gradually ossify (unless the company involved has a dominant market position that forces users to take what they are given).

A project primarily fueled its developers’ desire for pleasure cannot stand still, it has to change. Users who complain that change has an impact on their immediate need for compatibility get told that today’s pain is necessary for a brighter future tomorrow.

An example of users as the product is gcc. Companies pay people to work on gcc because they want access to the huge amount of software written by ‘the product’ using gcc.

As example of users as irrelevant is llvm. Apple is paying for llvm to happen and what does this have to do with anybody else using it?

The users of Angular.js now know what camp they are in.

As Bob Dylan pointed out: “Just because you like my stuff doesn’t mean I owe you anything.”

Z and Zermelo–Fraenkel

December 25th, 2014 No comments

Z is for Z and Zermelo–Fraenkel.

Z, pronounced zed, or Z notation to give it its full name, replaced VDM as the fashionable formal specification language before itself falling out of fashion. Gallina is in fashion today, but you are more likely to have heard of the application software built around it, Coq.

Zermelo–Fraenkel set theory with the axiom of choice is the brand name assembly language of mathematics (there are a number of less well known alternatives). Your author is not aware of any large program written in this assembly language (Principia Mathematica used symbolic logic and over 300 pages to prove that 1+1=2).

While many mathematical theorems can be derived from the axioms contained in Zermelo–Fraenkel, mathematicians consider some to be a lot more interesting than others (just like developers consider some programs to be more interesting than others). There have been various attempts to automate the process of discovering ‘interesting’ theorems, with Eurisko being the most famous and controversial example.

Things to read

Automated Reasoning: Introduction and Applications by Larry Wos, Ross Overbeek, Rusty Lusk, and Jim Boyle. A gentle introduction to automated reasoning based on Otter.

… and it is done.

Readers may have noticed the sparse coverage of languages created between the mid-1990s and 2010. This was the programming language Dark ages, when everybody and his laptop was busy reinventing existing languages at Internet speed; your author willfully tried to ignore new languages during this time. The focus for reinvention has now shifted to web frameworks and interesting new ideas are easier to spot amongst the noise.

Merry Christmas and a happy New Year to you all.


XSLT, yacc and Yorick

December 24th, 2014 No comments

X and Y are for XSLT, yacc and Yorick.

XSLT is the tree climbing Kangaroo of the programming language world. Eating your own dog food is good practice for implementors, but users should not be forced to endure it. Anyway, people only use XML, rather than JSON, to increase the size of their files so they can claim to be working with big data.

yacc grammars don’t qualify as programming languages because they are not Turing complete (because yacc requires its input to be a deterministic context-free grammar, a subset of context-free grammars, that can be implemented using a deterministic pushdown automaton; restructering a context-free grammar into a deterministic context-free grammar is the part of using yacc that mangles the mind the most). While Bison supports GLR parsing, this only extends its input to all context-free grammars; the same notation is used to specify grammars and this does not support the specification of languages such as: L = a^n b^n c^n, where n ge 1 and a^n denotes a repeated n times, which requires a context-free grammar. So the yacc grammar notation is not capable of expressing all computations.

Yorick is designed for processing large amounts of scientific data and is a lot like R, with slightly different syntax. Use of Yorick seems to have died out in the last few years. What happened, did its users discover R? I would certainly recommend a switch to R, but I would also suggest Yorick as a subject of study for the maintainers of R; there are language ideas to be mulled over, e.g., global("blah") to declare blah as being a global variable within the function containing the call.

Things to read

The Theory of Parsing, Translation, and Compiling, Vol. 1, Parsing by Alfred V. Aho and Jeffrey D. Ullman. Warning, reading this book will make your head hurt.

Introduction to Automata Theory, Languages, and Computation by John Hopcroft and Jeffrey D. Ullman. Has a great chapter on Turing machines. If you really want your head to hurt, then get the book above rather than the first edition of this book.


Wolfram and Whitespace

December 23rd, 2014 No comments

W is for Wolfram and Whitespace.

Wolfram, or the Wolfram language as the marketing people would want me to call it, is really Mathematica rebranded with its master‘s name.

Mathematica Wolfram (the language) is currently the dominant player in its niche market, where is its future? In the short term its future is in the hands of Wolfram (the person, company information suggests family ownership), with possible short term decisions and their long term consequences including:

  • keeping its sole implementation proprietary: in this case its fate depends on the commercial success of Wolfram Research. The main competition appears to be various Open source systems (e.g., Sage) and while these might not be as good, history shows that free software only needs to be good-enough to soak up a market share that makes the sale of commercial offerings uneconomic. How long before the drip-drip improvement of Open source makes it good-enough? Five, ten years?
  • selling to a larger company: Wolfram strikes me as a lifestyle business, so I don’t see this happening. How much might the company be worth? Wolfram Research Europe Ltd was valued at £3,138,002 in its 2013 financial filing and the Delaware incorporation had an annual tax assessment of $1,575.00 in 2013. No doubt other numbers are available and somebody who knows about these things could estimate a ballpark figure, but I am not that person,
  • releasing the code as Open source: if done now Wolfram (top dog that it is in quality and market share) would be the language for doing computer mathematics for ever more. However, few people have been willing to let go to this extent unless backed into a corner. Perhaps there is a subset that can be made Open source, enough to mow the lawn of competing systems but not so much that corporate customers will stop paying for support and what they believe are useful add-ons.

Whitespace is an invisible language (its written form is based on the characters space, tab and linefeed). The execution of this invisible language obviously requires a matching invisible virtual machine such as sed.

Programs and the means to execute them are all around us.


VHDL, Visual-Basic and VDM

December 22nd, 2014 No comments

V is for VHDL, Visual-Basic and VDM.

VHDL and Verilog are the leading hardware description languages. My only real hands on experience with digital hardware is soldering together my first computer (buying it in kit+component form saved me a month’s salary), but given the amount of dataflow programming involved I appreciate that programming hardware is likely to be very hard (but not as hard as programming Quantum computers).

The problem with general purpose cpus is that they never seem to have just the right instructions needed to generate really fast compact code for a particular application. The ideal solution is for the compiler to automatically figure out what instructions are the best fit for a given application and then to include the appropriate processor specification in the generated executable (program initialization starts with configuring the host cpu with the appropriate architecture+instructions to execute the generated code). Hardware is available to make this a reality, but the necessary software support is still being worked on (there is also the tiresome economic issue of mass produced cpus almost always being cheaper). At the moment application specific processors tend to be built into application specific computers.

Anyway, real programmers don’t write programs that make LEDs flash on a Raspberry Pi, real programmers use single-board computers that support personalization of the instruction set. If this future has arrived for you, please let me know how I can be included in the distribution.

Visual-Basic shares half of its name and use of the keyword Dim with BASIC. It was clever marketing to give a new language a name that contained the name of a language with a reputation for being easy to learn and use. I bet the developers involved hated this association.

VDM was for many years THE formal method; complete definitions of industrial strength languages (e.g., PL/1, Modula-2 and CHILL) were written using it. These days it seems to be supporting a small community by helping people model industrial systems, however it has fallen out of fashion in the programming language world.

Back in the day there was lots of talk of tools using the available VDM language specifications to check source code (my interest was in checking the consistency of the specifications). In practice tooling did not get much further than typesetting the printed VDM.

Back when I thought that formal methods had a future, I was more of a fan of denotational semantics than VDM. The CHILL VDM specification has at least one reported fault (I cannot tell you where it is because I cannot find my copy); mine was the first reported and I never heard if anybody else ever reported a fault.

Things to read

Computer Architecture: A Quantitative Approach (second edition) by John L. Hennessy and David A. Patterson. Later editions become more narrowly focused on current hardware.

The Denotational Description of Programming Languages by Michael J. C. Gordon is a good short introduction, while Denotational Semantics by David A. Schmidt is longer.



December 21st, 2014 No comments

U is for UNCOL and UML.

UNCOL is the compiler intermediate language that gets named when people need a historical starting point to refer to. This kind of intermediate language is one that contains all the information needed for a compiler front-end to communicate a representation of the original source code to an independent compiler back-end that generates machine code (the intermediate languages used for internal communication within a compiler are often just tree data structures that also make use of information held in other data structures, i.e., they are not points where a clean cut can be made to insert independent, self contained, modules). The rationale for a common intermediate language is saving time and money by reducing the number of compilers that need to be created from N_L * N_M to N_L + N_M, where N_L is the number of languages and N_M the number of target machines.

Generating high quality code requires having access to as much information as possible about the original source, which means the intermediate language split-point needs to occur as soon after source language specific processing as possible (otherwise all kinds of subtle target machine architecture assumptions get embedded in the structure of the intermediate code, which can make life difficult when the target makes different assumptions). ANDF is one of the few implementations that makes the split-point early (potential customers were not happy that it would be relatively easy to reverse engineer the original source).

Traditionally many more back-end got written than front-ends and the decision on where to make the intermediate language split-point was driven by the desire to reduce the time and effort needed to write a back-end, i.e., it was as close to actually pumping out machine code as possible. The quality of the generated code is not as good as it might be, but a new compiler can be shipped much quicker and more cheaply. Almost every retargetable compiler has made this design choice.

These days with more new languages appearing than new processors there would appear to be advantages in shifting the intermediate language split-point closer to the front-end. In practice compiler vendors get most of their income from back-end work (e.g., vendors wanting support for their new processor) and income from new front-ends is very rare, i.e., there is no incentive to change the status-quo.

UML is not regarded as a serious language by many developers. Is this because programs look like pretty pictures, something that managers might use, or UML being used for very non-developer problems like business processes? I don’t know. UML seems to be growing in popularity and have an increasing number of support tools. Perhaps one day I will use it for real and be able to say something of interest.

Things to read

A Code Generation Interface for ANSI C by C. W. Fraser and D. R. Hanson.

Code selection through object code optimization by J. W. Davidson and C. W. Fraser.


TECO, Tcl/Tkl and TeX

December 20th, 2014 1 comment

T is for TECO, Tcl/Tkl and TeX.

TECO was present during creation and begat Emacs. Despite living on TECO’s home turf, as an undergraduate, my lack of beard and disdain for Dungeons & Dragons stopped me becoming a member of its inner sanctum.

Tcl/Tkl is practical combination of scripting language+GUI toolkit for adding a usable (if sometimes somewhat fragile) GUI wrapper around one or more command line programs. Tcl is one of those languages that gets the job done without anyone making a big fuss about it.

TeX is Donald Knuth’s original typesetting system, however the language built on top of it, LaTeX, is probably more well known these days. While creating TeX Knuth proposed an approach to combing code and documentation together and gave it the catchy name literate programming; unfortunately this approach was not strangled at birth and because of Knuth’s reputation and the catchy name it keeps regrowing heads that need to be cut-off. The literate programming approach requires code, comments and formatting instructions be jumbled together in a horrendous mess, which is then feed through a process that can produce visually beautiful output (if enough effort is put into inserting the appropriate incantations into the horrendous mess). When writing something that will primarily be read at the beautiful output stage it is worth investing in the horrendous mess stage, but most code is used in its native, horrendous mess, form. Knuth has the mathematician’s habit of using single letter identifiers and love of ‘neat’ tricks, he would be a disaster on any software engineering project where he could not be left alone in a room producing code that nobody ever needs to touch.


Snobol 4, Simula 67, Smalltalk, sed, SQL, SETL, Scratch and Spreadsheet

December 19th, 2014 No comments

S is for Snobol 4, Simula 67, Smalltalk, sed, SQL, SETL, Scratch and Spreadsheet.

Snobol 4 was the go to language for text generation and pattern matching before PERL/Python became widespread in the late 1990s (if your PERL/Python code is too slow try Snobol, compilers generating machine code have been written for it since the 1970s). It is a boutique language that entrances many people who encounter the Green book (your author fell under its spell as an undergraduate).

Simula 67 was the first object oriented language. However, 30 years went by before object-oriented programming could be said to have entered general use (some people would claim that this has still not happened). Yes, lots of developers were using C++ compilers in the late 1980s, but only because they wanted to look trendy and be able to list the language on their CV. During the late 1980s and early 1990s I was heavily involved with the static analysis of C using an in-house tool and found that a lot of C++ source could be handled by adding a few extensions to the tool, such as handling new (rather than malloc) and allowing class to be used where struct should have appeared. It was not until the mid-90s that parsing C++ source with an extended C parser became problematic on a regular basis (and then often only because of the use of iostreams).

Back in the day it was said that only Scandinavians understood OO programming. I think the real reason was that only Scandinavians understood the English used in the book written by the language designers: “Simula BEGIN” by Birtwistle, Dahl, Myrhaug and Nygaard (there is a rave review on Amazon, I wonder if the reviewer is Scandinavian).

Smalltalk is invariably the language of choice for object-oriented purists. I know of several development groups in industry that choose to use Smalltalk because they wanted to really learn how to do things the OO ‘way’ (Smalltalk is all objects, its users cannot do anything in a non-OO way); I don’t know what those subsequently responsible for maintaining of the code thought about this rationale.

The ‘launch’ of Smalltalk in the UK by Goldberg and crew happened before Apple’s Macintosh made Xerox PARC famous, and the talk was all about message passing being the future and everybody being able to modify any behavior of their operating environment (we were shown Smalltalk’s flexibility by a speaker who changed the definition of an operation that defined how the borders of windows were displayed; they agreed that this was potentially dangerous but that users soon learned by their mistakes). The launch of the Macintosh changed the whole PARC narrative.

Thinking back, the only thing that stopped the Smalltalk team inventing Open Source was the fact they thought it so obvious to ship the source of everything that it was not worth mentioning.

sed gets a mention as the most unlikely language used to implement a Turing machine.

SQL is invisible to developer who don’t use it because publishing SQL source is generally pointless unless the appropriate data is also available. There is a lot of SQL lurking (as text strings) within the source of programs written in other languages. If you want to see some SQL then check out the Javascript embedded in a lot of the web pages you read.

SETL is a Cambrian explosion language which has had a small, but influential, user base.

Scratch suffers from being thought of as a language for children. If touch screens had been around when people first started inventing programming languages I’m sure things would be very different today, where most developers think that real programs are written in lines of text.

Spreadsheet languages have for a long time been a neglected topic from the perspective of tools to find faults and programming language research. Things are starting to change in industry and hopefully the work of keen academics will be appreciated in academia.

Things to read

The SNOBOL4 Programming Language by Ralph E. Griswold, J. F. Poage, and I. P. Polonsky.

Every language has its blind spot, the techniques that should never be written using that language. In the case of SQL it is tree data structures: “Joe Celko’s Trees and Hierarchies in SQL for Smarties” by Joe Celko.