Archive for December, 2015

R recommended usage for professional developers

December 29th, 2015 10 comments

R is not one of those languages where there is only one way of doing something, the language is blessed/cursed with lots of ways of doing the same thing.

Teaching R to professional developers is easy in the sense that their fluency with other languages will enable them to soak up this small language like a sponge, on the day they learn it. The problems will start a few days after they have been programming in another language and go back to using R; what they learned about R will have become entangled in their general language knowledge and they will be reduced to trial and error, to figure out how things work in R (a common problem I often have with languages I have not used in a while, is remembering whether the if-statement has a then keyword or not).

My Empirical software engineering book uses R and is aimed at professional developers; I have been trying to create a subset of R specifically for professional developers. The aims of this subset are:

  • behave like other languages the developer is likely to know,
  • not require knowing which way round the convention is in R, e.g., are 2-D arrays indexed in row-column or column-row order,
  • reduces the likelihood that developers will play with the language (there is a subset of developers who enjoy exploring the nooks and crannies of a language, creating completely unmaintainable code in the process).

I am running a workshop based on the book in a few weeks and plan to teach them R in 20 minutes (the library will take a somewhat longer).

Here are some of the constructs in my subset:

  • Use subset to extract rows meeting some condition. Indexing requires remembering to do it in row-column order and weird things happen when commas accidentally get omitted.
  • Always call read.csv with the argument Computers now have lots of memory and this factor nonsense needs to be banished to history.
  • Try not to use for loops. This will probably contain array/data.frame indexing, which provide ample opportunities for making mistakes, use the *apply or *ply functions (which have the added advantage of causing code to die quickly and horribly when a mistake is made, making it easier to track down problems).
  • Use head to remove the last N elements from an object, e.g., head(x, -1) returns x with the last element removed. Indexing with the length minus one is a disaster waiting to happen.

It’s a shame that R does not have any mechanism for declaring variables. Experience with other languages has shown that requiring variables to be declared before use catches lots of coding errors (this could be an optional feature so that those who want their ‘freedom’ can have it).

We now know that support for case-sensitive identifiers is a language design flaw, but many in my audience will not have used a language that behaves like this and I have no idea how to help them out.

There are languages in common use whose array bounds start at one. I will introduce R as a member of this club. Not much I can do to help out here, except the general suggestion not to do array indexing.

Suggestions based on reader’s experiences welcome.

Tags: , ,

2015: the year I started regularly talking to researchers in China

December 18th, 2015 No comments

For me 2015 is when I started having regular email discussions with researchers in China; previously I only had regular discussions with Chinese researchers in the other countries. Given the number of researchers in China the volume of discussion will likely increase.

I have found researchers in China to be as friendly and helpful as researchers in other countries. The main problem I have experienced is slow or intermittent access to websites based in China.

In the past Chinese researchers I met were very good and these was a consensus that Asians were very good academically. I was told that we in the west were seeing a very skewed sample. Well, recently I have started to meet Chinese researchers who are not that good (or perhaps not even very good at all, I did not have time to find out). Perhaps the flow of Chinese researchers in the west now exceeds the volume of available really good people, or perhaps more of the good people are staying in China.

I am looking forward to learning about the Chinese view of software engineering (whatever it might be). From what I can tell it is a very practical approach, which I am very pleased to see.

Tags: ,

Christmas books: 2015

December 13th, 2015 No comments

The following is a list of the really interesting books I read this year (only one of which was actually published in 2015, everything has to work its way through several piles and being available online is a shortcut to the front of the queue). The list is short because I did not read many books.

The best way to learn about something is to do it and The Language Construction Kit by Mark Rosenfelder ought to be required reading for all software developers. It is about creating human languages and provides a very practical introduction into how human languages are put together.

The Righteous Mind by Jonathan Haidt. Yet more ammunition for moving Descartes‘s writings on philosophy into the same category as astrology and flat Earth theories.

I’m still working my way through Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman.

If you are thinking of learning R, then the best book (and the one I am recommending for a workshop I am running) is still: The R Book by Michael J. Crawley.

There are books piled next to my desk that might get mentioned next year.

I spend a lot more time reading blogs these days and Ben Thompson’s blog Stratechery is definitely my best find of the year.

Tags: ,

So you found a bug in my compiler: Whoopee do

December 7th, 2015 No comments

The hardest thing about writing a compiler is getting someone to pay you to do it. Having found somebody to pay you to write, update or maintain a compiler, why would you want to fix fault reported by some unrelated random Joe user?

If Random Joe is customer of a commercial compiler company he can expect a decent response to bug reports, because paying customers are hard to find and companies want to hold onto them.

If Random Joe is a user of an open source compiler, then what incentive does anybody paid to work on the compiler have to do anything about the reported problem?

The most obvious reason is reputation, developers want to feel that they are creating a high quality piece of software. Given that there are not enough resources to spend time investigating all reported problems (many are duplicates or minor issues), it is necessary to prioritize. When reputation is a major factor, the amount of publicity attached to a problem report has a big impact on the priority assigned to that report.

When compiler fuzzers started to attract a lot of attention, a few years ago, the teams working on gcc and llvm were quick to react and fix many of the reported bugs (Csmith is the fuzzer that led the way).

These days finding bugs in compilers using fuzzing is old news and I suspect that the teams working on gcc and llvm don’t need to bother too much about new academic papers claiming that the new XYZ technique or tool finds lots of compiler bugs. In fact I would suggest that compiler developers stop responding to researchers working toward publishing papers on these techniques/tools; responses from compiler maintainers is being becoming a metric for measuring the performance of techniques/tools, so responding just encourages the trolls.

Just because you have been using gcc or llvm since you wore short trousers does not mean they owe you anything. If you find a bug in the compiler and you care, then fix it or donate some money so others can make a living working on these compilers (I don’t have any commercial connection or interest in gcc or llvm). At the very least, don’t complain.

Hackathon New Year’s resolution

December 6th, 2015 No comments

The problem with being a regularly hackathon attendee is needing a continual stream of interesting ideas for something to build; the idea has to be sold to others (working in a team of one is not what hackathons are about), be capable of being implemented in 24 hours (if only in the flimsiest of forms) and make use of something from one of the sponsors (they are paying for the food and drink and it would be rude to ignore them).

I think the best approach for selecting something to build is to have the idea and mold one of the supplied data sets/APIs/sponsor interests to fit it; looking at what is provided and trying to come up with something to build is just too hard (every now and again an interesting data set or problem pops up, but this is not a regular occurrence).

My resolution for next year is to only work on Wow projects at hackathons. This means that I will not be going to as many hackathons (because I cannot think of a Wow idea to build) and will be returning early from many that I attend (because my Wow idea gets shot down {a frustratingly regular occurrence} and nobody else manages to sell me their idea, or the data/API turns out to be seriously deficient {I’m getting better at spotting the likelihood of this happening before attending}).