I made a mistake, please don’t shoot me

The major difference between commercially written and academically written software is the handling of user mistakes, or to be more exact, what is considered to be a user mistake. In the commercial world the emphasis is on keeping the customer happy, which translates into trying hard to gracefully handle any ‘mistake’ the user makes. Academic software is generally written to solve a research problem and is often very unforgiving of users failing to keep to the undocumented straight and narrow; given the context this unforgiving behavior is understandable, but sometimes such software is released to an unsuspecting world.

The R archive of contributed packages, CRAN, is a good example of the academic approach to writing software. I am an active user of many packages in this archive and its contributors have my heart-felt thanks. But on a regular basis I make a mistake when calling a function in one of these packages, get shot in the foot and am not best pleased.

What makes the situation worse is that my mistakes are often so trivial and easy to fix (by either me or the package authors). My most common ‘mistake’ is passing an argument whose type is not handled by the function, e.g., passing a data frame to diag (why do I have to convert the argument using as.matrix, when diag could spot my mistake and do the conversion for me instead of returning some horrible mess?).
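As a rough sketch of what I mean, a wrapper along the following lines would do the conversion for me. The name diag_safe is made up for this example and is not part of base R or any package; I am also assuming that only all-numeric data frames should be converted, and that anything else deserves a readable error.

```r
# diag_safe is a hypothetical wrapper, not an existing R function.
# It converts a data frame to a matrix before handing it to diag().
diag_safe <- function(x) {
  if (is.data.frame(x)) {
    # Assumption: only all-numeric data frames are worth converting.
    if (!all(vapply(x, is.numeric, logical(1)))) {
      stop("diag_safe: every column of the data frame must be numeric")
    }
    x <- as.matrix(x)
  }
  diag(x)
}

df <- data.frame(a = c(1, 2), b = c(3, 4))
diag_safe(df)  # gives the same result as diag(as.matrix(df))
```

The check on the column types is what stops a data frame containing factors or strings from being silently turned into a character matrix.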

Commercial software can also be unforgiving of user mistakes; in fact early versions of a lot of commercial software are just as unfriendly as academic software. The difference is that commercial managers will make it their business to ensure that developers fix the code to make it user friendly. Competition ensures that those who don’t listen to their users go out of business.

Updating code to gracefully handle user mistakes is often a chore and many developers hate having to do it; managers are needed to prod developers into doing the work. More than half of the code in a commercial product may exist only to handle user mistakes, and the percentage can approach 90%.

A lot of Open Source software has significant commercial backing, e.g., Linux, Apache, Firefox and gcc/llvm, which means it is somebody’s job to make sure customer complaints are addressed.

What the R development team needs is more commercial backing (it appears to have very little, but I may be wrong). Then somebody can be hired to go through the popular packages to make them mistake friendly, feed the changes back to the original authors and generally educate package developers about bullet-proofing their code.

  1. Harold
    July 31, 2013 22:31 | #1

    Okay I won’t shoot you, but speaking of mistakes… it’s “somebody’s” not “somebodies.” 🙂

    I also suspect you meant to type “why do I have to” rather than “why to I have to.”

    If programming were as easy to correct as typos and grammar we’d have far fewer problems.

  2. July 31, 2013 22:45 | #2

    @Harold
    Thanks for the corrections. I would not say that grammar checking was easy, LanguageTool is probably one of the best tools around and it often fails to spot mistakes I make.

    There is something to be said for keeping programming hard so that prices stay high 😉

  3. August 1, 2013 01:50 | #3

    I’d contend that commercial backing is a bad thing for CRAN (there already is commercial backing for R: http://www.revolutionanalytics.com/). This is great for some users, but at the end of the day someone who controls the money controls the product. R and other open source projects are fueled by creativity and the needs of the field. I find it so much easier to get help from R package maintainers than to call SPSS or NVivo etc. The package is someone’s baby that they love and have invested time in and want to share. I’m sure people feel somewhat like that at SPSS, but I didn’t get that feeling of passion in asking for help. The knitr and slidify packages are two cases where I’ve received terrific help from very busy maintainers.

    Secondly, I’d like to address turning the tables on “I made a mistake, please don’t shoot me” as a package maintainer. I’ve had some great people contact me and I’ve also had some people contact me with arrogance and a demanding tone. Guess who gets my attention. Package maintainers have spent hours and hours developing a package (usually for their own use, and they are kind enough to share this work). It would behoove those who write to maintainers with a less than chipper attitude to consider this.

  4. August 1, 2013 02:17 | #4

    @trinker
    The problem with volunteers is that they want to do what interests them, and who can blame them. It is hard enough getting paid developers to bullet-proof their own code; I don’t hold out much hope of getting volunteers to do it. We don’t want to put people off contributing packages covering their area of expertise by requiring them to invest a lot of time in what they are likely to consider drudgery outside their area of expertise.

    Commercial backing is a good way of ensuring that the important, but non-interesting, things get done.

  5. August 1, 2013 05:43 | #5

    Thanks for this article. It is just how I feel about the matter. We can see the influence of corporate sponsorship in the development of RStudio and Shiny, which in my opinion are two of the best tools related to R to come out in a long time. I think the R development team should think seriously about having annual R donation drives, as with Wikipedia, to help establish a budget to improve the quality of the user experience.

    I love R and found it challenging though rewarding to learn.

  6. August 1, 2013 19:47 | #6

    Interesting comments. But look carefully at your example. as.matrix(), by definition, “attempts to turn its argument into a matrix”, and has a well structured set of rules for managing data frames with categorical and numeric data. You probably have well formatted, well understood data frames. You KNOW that as.matrix() and diag() will work for your data.

    I have found, in teaching R and writing a book about it, that the most important thing the process of having to write your code in R does is make you understand your data and the process of analysing it. It keeps newbies learning and experienced people super honest and clear.

    SO, for you, with a good understanding of the difference between a matrix and dataframe, using diag() might be made easier.

    But the person learning what diag() does, and why certain types of data might not be good to pass through to diag(), or finding a mistake in your dataframe that should be a matrix, benefits greatly from the requirement to convert the data first, and then use diag(). i.e. using diag() really is a two step process. Input and check your data is a matrix, THEN, “Extract or replace the diagonal of a matrix, or construct a diagonal matrix.”

    Sure, the authors could write a stop() argument. But then you have to go and figure out why it stopped in the first place…. One of R’s treasures, I would argue, is having to know what you want to do (at least a bit) before you can do it.

  7. August 2, 2013 00:27 | #7

    @Andrew Beckerman (@GSwithR)
    My main wish is that R functions should detect when their argument(s) do not have a type that the main body of the function can handle and do something about it. The situation at the moment is that some obscure error gets generated or some weird looking data gets returned. There are two schools.

    The two main do-something behaviors, after detecting the unhandled type, are either to generate an error saying something like “Argument N must have one of the types foo or bah”, or to attempt to convert the argument to a type that is handled (and flag an error if that cannot be done).
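    To make the two behaviors concrete, here is a minimal sketch. The helper check_matrix_arg and its convert flag are names I have invented for illustration; they do not come from R or any package, and the choice to accept only numeric matrices is my assumption.

    ```r
    # check_matrix_arg is a hypothetical helper, not an existing R function.
    # convert = FALSE: report an unhandled type with a readable error.
    # convert = TRUE:  try to convert, and report an error if that fails.
    check_matrix_arg <- function(x, arg_name = "x", convert = FALSE) {
      if (is.matrix(x) && is.numeric(x)) {
        return(x)
      }
      if (!convert) {
        stop(sprintf("Argument '%s' must be a numeric matrix, not an object of class '%s'",
                     arg_name, class(x)[1]))
      }
      # Attempt the conversion; treat any failure as "cannot be converted".
      converted <- tryCatch(as.matrix(x), error = function(e) NULL)
      if (is.null(converted) || !is.numeric(converted)) {
        stop(sprintf("Argument '%s' could not be converted to a numeric matrix",
                     arg_name))
      }
      converted
    }

    df <- data.frame(a = c(1, 2), b = c(3, 4))
    m <- check_matrix_arg(df, convert = TRUE)  # converted to a numeric matrix
    ```

    Calling check_matrix_arg(df) without convert = TRUE stops with the “must be a numeric matrix” message instead.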

    When it works the argument conversion approach is great, but it can end up down a very complicated rabbit hole, with the user wishing that the give-an-error-and-get-out approach had been used.

    I am a great fan of strong typing and am pleased to hear that you teach your students the benefits of using the appropriate type. I imagine this must be something of an uphill struggle in R, a language that does not go out of its way to enforce a distinction between types.

    I would argue that one of R’s treasures is that you don’t have to know much about what you want or how to get it; it is possible to throw some code together that will work, and plot will produce some eye-candy for you (plot is R’s unsung hero). No language ever succeeded by requiring its users to know what they are doing and making them think hard about the problem.
