
Practical statistics books for software engineers

So you have read my (draft) book on evidence-based software engineering and want to learn more about the statistical techniques used, but are not interested in lots of detailed mathematics. What books do I suggest?

All the following books are sitting on the shelf next to where I write (not that they get read that much these days).

Before I took the training wheels off my R usage, my general go-to book was “The R Book” by Crawley, second edition (I still look at it from time to time); “R in Action” by Kabacoff is a good general read.

In alphabetical subject order:

Categorical data: “Categorical Data Analysis” by Agresti, the third edition is a weighty tome (in content and heaviness). Plenty of maths and examples; more of a reference.

Circular data: “Circular statistics in R” by Pewsey, Neuhauser and Ruxton, is the only non-pure theory book available. The material seems to be there, but is brief.

Compositional data: “Analyzing compositional data with R” by van den Boogaart and Tolosana-Delgado, is more or less the only book of its kind. Thankfully, it is quite good.

Count data: “Modeling count data” by Hilbe, may be more than you want to know about count data. Readable.

Experiments: “Design and analysis of experiments” by Montgomery.

General: “Applied linear statistical models” by Kutner, Nachtsheim, Neter and Li, covers a wide range of topics (including experiments) using a basic level of mathematics.

Machine learning: “An Introduction to Statistical Learning: with Applications in R” by James, Witten, Hastie and Tibshirani, is more practical (but not dumbed down, like some) and contains less maths (too much maths being a common problem with machine learning books, e.g., “The Elements of Statistical Learning”). Watch out for the snake-oil salesmen using machine learning.

Mixed-effects models: “Mixed-effects models in S and S-plus” by Pinheiro and Bates, is probably the book I prefer; “Mixed effects models and extensions in ecology with R” by Zuur, Ieno, Walker, Saveliev and Smith, is another view on an involved topic (plus lots of ecological examples).

Modeling: “Statistical rethinking” by McElreath, is full of interesting modeling ideas, using R and Stan. I wish I had some data to try out some of these ideas.

Regression analysis: “Applied Regression Analysis and Generalized Linear Models” by Fox, now in its third edition (I also have the second edition). Of the books available, I found this the most useful for a more detailed discussion of regression analysis. Some people like “Regression modeling strategies” by Harrell, but it does not appeal to me.

Survival analysis: “Introducing survival and event history analysis” by Mills, is a readable introduction covering everything; “Survival analysis” by Kleinbaum and Klein, is full of insights but more of a book to dip into.

Time series: The two ok books are: “Time series analysis and its applications: with R examples” by Shumway and Stoffer, which contains more theory, and “Time series analysis: with applications in R” by Cryer and Chan, which contains more R code.

There are lots of other R/statistics books on my shelves (just found out I have 31 of Springer’s R books), some ok, some not so. I have a few ‘programming in R’ style books; if you are a software developer, R the language is trivial to learn (its library is another matter).

Suggestions for books covering topics I have missed are welcome, as are your own preferences (as a software developer).

  1. November 8, 2018 12:41 | #1

    I’d nominate, in place of Kabacoff, “The Book of R” (yes, I still think the title is intended to conflate with Crawley) by Davies. It is written in a conversational voice, which isn’t always possible to pull off.

  2. November 8, 2018 13:14 | #2

    @Robert Young
    I don’t have that book. It’s probably a timing issue: by 2016 (when “The Book of R” was published) I was not looking for a general, basic book. Reading the excerpts on Amazon, it looks like a very gentle introduction.

    Matloff’s “The art of R programming”, in the same publisher’s series, is something of a gentle introduction (which I found disappointing, but this is probably a bigger market).

  3. Jannes
    November 9, 2018 06:44 | #3

    I would add “An introduction to statistical learning” by James et al. to your list as an introduction to (semi-)parametric and machine learning algorithms in R. IMO the book is very well written with lots of good examples and nice visualizations. In addition, the authors put a lot of effort into making the book appealing to people who are not statisticians but still want to use the presented algorithms in an informed way.

  4. November 9, 2018 14:17 | #4

    @Jannes
    Thanks for reminding me about this book. I have the pdf for “An introduction to statistical learning”, not the dead tree version (which is why I was not reminded of it when scanning my shelves); but I do have the dead tree version of “The Elements of Statistical Learning”, with which it shares two authors (this is all maths).

    You need lots of data for machine learning techniques to work, and this is sometimes available. There are lots of nonsensical uses of machine learning in software engineering, and a few potentially useful ones.

  5. Pete Gordon
    November 9, 2018 16:56 | #5

    Thanks for sharing! I was just curious if you’ve looked into the IEEE Software Engineering Body of Knowledge? I just glanced at your Evidence-Based Software Engineering draft and some of it reminded me of the IEEE SWEBOK.

    https://en.wikipedia.org/wiki/Software_Engineering_Body_of_Knowledge

    https://www.shape-of-code.computer.org/discover-education/education-bodies-of-knowledge/

    Best regards!
    Pete Gordon

  6. November 9, 2018 17:28 | #6

    @Pete Gordon
    SWEBOK essentially contains extended definitions of terms. It is not a “body of knowledge”, more a body of things we have descriptions for.

  7. Nedu Orekie
    November 24, 2018 22:05 | #7

    First off, thanks for your recommended list. Some I am aware of, others I’ll go hunting for.

    I would also recommend “Applied Predictive Modeling” by Max Kuhn and Kjell Johnson; that one deserves mention too. It’s as insightful as it is accessible. And it’s on par with “Introduction to Statistical Learning” when it comes to content.

    Regards,

    Nedu

  8. November 27, 2018 14:36 | #8

    @Nedu Orekie
    I have Kuhn and Johnson’s book. It is a practical introduction to a machine learning approach to data analysis, and it contains a lot less theory than most such books. I am a fan of using domain knowledge to build models, and so try to stay away from machine learning (which does have its uses).
