Expected variability in a program’s SLOC

If 10 people independently implement the same specification in the same language, how much variation will there be in the length of their programs (measured in lines of code)?

The data I have suggests that the standard deviation of program length is one quarter of the mean length, e.g., 10k mean length, 2.5k standard deviation.
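The ratio described above is the coefficient of variation (standard deviation divided by mean). A minimal sketch of the arithmetic in Python follows; the function names and the sample counts are made up for illustration and are not taken from the data behind this post.

```python
from statistics import mean, stdev


def coefficient_of_variation(sloc_counts):
    """Return the sample standard deviation divided by the mean."""
    if len(sloc_counts) < 2:
        raise ValueError("need at least two implementations")
    return stdev(sloc_counts) / mean(sloc_counts)


def expected_sd(mean_sloc, ratio=0.25):
    """Rule-of-thumb standard deviation: one quarter of the mean length."""
    return ratio * mean_sloc


# Hypothetical SLOC counts for three implementations (illustrative only).
sample = [8_000, 10_000, 12_000]

print(coefficient_of_variation(sample))  # ratio for the made-up sample
print(expected_sd(10_000))               # 10k mean -> 2.5k standard deviation
```

A ratio close to 0.25 for a set of independent implementations would be in line with the figure quoted above; with only a handful of implementations the estimate will be noisy.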

The plot below (code+data) shows six points from the samples I have. The point in the bottom left is based on 6,300 C programs from a programming contest question requiring solutions to the 3n+1 problem, and one of the points on the right comes from five Pascal compilers for the same processor.

[Figure: Mean vs standard deviation of sample program SLOC]

Multiple implementations of the same specification, in the same language, are very rare. If you know of any, please let me know.

  1. Dean Giberson
    October 3rd, 2017 at 00:24 | #1

    https://programmingpraxis.com/contents/revchron/ is a site that posts daily “programming problems”. While the answers are not full programs, several of those submitted by contributors are in the same language.

    Project Euler may have more data for you to pull from: https://projecteuler.net/about

  2. October 3rd, 2017 at 01:33 | #2

    @Dean Giberson
    Thanks for the suggestion.

    The programming contest sites I have looked at required some effort to scrape their contents (I have not looked at Programming Praxis). I decided against the Project Euler code because the problems are narrowly targeted at a specific maths/algorithmic problem, which is not realistic (yes, the 3n+1 problem has this issue).

    I found one study, by Back and Westman, that has scraped the code from a programming contest site and made it available for download. I have a copy, but have not done anything with it yet.

    Other sources of programs (all requiring some work) include: clones of various editors, implementations of basic communication protocols, and compression programs. The issue with this data is the possibility of partial implementations and extensions (the latter is an issue with the Pascal compiler data).
