
Modular vs. monolithic programs: a big performance difference

For a long time now I have been telling people that no experiment has found a situation where the treatment (e.g., use of a technique or tool) produces a performance difference that is larger than the performance difference between the subjects.

The usual results are that differences between people are the source of the largest performance difference, successive runs are the next largest (i.e., people get better with practice), and the smallest performance difference occurs between using/not using the technique or tool.

This is rather disheartening news.

While rummaging through a pile of books I had not looked at in many years, I (re)discovered the paper “An empirical study of the effects of modularity on program modifiability” by Korson and Vaishnavi, in “Empirical Studies of Programmers” (the first one in the series). It’s based on Korson’s 1988 PhD thesis, with the same title.

There were four experiments, involving seven people from industry and nine students; each experiment required modifying a 900(ish)-line program in some way. There were two versions of each program: one written in a modular form, the other monolithic. Subjects were permuted between various combinations of program version/problem, but all problems were solved in the same order.

The performance data (time to complete the task) was published in the paper, so I fitted various regression models to it (code+data). There is enough information in the data to separate out the effects of modular/monolithic, kind of problem and subject differences. Because all subjects solved problems in the same order, it is not possible to extract the impact of learning on performance.

The modular/monolithic performance difference was around twice as large as the difference between subjects (removing two very poorly performing subjects reduces this ratio to 1.5). I’m going to have to change my slides.
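For readers who want to see what this kind of comparison looks like, here is a minimal sketch in Python. It is not the analysis linked above, and the numbers are invented, not Korson's data. It assumes a hypothetical balanced layout in which every subject is timed once on each program version, and it ignores the kind of problem. Under that assumption, fitting log(time) = mean + subject + version by least squares reduces to taking marginal means, so no regression library is needed. The names TIMES and effects are made up for this illustration.

```python
import math
import statistics

# Invented task times in minutes (NOT the Korson data): each hypothetical
# subject modifies one modular and one monolithic program.
TIMES = {
    "s1": {"modular": 20.0, "monolithic": 45.0},
    "s2": {"modular": 25.0, "monolithic": 60.0},
    "s3": {"modular": 30.0, "monolithic": 62.0},
    "s4": {"modular": 22.0, "monolithic": 50.0},
}


def effects(times):
    """Return (version_effect, subject_sd) on the log(minutes) scale.

    Assumes every subject has exactly one time per version, so the
    additive model log(time) = mean + subject + version can be
    estimated from marginal means.
    """
    logs = {
        subject: {version: math.log(t) for version, t in by_version.items()}
        for subject, by_version in times.items()
    }
    # Version effect: average within-subject difference in log time.
    version_effect = statistics.mean(
        row["monolithic"] - row["modular"] for row in logs.values()
    )
    # Subject spread: standard deviation of each subject's mean log time.
    subject_means = [statistics.mean(list(row.values())) for row in logs.values()]
    subject_sd = statistics.stdev(subject_means)
    return version_effect, subject_sd


version_effect, subject_sd = effects(TIMES)
print(f"monolithic/modular time ratio: {math.exp(version_effect):.2f}")
print(f"version effect / subject SD:   {version_effect / subject_sd:.2f}")
```

Working on the log scale treats both effects as multiplicative, which is one common choice for task times; an unbalanced design, or one that also includes the kind of problem, would need a proper regression fit rather than marginal means.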

Would the performance difference have been so large if all the subjects had been experienced developers? There is not a lot of well-written modular code out there, and so experienced developers get lots of practice with spaghetti code. But, even if the performance difference is of the same order as the difference between developers, that is still a very worthwhile difference.

Now there are lots of ways to write a program in modular form, and we don’t know what kind of job Korson did in creating, or locating, his modular programs.

There are also lots of ways of writing a monolithic program, some of them might be easy to modify, others a tangled mess. Were these programs intentionally written as spaghetti code, or was some effort put into making them easy to modify?

The good news from the Korson study is that there appears to be a technique that delivers larger performance improvements than the difference between people (replication needed). We can quibble over how modular a modular program needs to be, and how spaghetti-like a monolithic program has to be.

  1. March 2, 2019 03:20 | #1

    In 1994, Daly et al. replicated Korson’s experiment. They found that: “The results of our replication, however, were strikingly different from those of the original and showed no significant difference between the average times taken to maintain modular and monolithic code.”

    REFERENCE

    Daly, J., A. Brooks, J. Miller, M. Roper, and M. Wood. “An external replication of a korson experiment.” RR/162/94 [EFoCS-4-94]. Empirical Foundations of Computer Science, Tech. Rep (1994). Available online at URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.513&rep=rep1&type=pdf – accessed [2019-03-02]

  2. March 3, 2019 21:51 | #2

    @Frans Badenhorst
    Thanks for the reference and link. The EFoCS group at Strathclyde did some very interesting work in the 1990s, but unfortunately their data is now lost (Marc Roper, one of the authors, still works there; I had a copy of this paper tucked away in the ‘students’ directory, and had completely forgotten about it).

    The Daly replication was actually a partial replication, i.e., they ran one of Korson’s four experiments. This would not have been an issue if they had seen a modular/monolithic effect as large as Korson’s, but they didn’t (code+data). While the time taken to complete the task was longer for monolithic, the difference is not statistically significant (and was about half of the standard deviation of the performance differences between subjects). If they had run more than one experiment, it might have been possible to reliably detect a difference (but it would have been much smaller than Korson’s).

    The difference between the Korson and Daly results is odd. The Daly students are slower on the modular version, but faster on the monolithic one. If the Daly subjects were not as competent as the Korson subjects, then they should have been slower on both; if they were more competent, they should have been faster on both.

    Half of the Korson subjects were professional developers, who would be expected to perform better than the all-student subjects in the Daly experiment.

    Psychologists often see lots of variation when different groups run the same experiment. I once read that a result was not accepted until 100 groups had reliably replicated it.
