Archive

Posts Tagged ‘newbie’

Code bureaucracy can reduce the demand for cognitive resources

March 7, 2021 5 comments

A few weeks ago I discussed why I thought that research code was likely to remain a tangled mess of spaghetti code.

Everybody’s writing, independent of work-place, starts out as a tangled mess of spaghetti code; some people learn to write code in a less cognitively demanding style, and others stick with stream-of-conscious writing.

Why is writing a tangled mess of spaghetti code (sometimes) not cost-effective, and what are the benefits in making a personal investment in learning to write code in another style?

Perhaps the defining characteristic of a tangled mess of spaghetti code is that everything appears to depend on everything else, consequently: working out the impact of a change to some sequence of code requires an understanding of all the other code (to find out what really does depend on what).

When first starting to learn to program, the people who can hold the necessary information on increasing amounts of code in their head are the ones who manage to create running (of sorts) programs; they have the ‘knack’.

The limiting factor for an individual’s software development is the amount of code they can fit in their head, while going about their daily activities. The metric ‘code that can be fitted in a person’s head’ is an easy concept to grasp, but its definition in terms of the cognitive capacity to store, combine and analyse information in long term memory and the episodic memory of earlier work is difficult to pin down. The reason people live a monks existence when single-handedly writing 30-100 KLOC spaghetti programs (the C preprocessor Richard Stallman wrote for gcc is a good example), is that they have to shut out all other calls on their cognitive resources.

Given time, and the opportunity for some trial and error, a newbie programmer who does not shut their non-coding life down can create, say, a 1,000+ LOC program. Things work well enough, what is the problem?

The problems start when the author stops working on the code for long enough for them to forget important dependencies; making changes to the code now causes things to mysteriously stop working. Our not so newbie programmer now has to go through the frustrating and ego-denting experience of reacquainting themselves with how the code fits together.

There are ways of organizing code such that less cognitive resources are needed to work on it, compared to a tangled mess of spaghetti code. Every professional developer has a view on how best to organize code, what they all have in common is a lack of evidence for their performance relative to other possibilities.

Code bureaucracy does not sound like something that anybody would want to add to their program, but it succinctly describes the underlying principle of all the effective organizational techniques for code.

Bureaucracy compartmentalizes code and arranges the compartments into some form of hierarchy. The hoped-for benefit of this bureaucracy is a reduction in the cognitive resources needed to work on the code. Compartmentalization can significantly reduce the amount of a program’s code that a developer needs to keep in their head, when working on some functionality. It is possible for code to be compartmentalized in a way that requires even more cognitive resources to implement some functionality than without the bureaucracy. Figuring out the appropriate bureaucracy is a skill that comes with practice and knowledge of the application domain.

Once a newbie programmer is up and running (i.e., creating programs that work well enough), they often view the code bureaucracy approach as something that does not apply to them (and if they rarely write code, it might not apply to them). Stream of conscious coding works for them, why change?

I have seen people switch to using code bureaucracy for two reasons:

  • peer pressure. They join a group of developers who develop using some form of code bureaucracy, and their boss tells them that this is the way they have to work. In this case there is the added benefit of being able to discuss things with others,
  • multiple experiences of the costs of failure. The costs may come from the failure to scale a program beyond some amount of code, or having to keep investing in learning how previously written programs work.

Code bureaucracy has many layers. At the bottom there is splitting code up into functions/methods, then at the next layer related functions are collected together into files/classes, then the layers become less generally agreed upon (different directories are often involved).

One of the benefits of bureaucracy, from the management perspective, is interchangeability of people. Why would somebody make an investment in code bureaucracy if they were not the one likely to reap the benefit?

A claimed benefit of code bureaucracy is ease of wholesale replacement of one compartment by a new one. My experience, along with the little data I have seen, suggests that major replacement is rare, i.e., this is not a commonly accrued benefit.

Another claimed benefit of code bureaucracy is that it makes programs easier to test. What does ‘easier to test’ mean? I have seen reliable programs built from spaghetti code, and unreliable programs packed with code bureaucracy. A more accurate claim is that it can be unexpectedly costly to test programs built from spaghetti code after they have been changed (because of the greater likelihood of the changes having unexpected consequences). A surprising number of programs built from spaghetti code continue to be used in unmodified form for years, because nobody dare risk the cost of checking that they continue to work as expected after a modification

By now I ought to feel more knowledgeable about R

March 18, 2014 2 comments

I was surprised to find recently that there are now over 15,000 lines of R code in the book I am working on. If I had written that much code in another ‘newly’ acquired language I would probably feel a lot more knowledgeable about it than I currently feel about R. Why don’t I feel more knowledgeable about R?

Those 15,000 lines are not all real lines, lots of cut-and-paste has been going on; yes, R is a cut-and-paste language just like Cobol and ‘web’ languages. ‘Real’ programmers often look down their noses at such languages, but that is just a failure on their part to understand what they are really all about. Perhaps I have written 5,000 actual lines of R, still a decent amount and half way to the 10,000 line minimum I ask newbies if they have reached.

An expert in a language should be able to pick up a random sample of code and to have been there, done that and got the t-shirt. I still regularly learn new stuff when reading other people’s code, so I’m still a long way from being an R expert. But then R is in the mold of a functional language and one characteristic of languages in this mold is that they provide umpteen different ways of doing the same thing. The combination of this language characteristic along with the lack of common culture in R usage (when this exists it significantly reduces the patterns of code usage commonly encountered) could mean that I am on the treadmill of forever and regularly learning new R coding techniques (which is great source for blog articles but gets tedious after a while); Perl is a lot like this.

As a compiler guy I’m used to learning a language by reading the language definition. Reading this document gives me a warm fuzzy feeling of knowing the language, this has nothing to do with being able to program in it and there is no way of knowing that I understood what the words meant. I was going to say that the R language definition was little more than some brief notes jotted down by somebody to be written up later, but checking the link to the page I discovered that somebody had been spending time significantly improving on what existed a few years ago; there is still a way to go but the R language definition is starting to look respectable. Hopefully my feeling of R knowledgeability will improve after I have read through this updated definition a few times.

Use of R is usually intimately bound up with the data being manipulated; on a per line of code basis much more so than other languages (in this regard it is like Cobol). Perhaps the need to have to learn lots more about the data than I normally have to adds to my feeling of not knowing. Would my feeling of knowledgeability increase if I worked with the same kind of data ll the time?

Categories: Uncategorized Tags: , , ,