
Research software code is likely to remain a tangled mess

Research software (i.e., software written to support research in engineering or the sciences) is usually a tangled mess of spaghetti code that only the author knows how to use. Very occasionally I encounter well organized research software that can be used without having an email conversation with the author (who has invariably spent years iterating through many versions).

Spaghetti code is not unique to academia; there is plenty to be found in industry.

Structural differences between academia and industry make it likely that research software will always be a tangled mess, only usable by the person who wrote it. These structural differences include:

  • writing software is a low status academic activity; it is a low status activity in some companies, but those involved don’t commonly have other higher status tasks available to work on. Why would a researcher want to invest in becoming proficient in a low status activity? Why would the principal investigator spend lots of their grant money hiring a proficient developer to work on a low status activity?

    I think the lack of status is rooted in researchers’ lack of appreciation of the effort and skill needed to become a proficient developer of software. Software differs from that other essential tool, mathematics, in that most researchers have spent many years studying mathematics and understand that effort/skill is needed to be able to use it.

    Academic performance is often measured using citations, and there is a growing move towards citing software,

  • many of those writing software know very little about how to do it, and don’t have daily contact with people who do. Recent graduates are the pool from which many new researchers are drawn. People in industry are intimately familiar with the software development skills of recent graduates, i.e., the majority are essentially beginners; most developers in industry were once recent graduates, and the stream of new employees reminds them of the skill level of such people. Academics see a constant stream of people new to software development; this group forms the norm they have to work within, and many don’t appreciate the skill gulf that exists between a recent graduate and an experienced software developer,
  • those who develop research software are paid a lot less. The handful of very competent software developers I know working in engineering/scientific research are doing it for their love of the engineering/scientific field in which they are active. Take this love away, and they will find that not only does industry pay better, but it also provides lots of interesting projects for them to work on (academics often have the idea that all work in industry is dull).

    I have met people who have taken jobs writing research software to learn about software development, to make themselves more employable outside academia.

Does it matter that the source code of research software is a tangled mess?

The author of a published paper is supposed to provide enough information to enable their work to be reproduced. It is very unlikely that I would be able to reproduce the results in a chemistry or genetics paper, because I don’t know enough about the subject, i.e., I am not skilled in the art. Given a tangled mess of source code, I think I could reproduce the results in the associated paper (assuming the author was shipping the code associated with the paper; I have encountered cases where this was not true). If the code failed to build correctly, I could figure out (eventually) what needed to be fixed. I think people have an unrealistic expectation that research code should just build out of the box. It takes a lot of work by a skilled person to create portable software that just builds.

Is it really cost-effective to insist on even a medium degree of buildability for research software?

I suspect that the lifetime of source code used in research is just as short and lonely as it is in other domains. One study of 214 packages associated with papers published between 2001 and 2015 found that 73% had not been updated since publication.

I would argue that a more useful investment would be in testing that the software behaves as expected. Many researchers I have spoken to have not appreciated the importance of testing. A common misconception is that because the mathematics is correct, the software must be correct (completely ignoring the possibility of silly coding mistakes, which everybody makes). Commercial software has the benefit of user feedback for detecting some incorrect behavior. Research software may only ever have one user.
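
As an illustration of the kind of testing I have in mind, here is a minimal sketch in Python (the function and the numbers are hypothetical, invented for this example). Even a single test against a hand-calculated value catches the class of silly coding mistake described above, even when the underlying mathematics is correct:

    # A typical small research helper: unbiased sample variance.
    # The mathematics is correct; the easy coding mistake is dividing
    # by n instead of n - 1.
    import math

    def sample_variance(values):
        n = len(values)
        mean = sum(values) / n
        return sum((x - mean) ** 2 for x in values) / (n - 1)

    def test_sample_variance_known_value():
        # hand-calculated: [2, 4, 4, 4, 5, 5, 7, 9] has sample variance 32/7
        assert math.isclose(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]), 32 / 7)

A test like this can be run with pytest or called directly; the point is that the expected value comes from an independent calculation, not from the code itself.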

Research software engineer is the fancy title now being applied to people who write the software used in research. Originally this struck me as an example of what companies do when they cannot pay people more: they give them a fancy title. Recently the Society of Research Software Engineering was set up. This society could certainly help with training, but I don’t see it making much difference with regard to status and salary.

Update

This post generated a lot of discussion on the research software mailing list, and Peter Schmidt invited me to do a podcast with him. Here it is.

  1. John Carter
    February 22, 2021 00:44 | #1

    Having once worked in that domain and watched others (Physicists) for even longer before transitioning to industry….

    ….I suspect the lifetime of research source code more or less matches that of the researcher and / or experiment.

    i.e., many on the postgrad mill drag their pet software from post to post.

    Many supervisors have their pet software worked on and extended by an unending stream of postgrads.

    Some source code, for example the stuff that drives and monitors big pieces of experimental equipment (radio telescopes, particle accelerators…), has a lifetime that matches (and sometimes exceeds) that of the equipment.

    I now work in the embedded software industry… and I realise now, looking back, how the coding and test coverage standards of most research-grade software are utterly pathetic, and I’d question the reliability of any result not supported by good cross checks.

    Academia is packed with perverse incentives and none more perverse than those that drive the creation of research software.

  2. February 22, 2021 02:22 | #2

    @John Carter
    I knew someone who used to take his own editor from job to job.

    Individuals are certainly likely to have their own assorted collection of code that they take from job to job.

    I don’t know about experiments; people do like to reinvent the wheel. To what extent does an experiment’s code evolve as new postgrads are assigned to look after it? Rewriting code is a relatively common response to not understanding somebody else’s code.

    In the case of big, long-running, experiments the hardware related software probably has a lifetime that starts with installation and ends with replacement/upgrade.

    A lot of commercial software has a very short lifetime, but the much less common, long-lifetime software is what receives the attention.

    Gathering the data for research software will need boots on the ground, i.e., a lot of the code won’t be on Github.

  3. Joe
    February 22, 2021 14:56 | #3

    I am a well trained, well paid software developer in a neuroscience research lab. Maybe assumptions should be checked?

    Also, most programs written and run by grad students are one-off scripts, not pieces of software that have to be maintained over the long term.

  4. Jens
    February 22, 2021 15:33 | #4

    I too have seen something that approximates “clean” software only in academia.
    In industry it has been a mess so far (this is my second industry job, and still no clean code).

    In my opinion this is mostly because people don’t follow a lead or try to push the system in their own direction, or just don’t listen, so everything becomes inconsistent.

    The scientists wrote the software for themselves, so they kept it easy for themselves to use as well. You know, as in not shooting yourself in the knee. 😉

  5. February 22, 2021 15:47 | #5

    @Joe
    By “well paid” are you including job satisfaction (a reasonable item to include), or do you work in a commercial lab?

    Yes, most code is one-off scripts. But a few grow and have a long lifetime, and it’s usually impossible to know in advance which ones will live on.

    Most software dies young, which is why I argue that it is more cost-effective to pay to maintain the ones that survive, rather than invest in making everything easier to maintain.

  6. Arjun
    February 22, 2021 15:51 | #6

    There are some fields and individual labs where software quality and developer skill are highly valued, but I think the OP is right in general.

    I work in Genetics/Genomics, and his description fits the majority of the field. In my field the software that is widely used generally becomes that way because of ease-of-installation, utility, and support. But grant support for ongoing maintenance is difficult to find. So even if the PI values software quality it is often difficult to get or keep people who learn good developer practices.

    The PI-lab model is generally designed to move through grad-students and post-docs with the only long-term lab-member being the PI, and possibly a technician or two, but often even those are recent undergrads getting research experience before grad school.

    @Joe

  7. February 22, 2021 16:39 | #7

    There are two types of software:

    (1) systems: if it’s got a name, if it’s intended for release, if the source code is available under an open-source license, then that’s often a good sign and the quality _may_ be decent.

    (2) experiments: an unnamed, assorted collection of programs and smaller shell scripts that were never intended for release by the author (so there are no mechanisms for “installing” and there is no documentation): the only purpose was to run this once in order to produce a set of numbers that go into a paper. But as life goes, the life of software is hard to predict, and people may request the code for replication, or the author and collaborators may want to carry out more experiments – which causes tensions because of the enormous technical debt.

    It’s true there is not much credit given to research software, but you can help to influence that: if your projects involve building software, release them (e.g., on github.com) and write a software paper for an open-source software journal such as https://joss.theoj.org/.

  8. February 22, 2021 18:14 | #8

    I’m an engineer from outside research/science, but somehow interested in the topic. Recently, I learned about great work done by Grigori Fursin and an entire community of research engineers, whose goal is to make research software more applicable to industry by building it within some kind of framework. I want to leave some links here, if you don’t mind taking a look – the talk is called “Reproducing 150 Research Papers and Testing Them in the Real World”:
    – ACM page with webcast https://event.on24.com/wcc/r/2942043/9C904C7AE045B5C92AAB2CF216826732
    – Source docs https://zenodo.org/record/4005773?fbclid=IwAR1JGaAj4lwCJDrkJdgJQHoWroUR6zqIW1STS0D2BRkeaFf6iD0U-KakZSM#.YDPz_nlRVHZ
    – Their solution product https://cknowledge.io/ and source code https://github.com/ctuning/ck

    I guess it should be helpful to the research community.

  9. Oliver B.
    February 24, 2021 00:40 | #9

    The replication crisis and ugly/messy (or even non-working) software go hand in hand.

    If full recognition of a scientific work required at least 5 independent reproductions of the research (with enforcement that *someone* does it, or at least tries), and if most people fail to reproduce the results (either because the findings were not reproducible or because the code/docs are crap), then it would not take too long for researchers to produce more readable/usable code.

    Citation of results is just not enough.
    It needs replication/reproducibility.
    If replication/reproducibility fails because of crap-code, the whole work has failed.
    The publication then should be retracted because of non-reproducibility.

    Insisting on the above mentioned process would solve two problems at the same time.

  10. Scott Drake
    February 24, 2021 20:29 | #10

    Maybe TDD and “strong” typing (everybody sure they know what that means, and how it differs from programming with types?) aren’t an answer? Just a suggestion.

  11. Nemo
    February 27, 2021 20:19 | #11

    A relevant quote from someone — I have forgotten the person — is to the effect that researchers used to exchange experimental data and now they exchange software bugs. As rightly pointed out, the incentives are not there. Fursin’s work is a good start (as pointed out above by Anton Yarkov).

    Two data points: (1) Many years ago, a fellow grad student was carrying out numerical simulations with s/w written by a researcher and the s/w “blew up”. He found the errors (in the worst FORTRAN code that he had ever seen, he said) and reported them to the original author, who was not interested in corrections. (2) At one start-up, I was responsible for converting research code into production code. It could only be treated as prototype code and none made it into production.

  12. John Carter
    March 1, 2021 02:06 | #12

    @Oliver B.

    Replication is an odd beast.

    In one sense it would be nice if the experiment and the calculations were independently replicated.

    But as the supporting software shifts from large (1000s of lines) to huge (10s of millions of lines), replication shifts from being infeasible to being a “Jolly Bad Idea”.

    If the new replicating package demonstrates that, on a single “Happy Path”, it reproduces the results of the older “incumbent” package….

    ….that is woefully inadequate testing. If that is the main supporting evidence for the soundness of the new package, then branch coverage of that test is pathetically small, providing no worthwhile evidence for the soundness of the new package!

    The effort would have been better invested in improving the incumbent package (and improving its test coverage and cross checks).

  13. Nick D
    March 5, 2021 03:29 | #13

    As someone who has put significant time into both the corporate and academic worlds, and who is severely irked by the reproducibility crisis, I thought I would do what I could to help improve the state of reproducible academic software.

    One of my degree-projects was an investigation into the economic feasibility of geographically distributed low-power database servers in healthcare software. It wasn’t a great project (it didn’t produce earth-shaking results), but it was good enough to be worth investigating. As part of that, I decided to make the project as reproducible as possible.

    The PDF of the paper [0] itself has embedded in it the literate source used to produce the paper’s primary results, in Org-Mode format. The associated repository [1] contains a disk image [2] that can be booted on any computer and can both run the experiment (producing a new paper with the updated results) and build a copy of itself for further installation or customization in other experiments.

    Of course, the README explains how to run “make all” if the user needs the help.

    Perhaps this sort of approach could be a useful step toward broader research reproducibility? With enough care, it’s certainly possible.

    —-

    0: https://nickdaly.gitlab.io/cs790-p1/notes.pdf

    1: https://gitlab.com/nickdaly/cs790-p1

    2: https://gitlab.com/nickdaly/cs790-p1/-/blob/master/bin/cs790-p1.img.xz

  14. Nick
    March 5, 2021 16:39 | #14

    (In reply to self.)

    Additionally, I feel like the only way this sort of thing is ever going to get better is if organizations actually create a position focused on reproducibility, a sort of code-archivist whose job it is to reliably duplicate the results of the experimental code in a separate, reproducible environment.

    I don’t see that happening any time soon (though I’d happily do that sort of work were it ever offered as a position). Though, again, we do have tools that effectively allow that sort of work, like propellor [0], which makes it simple to declare an entire working environment and iteratively rebuild the target system until it reliably performs like the original system.

    0: https://propellor.branchable.com/
