Archive

Posts Tagged ‘research’

What is known about software effort estimation in 2024

March 10, 2024 (2 weeks ago) No comments

It’s three years since my 2021 post summarizing what I knew about estimating software tasks. While no major new public datasets have appeared (there have been smaller finds), I have talked to lots of developers/managers about the findings from the 2019/2021 data avalanche, and some data dots have been connected.

A common response from managers, when I outline the patterns found, is some variation of: “That sounds about right.” While it’s great to have this confirmation, it’s disappointing to be telling people what they already know, even if I can put numbers to the patterns.

Some of the developer behavior patterns look, to me, to be actionable, e.g., send developers on a course to unbias their estimates. In practice, managers are worried about upsetting developers or destabilising teams. It’s easy for an unhappy developer to find another job (the speakers at the meetups I attend often end by saying: “and we’re hiring.”)

This post summarizes a talk I gave recently on what is known about software estimating; a video will eventually appear on the British Computer Society‘s Software Practice Advancement group’s YouTube channel, and the slides are on Github.

What I call the historical estimation models contain source code, measured in lines, as a substantial component, e.g., COCOMO which overfits a miniscule dataset. The problem with this approach is that estimates of the LOC needed to implement some functionality LOC are very inaccurate, and different developers use different LOC to implement the same functionality.

Most academic research in software effort estimation continues to be based on miniscule datasets; it’s essentially fake research. Who is doing good research in software estimating? One person: Magne Jørgensen.

Almost all the short internal task estimate/actual datasets contain all the following patterns:

I have a new ChatGPT generated image for my slide covering the #Noestimates movement:

ChatGPT image promoting fun at work.

My 2023 in software engineering

December 31, 2023 1 comment

In a 2009 post, I predicted that Chinese and Indian developers would become a major influence in the next decade. This year, it was very noticeable that many of the authors of papers at major conferences had Asian names. I would say that, on average, papers with Asian author names were better than papers by authors with non-Asian names.

While LLMs dominated the software news this year, the lead time for research projects and conference submission deadlines meant that few of the papers accepted at this year’s top ranked conferences were LLM based, e.g., around 5% at ICSE. I expect there will be a much higher percentage of LLM based papers in 2024, which I think will be a disaster for software engineering research, at least in the short term. From what I have seen and read, much of LLM based software engineering is driven by fashion and/or a desire to gain experience that leads to a job in AI. Discovering something useful about software development takes a back seat (the current fashionable topic, butterfly collecting, at least produces potentially useful datasets). I think that LLMs are going to be very useful for analyzing text data, e.g., named entity recognition.

London based, software related meetups have come back to life. I go to around 1-2 a week, and the regular good ones include: Internet of Things, Extreme Tuesday Club, London Prompt Engineers, and London R. On the academic front, I have started attending the software reliability seminars at Imperial, and funding means that the excellent Crest Open Workshops are down to two a year. There were a handful of hackathons this year, and I got to go to one of them, a LLM hackathon.

Not usually software specific: Newspeak House hosts a variety of events that are often attended by many developers and those associated with the rationalist community. I attend maybe 2–3 events a month.

What did I learn/discover about software engineering this year?

The evidence-based software engineering Discord channel ticks over (invitation), with sporadic interesting exchanges.

Evidence-based Software Engineering book: the last year

November 5, 2023 No comments

It’s now three years since my book, Evidence-based Software Engineering: based on the publicly available data, was released. What has happened in the last year, since I wrote about the first two years, and what might happen in the next year or so?

There is now a Discord channel for discussing evidence-based software engineering. Blog readers and anyone with an interest in the subject are most welcome.

I keep a copy of software related papers that I think might be worth looking at again, and have been meaning to make this list public. A question by ysch, a Discord channel member, asked after ways of checking whether a software paper was worth reading. This prompted me to create a Github repo containing the titles of these 7,756 saved papers, along with some data related annotations. On the more general question of paper quality, my view is that most papers are not worth reading, with a few being very well worth reading. People have to develop techniques for rapidly filtering out the high volume of drivel; techniques I use, and understanding the publication ecosystem.

This last year saw the sudden arrival of a new tool, LLMs. My experience with using ChatGPT (and other such LLMs) as an evidence-based research tool is that the answers are too generic or just plain wrong (for several months, one LLM reported that I had a degree in Divinity Studies). If I was writing a book, I suspect that they would provide a worthwhile copy-editing service.

I was hoping that the recently released GPT-4 vision model would do high quality text extraction from scanned pdfs, but the quality of output I have received is about the same as traditional OCR-based tools. I expect that the data extraction ability LLM based tools will get a lot better, because they are at the start of the learning curve and there is a commercial incentive for them to be a lot better.

An LLM is driven by the token weights learned during training. Roughly speaking, the more training data on a topic, the larger the trained weights for that topic. There is not a lot of data (i.e., text) relating to evidence-based software engineering, compared to the huge quantities available for some topics, so responses are generic and often parrot established folklore. The following image was generated by DALL-E3:

Generate an image of white male and Asian female leprechaun engineers examining software to check that it follows well-known folklore traditions that are claimed to ensure correct behavior.

There is a tale of software product evolution waiting to be told via the data contained in magazine adverts; the magazines are on bitsavers, we just need LLMs to be good enough to reliably extract advert contents (currently, too many hallucinations).

The book contents continue to survive almost completely unscathed, primarily because reader feedback continues to be almost non-existent. Despite the close to 500k downloads (now averaging 4k-5k downloads per month, from the logs I have, with the mobile friendly version around 10%), most people I meet have not heard of the book. The concept of an evidence-based approach to software engineering continues to be met with blank looks, although a commonly cited listener use case for the book’s data is validating a pet theory (my suggestion that the data may show their pet theory to be wrong is not appreciated).

Analysis/data in the following blog posts, from the last 12-months, belongs in the book in some form or other:

Some human biases in conditional reasoning

Unneeded requirements implemented in Waterfall & Agile

Analysis of Cost Performance Index for 338 projects

Evaluating Story point estimation error

Frequency of non-linear relationships in software engineering data

Analysis of when refactoring becomes cost-effective

An evidence-based software engineering book from 2002

Perturbed expressions may ‘recover’

Predicting the size of the Linux kernel binary

Benchmarking fuzzers

Local variable naming: some previously unexplored factors

Optimal function length: an analysis of the cited data

Some data on the size of Cobol programs/paragraphs
Hardware/Software cost ratio folklore

Criteria for increased productivity investment

Likelihood of encountering a given sequence of statements

Design of the Whetstone benchmark

Relative performance of computers from the 1950s/60s/70s

LLMs and doing software engineering research

September 24, 2023 No comments

This week I attended the 65th COW workshop, the theme was Automated Program Repair and Genetic Improvement.

I first learned about using genetic programming to automatically fix reported faults at the 1st COW workshop in 2009. Claire Le Goues, a PhD student at that workshop, now a professor, returned to talk about the latest program repair work of her research group.

COW speakers are usually very upbeat, but uncertainty about the future was the general feeling I got from speakers at this workshop. The cause of this uncertainty was the topic of some talks and conversations: LLMs. Adding an LLM into the program repair process can produce a dramatic performance improvement.

Isn’t a dramatic performance improvement and a new technique great news for everyone? The performance improvement increases the likelihood of industrial adoption, and a new technique creates many opportunities for new research.

Despite claiming otherwise, most academics have zero interest in industrial adoption of their work, and some actively disdain practical uses of their work.

Major new techniques are great for PhD students; they provide an opportunity to kick-start a career by being in at the start of a new research area.

A major new technique can obsolete an established researcher’s expensively acquired area of expertise (expensive in personal time and effort). The expertise that enables a researcher to make state-of-the-art contributions to an active research area is a valuable asset; it can be used to attract funding, students and peer esteem. When a new technique dramatically improves the state-of-the-art, there is a sharp drop in the value of what is now yesterday’s know-how.

A major new technique removes some existing barriers to entering a field, and creates its own new ones. The result is that new people start working in a field, and some existing experts stop working in it.

At the workshop, I saw this process starting in automated program repair, and I imagine it’s also starting in many other research fields. It will probably take 3–5 years for the dust to start to settle; existing funded projects have to complete, and academia does not move that quickly.

A recent review of the use of LLMs in software engineering research found 229 papers; the table below shows the number of papers per year:

    Papers   Year
       7     2020
      11     2021
      51     2022
     160     2023 to end July

Assuming, say, 10K software engineering papers per year, then LLM related papers should be around 3% this year, likely in double figures next year, and possibly over 50% the year after.

Is research in software engineering en route to becoming another subfield of prompt engineering research?

Anthropology and building software systems

June 25, 2023 No comments

Software systems are built by people, who are usually a member of one or more teams. While a lot of research effort has gone into studying the software/hardware used to build these systems, almost no effort has been invested in studying the activities of the people involved.

The study of human behaviors and cultures, in the broadest sense, sits within the field of Anthropology. The traditional image of an Anthropologist is someone who spends an extended period living with some remote tribe, publishing a monograph about their experiences on return to ‘civilisation’. In practice, anthropologists also study local tribes, such as professional workers.

Studies of the computer industry, by anthropologists, include: Global “Body Shopping” An Indian Labor System in the Information Technology Industry by Xiang Biao, and Cultures@SiliconValley by J. A. English-Lueck.

Reporters and professional authors sometimes write popular books for a general audience, which might be labelled pop anthropology. For instance, Kidder’s The Soul of a New Machine.

These academic/reporter publications are usually written by outsiders for an audience of outsiders. They are not intended to provide insights for insiders (Kidder’s book strikes me as reporting on the chaos that ensues when dysfunctional teams have to work together, which is not how it is described on its back cover).

If insiders want to learn about their community, some degree of insider knowledge is needed; exploring culture from the point of view of the subject of the study is known as Ethnography. Acquiring this knowledge can take years, an investment that will deter most researchers. Insightful insider commentary is most likely to come from insiders.

These days, insiders who write usually have blogs. Gerald Weinberg was an insider of times gone by, who wrote popular books for insiders about consulting in the software business; perhaps the most well known being “The Psychology of Computer Programming” (which really ought to be titled “The Sociology of Computer Programming”).

Who might be the consumers of research by anthropologists of software system development (assuming that a non-trivial amount eventually gets done)?

There are important outsiders, such as lawmakers looking to regulate.

Insiders only ever get to experience a sliver of the culture of software communities. The considered experiences of others can provide interesting insights, in particular learning about how teams working within other application domains operate.

Those seeking to change company culture ought to be looking to anthropology as a source of ideas for things that might work, or not.

History deals with the outcomes of past human behavior and culture, and there are a handful of historians of computing.

Software engineering research problems having worthwhile benefits

March 21, 2021 3 comments

Which software engineering research problems are likely to yield good-enough solutions that provide worthwhile benefits to professional software developers?

I can think of two (hopefully there are more):

  • what is the lifecycle of software? For instance, the expected time-span of the active use of its various components, and the evolution of its dependency ecosystem,
  • a model of the main processes involved in a software development project.

Solving problems requires data, and I think it is practical to collect the data needed to solve these two problems; here is some: application lifetime data, and detailed project data (a lot more is needed).

Once a good-enough solution is available, its practical application needs to provide a worthwhile benefit to the customer (when I was in the optimizing compiler business, I found that many customers were not interested in more compact code unless the executable was at least a 10% smaller; this was the era of computer memory often measured in kilobytes).

Investment decisions require information about what is likely to happen in the future, and an understanding of common software lifecycles is needed. The fact that most source code has a brief existence (a few years) and is rarely modified by somebody other than the original author, has obvious implications for investment decisions intended to reduce future maintenance costs.

Running a software development project requires an understanding of the processes involved. This knowledge is currently acquired by working on projects managed by people who have successfully done it before. A good-enough model is not going to replace the need for previous experience, some amount of experience is always going to be needed, but it will provide an effective way of understanding what is going on. There are probably lots of different good-enough ways of running a project, and I’m not expecting there to be a one-true-way of optimally running a project.

Perhaps the defining characteristic of the solution to both of these problems is lots of replication data.

Applications are developed in many ecosystems, and there is likely to be variations between the lifecycles that occur in different ecosystems. Researchers tend to focus on Github because it is easily accessible, which is no good when replications from many ecosystems are needed (an analysis of Github source lifetime has been done).

Projects come in various shapes and sizes, and a good-enough model needs to handle all the combinations that regularly occur. Project level data is not really present on Github, so researchers need to get out from behind their computers and visit real companies.

Given the payback time-frame for software engineering research, there are problems which are not cost-effective to attempt to answer. Suggestions for other software engineering problems likely to be worthwhile trying to solve welcome.

The Nostradamus argument in software engineering research

October 2, 2018 No comments

The Nostradamus argument in software engineering research goes something like: This idea was proposed in a paper by XX, some years ago.

I regularly encounter the Nostradamus argument when discussing what people in industry are doing, with one or more academics. The same argument is probably made in other fields.

The rules of academic research pretty much guarantee that somebody, at sometime, has published a paper containing an idea related to something being discussed today.

The first researcher(s) to publish an idea gets the credit for the idea, and ‘uses it up’ the idea, that is somebody else cannot subsequently publish a paper claiming that idea (it does happen, either through plagiarism or slip-ups during review).

The job of researchers is to find new ideas (well, actually these days it is to quickly find an idea that will get published; researchers are on a publication treadmill). Sometimes a paper will explicitly point out the novel idea they are claiming (usually a sign of a very poor paper; the author(s) obviously don’t feel confident that the reader will see anything of merit). Researchers also talk of gaps in the literature, i.e., some topic where little, if anything, has been published.

Before starting work in an area, researchers are supposed to read all relevant prior publications; this can be an awful lot of work and take a lot of time. In practice people tend to read the papers in the top 10, or so, journals published in the last few years; maybe looking at more journals and going further back in time if the initial search fails to return many results. I have had many conversations with researchers about a paper, or thesis, they are just completing and been told “I’m just finishing off the literature search”, i.e., they are doing the background checks after completing their research, not before (yes, sometimes rather similar work has already been published and some quick footwork is needed).

So the work of prior researchers is venerated in theory, but rarely in practice.

Categories: Uncategorized Tags: ,

The world view of research in software engineering

September 26, 2018 1 comment

For a long time I have been trying to figure out why so much research in software engineering is so obviously unconnected to the reality of software development.

As might have been guessed, the answer has been staring me in the face for some time.

Many researchers in software engineering have a modified mathematicians’ world view of research, i.e., investigate things we find interesting (the mathematicians’ view) and some years from now industry will discover our work and apply it (the modification). I have had multiple academics essentially say this to me and I had not appreciated that I need to argue against a world view (not specific points of that view). This mathematician world view also explains why my questions about evidence receive such baffled looks; and, I am regularly told that experiments cannot be done, or are meaningless, in software engineering research.

Which research field’s world view might be closest to software engineering? I would nominate drug discovery.

Claims made by researchers in drug discovery are expected to be backed up with evidence. There are problems to be solved (e.g., diseases to be cured) and researchers try out ideas by running experiments. They don’t put lots of time and effort into creating a new drug, propose this drug as cure for some disease and then wait for industry to run some experiments, to see if the claims are true. I’m a regular reader of In The Pipeline, an interesting drug discover blog that is accessible to those outside the field.

How do I argue against a world view? I have no idea; even if I did, I am not looking to start a crusade.

At least I now have a model of the situation that makes sense. Next month, I will be attending some workshops where there will be lots of researchers and I will get to try out my new insight.

Replicating results using research software

May 14, 2018 No comments

The reproducibility of results, from scientific studies, has always been an important issue. Over the last few years software has become a hot topic in reproducibility circles; many researchers have an expectation that if they run the original researcher’s software, they will replicate the results. Reality has not lived up to their expectations and there has been a lot of flapping around looking for a solution. There is a solution, but first, why does the problem exist?

I have spent a lot of time porting software to different compilers (when I was in the compiler business, I wanted everybody to port their applications to the compiler I was working), different hardware (oh, the days when every major vendor had at least one distinct cpu; not like today where it’s x86, ARM, or embedded), different operating systems (umpteen flavors of Unix, all with slightly different header file contents and library behavior; the Unix wars were good for those in the porting business) and every now and again different languages (by translating).

The Wintel alliance wiped out variation in cpus and operating systems (they can still be found lurking in dark corners) and open source compilers created a near monoculture of compilers for the major languages.

The major software portability problems of 30 years ago have become rather minor. But software portability problems that once tended to be minor (at least for scientific software), have grown to become a major headache. Today’s major portability problems center around evolution of the libraries/packages being used, and longer term the evolution of the language(s) used.

Evolution has created development ecosystems where there are rampant dependencies on specific, or earlier than, or later than versions of libraries/packages. I have been out of the porting business for several decades, but talking to those doing it today, the story is the same; experience in porting from A to B is everything, second best is talking to somebody else who has gone in that direction and third best are the one-line forums such as stackoverflow.

Researchers are doing research on who-knows-what and probably have need-to-know knowledge of software and the libraries they are using, the researchers receiving a copy of the original software might know less. What is the probability that the originating and receiving researchers have exactly the same versions of libraries installed? The receiving researcher may not have any of the required libraries installed, and promptly install the latest version (which may well be more recent than the ones used by the original researcher).

A solution is available; distribute a duplicate of the researchers complete system as a container, e.g., a Docker image.

Containers solve the replication problem. But these days people want more, they actually think it should be possible to take research software and modify it to suite their own needs. Good luck with that.

Research software is written to solve a problem, often by people writing their first non-trivial programs (i.e., they are novices), with no incentive to produce something that is easy for others to use. When software is written by experienced developers, who have an incentive to build something that is easy for others to work with, multiple reimplementations are often still required to achieve something of decent quality. Creating robust software, that others can use, is very hard.

The problem with software is its invisibility; the difficulties are not visible. When the internal operations are visible, the difficulties of making changes are easier to see.

James Albert Bonsack's cigarette rolling machine

James Albert Bonsack’s cigarette rolling machine (from Wikipedia).

Categories: Uncategorized Tags: ,

GDPR has a huge impact on empirical software engineering research

April 5, 2018 4 comments

The EU’s General Data Protection Regulation (GDPR) is going to have a huge impact on empirical software engineering research. After 25 May 2018, analyzing source code will never be the same again.

I am not a lawyer and nothing qualifies me to talk about the GDPR.

People put their name in source code, bug tracking databases and discussion forums; this is personal identifying information.

Researchers use personal names to obtain information about a wide variety of activities, e.g., how much code did individuals write, how many bug reports did they process, contributions in discussions of one sort or another.

Open source licenses give others all kinds of rights (e.g., ability to use and modify source code), but they do not contain any provisions for processing personal data.

Adding a “I hereby give permission for anybody to process information about my name in any way they see fit.” clause to licenses is not going to help.

The GDPR requires (article 5: Principles relating to processing of personal data):

“Personal data shall be: … collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes;”

That is, personal data can only be processed for the specific reason it was collected, i.e., if you come up with another bright idea for analysis of data that has just been collected, it may be necessary to obtain consent, from those whose personal data it is, before trying out the bright idea.

It is not possible to obtain blanket permission (article 6, Lawfulness of processing):

“…the data subject has given consent to the processing of his or her personal data for one or more specific purposes;”, i.e., consent has to be obtained from the data subject for each specific purpose.

Github’s Global Privacy Practices shows that Github are intent on meeting the GDPR requirements, they include: “GitHub provides clear methods of unambiguous, informed consent at the time of data collection, when we do collect your personal data.”. Processing personal information, about an EU citizen, contained in source code appears to be a violation of Github’s terms of service.

The GDPR has many other requirements, e.g., right to obtain information on what information is held and right to be forgotten. But, the upfront killer is not being able to cheaply collect lots of code and then use personal information to help with the analysis.

There are exceptions for: Processing for archiving, scientific or historical research or statistical purposes. Can somebody who blogs and is writing a book claim to be doing scientific research? People who know more about these exceptions than me, tell me that there could be a fair amount of paperwork involved when making use of the exception, i.e., being able to show that privacy safeguards are in place.

Then, there is the issue of what constitutes personal information. Git’s hashing algorithm makes use of the committer’s name and/or email address. Is a git hash personal identifying information?

A good introduction to the GDPR for developers, and one for researchers.

Categories: Uncategorized Tags: , ,