Posts Tagged ‘modeling’

Predictive Modeling: 15th COW workshop

October 26th, 2011 No comments

I was at a very interesting workshop on Predictive Modeling and Search Based Software Engineering on Monday/Tuesday this week and am going to say something about the talks that interested me. The talks were recorded and the videos will appear on the web site in a few weeks. The CREST Open Workshop (COW) runs roughly once a month and the group leader, Mark Harman, is always on the lookout for speakers, do let him know if you are in the area.

  • Tim Menzies talked about how models built from one data set did well on that dataset but often not nearly as well on another (i.e., local vs global applicability of models). Academics papers usually fail to point out that that any results might not be applicable outside of the limited domain examined, in fact they often give the impression of being generally applicable.

    Me: Industry likes global solutions because it makes life simpler and because local data is often not available. It is a serious problem if, for existing methods, data on one part of a companies software development activity is of limited use in predicting something about a different development activity in the same company and completely useless at predicting things at a different company.

  • Yuriy Brun talked about something that is so obviously a good idea it is hard to believe that it had not been done years ago. The idea is to have your development environment be aware of what changes other software developers have made to their local copies of source files you also have checked out from version control. You are warned as soon your local copy conflicts with somebody else’s local copy, i.e., a conflict would occur if you both check in your local copy to the central repository. This warning has the potential to save lots of time by having developers talk to each about resolving the conflict before doing any more work that depends on the conflicting change.

    Crystal is a plug-in for Eclipse that implements this functionality and Visual studio support is expected in a couple of releases time.

    I have previously written about how multi-core processors will change software development tools and I think this idea falls into that category.

  • Martin Shepperd presented a very worrying finding. An analysis of the results published in 18 papers dealing with fault prediction found that the best predictor (over 60%) of agreement between results in different papers was co-authorship. That is, when somebody co-authored a paper with another person any other papers they published were more likely to agree with other results published by that person than with results published by somebody they had not co-authored a paper with. This suggests that each separate group of authors is doing something different that significantly affects their results; this might be differences in software packages being used, differences in configuration options or tuning parameters, so something else.

    It might be expected that agreement between results would depend on the techniques used, but Shepperd et als analysis found this kind of dependency to be very small.

    An effect is occurring that is not documented in the published papers; this is not how things are supposed to be. There was lots of interest in obtaining the raw data to replicate the analysis.

  • Camilo Fitzgerald talked about predicting various kinds of feature request ‘failures’ and presented initial results based on data mined from various open source projects; possible ‘failures’ included a new feature being added and later removed and significant delay (e.g., 1 year) in implementing a requested feature. I have previously written about empirical software engineering only being a few years old and this research is a great example of how whole new areas of research are being opened up by the availability of huge amounts of data on open source projects.

    One hint for PhD students: It is no good doing very interesting work if you don’t keep your web page up to date so people can find out more about it

I talked to people who found other presentations very interesting. They might have failed to catch my eye because my interest or knowledge of the subject is low or I did not understand their presentation (a few gave no background or rationale and almost instantly lost me); sometimes the talks during coffee were much more informative.