Saturday 16 September 2017

The Problem of Productivity in Software Engineering Research

Software engineering research has a productivity problem. Many researchers across the world are engaged in software engineering research, but the path from idea to publication is often a fraught one. As a consequence, there is a danger that many important ideas and results are not receiving the attention they deserve within academia, or finding their way to the practitioners whom the research is ultimately intended to benefit.

One of the biggest barriers faced by software engineering researchers is (perhaps ironically) the need to produce software. Research is overwhelmingly concerned with the development of automated techniques to support activities such as testing, remodularisation and comprehension. It is rightly expected that, in order to publish such a technique at a respectable venue, the proposed approach has to be accompanied by some empirical data, generated with the help of a proof-of-concept tool.

Developing such a tool requires a lot of time and effort. This effort can be roughly spread across two dimensions:
(1) the ‘scientific’ challenge of identifying and applying suitable algorithms and data types to fit the problem, and running experiments to gather data, and
(2) the ‘engineering’ challenge of ensuring that the software is portable and usable, can be applied in an ‘industrial’ setting, scales to arbitrarily large systems, and can be used by a broad range of users.

Whereas the first dimension can often be accomplished within a relatively short time-frame (a couple of person-months, perhaps), the second dimension — taking an academic tool and scaling it up — can rapidly become enormously time-consuming. In practice, doing so will often only be realistic in a well-resourced and well-funded lab, where the researcher is supported by one or more long-term post-doctoral research assistants.

This is problematic because the second dimension is often what matters when it comes to publication. An academic tool that is widely applicable can be used to generate larger volumes of empirical data, from a broader range of subject systems. Even if the underlying technique is not particularly novel or risky, the fact that it is accompanied by a large volume of empirical data renders it immediately more publishable than a technique that, whilst more novel and interesting, does not have a tool that is as broadly applicable or scalable, and thus does not have the same volume of empirical data. I previously discussed this specific problem in the context of software testing research.

Indeed, the out-of-the-box performance of the software tool (as determined by the second dimension) is often used to assess, at face value, the performance of the technique it seeks to implement (regardless of whether or not the tool was merely intended as a proof of concept). One of the many examples of this mindset shines through in the ASE 2015 paper on AndroTest, where a selection of academic tools (often underpinned by non-trivial heuristics and algorithms) were compared against the industrial, conceptually much simpler MonkeyTest random testing tool. Perhaps embarrassingly for the conceptually more advanced academic tools, MonkeyTest was shown to be the hands-down winner in terms of performance across the board. I am personally uneasy about this sort of comparison, because it is difficult to determine to what extent the (under-)performance of the academic tools was simply due to a lack of investment in the ‘engineering’ dimension. Had they been more usable and portable, with less dependence upon manual selection of parameters etc., would the outcome have been different?
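To make the contrast concrete, the essence of a random testing tool of this kind is little more than a loop that fires arbitrary UI events at the application under test. The following sketch is purely illustrative; the app object, its send_event driver call and its crashed() oracle are hypothetical stand-ins, not the API of any real tool:

import random

# Illustrative sketch of random GUI testing. The app object, its send_event
# driver call and its crashed() oracle are hypothetical stand-ins.
EVENT_TYPES = ["tap", "swipe", "key", "rotate"]

def random_event(width, height):
    # Pick an arbitrary event type at an arbitrary screen coordinate.
    return {"kind": random.choice(EVENT_TYPES),
            "x": random.randint(0, width - 1),
            "y": random.randint(0, height - 1)}

def random_test(app, budget=10000):
    # Fire a fixed budget of random events at the app; report the first crash, if any.
    for i in range(budget):
        event = random_event(app.width, app.height)
        app.send_event(event)
        if app.crashed():
            return i, event
    return None

No heuristics, no model of the GUI, and no parameters to tune beyond the event budget; this is precisely the sense in which such a tool is conceptually simple and easy to engineer to an industrial standard.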

This emphasis on the engineering dimension is perhaps one of the factors that contributes to what Moshe Vardi recently called the “divination by program committee”. He argues that papers are often treated as “guilty until proven innocent”, and the maturity and industrial applicability of an associated tool can, for many reviewers, become a factor in deciding whether a paper (and its tool) should make the cut.

In my view, this is the cause of a huge productivity problem in software engineering. The capacity to generate genuinely widely usable tools that can produce large volumes of empirical data is rare. Efforts to publish novel techniques based on proof-of-concept implementations geared towards smaller-scale, specific case studies often fail to reach the top venues, and fail to make the impact they perhaps should.


In his blog, Moshe Vardi suggests that reviewers and PC members should perhaps adopt a shift in attitude towards one of “innocent until proven guilty”. In my view, this more lenient approach should also include a shift away from the overarching emphasis on empirical data and generalisability (and the highly engineered tools that this implies).

Friday 16 June 2017

On The Tension between Utility and Innovation in Software Engineering


For a piece of software engineering research to be published, it must above all provide some evidence that it is of value (or at least potentially of value) in a practical, industrial context. Software Engineering publications and grant proposals live or die by their perceived impact upon, and value to, the software industry.

To be published at a high-impact venue, a piece of research must demonstrate its value with a convincing empirical study, with extra credit given to projects that involve large numbers of industrial developers and projects. For a grant proposal to be accepted, it should ideally involve significant commitments from industrial partners.

Of course this makes sense. Funding councils should rightly expect some form of return on investment; funding Software Engineering researchers should lead to some form of impact upon the industry. The motivation of any research should always ultimately be to improve the state of the art in some respect. Extensive involvement of industrial partners can potentially bridge the “valley of death” in the technology readiness levels between conceptual research and industrial application.

However, there are downsides to framing the value of a research area in such starkly utilitarian terms. There is a risk that research effort becomes overly concentrated on activities such as tool development, developer studies and data collection. The focus of evaluation shifts from novelty and innovation to issues such as the ease with which the tool can be deployed and the wealth of data supporting its efficacy. This is fine if an idea is easy to implement as a tool, and the data is easy to collect. Unfortunately, this only tends to be the case for technology that is already well established (for which there are already plenty of APIs around, for example), and where the idea lends itself to easy data collection, or the data already exists and merely has to be re-analysed.

There is, however, no incentive (in fact, there is a disincentive) to embark upon a line of research for which tools and empirical studies are harder to construct in the short term, or for which data cannot readily be harvested from software repositories. Truly visionary ideas might require a long time (5-10 years) to refine, and will potentially require cultural changes that put them (at least in the initial years of a project) beyond the remit of empirical studies. Yet it is surely within this space that the truly game-changing innovations lie.

The convention is that early-stage research should be published in workshops and “new idea” papers, and can only graduate to full conference or journal papers once it is “mature” enough. This is problematic because a truly risky, long-term project of the sort mentioned above would not produce the level of publication output necessary to sustain an academic career.

This state of affairs is by no means a necessity. For example, the few Formal Methods conferences that I’ve been to and proceedings that I’ve read have always struck me as being more welcoming of risky ideas with sketchier evaluations (despite the fact that these same conferences and researchers also have formidable links to industry).

It is not obvious what the solution might be. However, I do believe that it probably has to involve a loosening of the empiricist straitjacket.*



* For fear of this being misread: it is not my opinion that papers should in general be excused for not having a rigorous empirical study. It’s just that some should be.

Friday 20 January 2017

Automated cars may prevent accidents. But at what cost?

The advent of driverless car technology has been accompanied by an understandable degree of apprehension from some quarters. These cars are, after all, entirely controlled by software, much of which is difficult to validate and verify (especially given that much of its behaviour is the result of Machine Learning). These concerns have been exacerbated by a range of well-publicised crashes of autonomous cars. Perhaps the most widely reported was the May 2016 crash of a Tesla Model S, which “auto piloted” into the side of a tractor trailer that was crossing a highway, killing the driver in the process.

As a counter-argument, proponents of driverless technology only need to point to the data. In the US Department of Transportation report on the aforementioned Tesla accident, it was observed that the activation of Tesla’s autopilot software had resulted in a 40% decrease in crashes that led to airbag deployment. Tesla’s Elon Musk regularly tweets links to articles that reinforce this message, such as one stating that “Insurance premiums expected to decline by 80% due to driverless cars”.

Why on earth would we not embrace this technology? Surely it is a no-brainer?

The counter-argument is that driverless cars will probably themselves cause accidents (possibly very infrequently) that wouldn’t have occurred without driverless technology. I have tried to summarise this argument previously: the enormous complexity and heavy reliance upon Machine Learning could make these cars prone to unexpected behaviour (cf. articles on driverless cars running red lights and causing havoc near bicycle lanes in San Francisco).

If driverless cars can in and of themselves pose a risk to their passengers, pedestrians and cyclists (and this seems to be the case), then an interesting dilemma emerges. On the one hand, driverless cars might lead to a net reduction in accidents. On the other hand, they might cause a few accidents that wouldn’t have happened under the control of a human. If both are true, then the argument for driverless cars is in essence a utilitarian one: they will benefit the majority, and the question of whether or not they harm a minority is moot.
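To make the arithmetic behind that utilitarian argument explicit, here is a deliberately invented back-of-the-envelope sketch (all figures are hypothetical, not real crash statistics):

# Hypothetical figures, invented purely to illustrate the trade-off above;
# they are not real crash statistics.
human_crashes_per_bn_miles = 100
autonomous_crashes_per_bn_miles = 60   # overall rate with autonomy enabled
novel_autonomous_crashes = 10          # crashes a human driver would have avoided

net_reduction = human_crashes_per_bn_miles - autonomous_crashes_per_bn_miles
print("Net crashes avoided per billion miles:", net_reduction)                    # 40
print("New crashes caused by the technology itself:", novel_autonomous_crashes)   # 10
# The aggregate improves (40 fewer crashes per billion miles), and yet 10
# crashes now occur that would not otherwise have happened at all.

The aggregate figure improves, yet a new class of victims exists who would not have been harmed at all; whether the former justifies the latter is precisely the question raised below.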

At this point, we step from a technical discussion to a philosophical one. I don’t think that the advent of this new technology has really been adequately discussed at this level.


Should we accept a technology that, though it brings net benefits, can also lead to accidents in its own right? This is anything but a no-brainer, in my opinion.