GI '18: Proceedings of the 4th International Genetic Improvement Workshop


Neutrality and epistasis in program space

Neutral networks in biology often contain diverse solutions with equal fitness, which can be useful when environments (requirements) change over time. In this paper, we present a method for studying neutral networks in software. In these networks, we find multiple solutions to held-out test cases (latent bugs), suggesting that neutral software networks also exhibit relevant diversity. We also observe instances of positive epistasis between random mutations, i.e., interactions that collectively increase fitness. Positive epistasis is rare as a fraction of the total search space but significant as a fraction of the objective space: 9% of the repairs we found for the program look (and 4.63% of repairs across all programs analyzed) were produced by positive interactions between mutations. Further, the majority (62.50%) of unique repairs are instances of positive epistasis.
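As a rough illustration of how such interactions can be classified, the sketch below compares the joint fitness effect of two mutations against the sum of their individual effects; the fitness function, mutation representation, and apply_muts helper are hypothetical stand-ins, not the paper's implementation.

    from itertools import combinations

    def epistasis_sign(fitness, baseline, mut_a, mut_b, apply_muts):
        """Classify the interaction between two mutations.

        fitness(variant)          -- fitness score of a program variant
        baseline                  -- the unmodified program
        apply_muts(program, muts) -- variant with the given mutations applied
        """
        f0 = fitness(baseline)
        effect_a = fitness(apply_muts(baseline, [mut_a])) - f0
        effect_b = fitness(apply_muts(baseline, [mut_b])) - f0
        effect_ab = fitness(apply_muts(baseline, [mut_a, mut_b])) - f0
        if effect_ab > effect_a + effect_b:
            return "positive"   # the pair collectively increases fitness beyond its parts
        if effect_ab < effect_a + effect_b:
            return "negative"
        return "additive"

    def positive_pairs(fitness, baseline, mutations, apply_muts):
        """All mutation pairs exhibiting positive epistasis."""
        return [(a, b) for a, b in combinations(mutations, 2)
                if epistasis_sign(fitness, baseline, a, b, apply_muts) == "positive"]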

Experiments in genetic divergence for emergent systems

Emergent software systems take a step towards tackling the ever-increasing complexity of modern software by having systems self-assemble from a library of building blocks, and then continually re-assemble themselves from alternative building blocks to learn which compositions of behaviour work best in each deployment environment. One of the key challenges in emergent systems is populating the library of building blocks, and in particular providing sets of alternative implementations of a given building block, which form the runtime search space of optimal behaviour. We present initial work in using a fusion of genetic improvement and genetic synthesis to automatically populate a divergent set of implementations of the same functionality, allowing emergent systems to explore new behavioural alternatives without human input. Our early results indicate that this approach can successfully yield useful divergent implementations of building blocks which are better suited to particular operating conditions than any existing alternative.
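A minimal sketch of the re-assembly loop described above, assuming hypothetical interchangeable implementations and a measure function that reports how well the current choice performs in the present environment; an epsilon-greedy choice stands in for whatever learning strategy an emergent system actually uses.

    import random

    def adapt_building_block(alternatives, measure, rounds=100, explore=0.1):
        """Repeatedly re-assemble one building block from a set of alternative
        implementations, keeping the one that performs best so far.

        alternatives  -- interchangeable implementations of the same functionality
        measure(impl) -- observed performance of running with this implementation
        """
        observed = {i: [] for i in range(len(alternatives))}
        current = random.randrange(len(alternatives))
        for _ in range(rounds):
            observed[current].append(measure(alternatives[current]))
            if random.random() < explore:
                current = random.randrange(len(alternatives))   # try an alternative
            else:
                current = max(observed,                          # exploit the best mean so far
                              key=lambda i: sum(observed[i]) / len(observed[i])
                              if observed[i] else float("-inf"))
        return current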

A Turing test for genetic improvement

Genetic improvement (GI) is a research field that aims to develop search-based techniques for improving existing code. GI has been used to automatically repair bugs, reduce energy consumption, and improve run-time performance. In this paper, we reflect on the often-overlooked relationship between GI and developers within the context of continually evolving software systems. We introduce a distinction between transparent and opaque patches based on intended lifespan and developer interaction. Finally, we outline a Turing test for assessing the ability of a GI system to produce opaque patches that are acceptable to humans. This motivates research into the role GI systems will play in transparent development contexts.

Comparing line and AST granularity level for program repair using PyGGI

PyGGI is a lightweight Python framework that can be used to implement generic Genetic Improvement algorithms at the API level. The original version of PyGGI only provided lexical modifications, i.e., modifications of the source code at the physical line granularity level. This paper introduces new extensions to PyGGI that enable syntactic modifications for Python code, i.e., modifications that operate at the AST granularity level. Taking advantage of the new extensions, we also present a case study that compares the lexical and syntactic granularity levels for automated program repair, using ten seeded faults in a real-world open source Python project. The results show that search landscapes at the AST granularity level are more effective (i.e., eventually more likely to produce plausible patches) due to the smaller sizes of ingredient spaces (i.e., the spaces from which we search for the material to build a patch), but may require more time to search because the larger number of syntactically intact candidates leads to more fitness evaluations.
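To make the two granularity levels concrete, the sketch below contrasts a lexical edit on physical lines with a syntactic edit on the AST, using only the Python standard library rather than PyGGI's own API (ast.unparse needs Python 3.9 or later):

    import ast

    # A small program to mutate.
    SOURCE = "\n".join([
        "def median(xs):",
        "    xs = sorted(xs)",
        "    n = len(xs)",
        "    return xs[n // 2]",
        "",
    ])

    def delete_line(source, lineno):
        """Lexical modification: drop one physical line, ignoring syntax."""
        lines = source.splitlines(keepends=True)
        del lines[lineno - 1]
        return "".join(lines)

    class DeleteStatement(ast.NodeTransformer):
        """Syntactic modification: drop the n-th statement node encountered,
        so the resulting variant is always syntactically intact."""
        def __init__(self, target):
            self.target = target
            self.seen = 0
        def generic_visit(self, node):
            node = super().generic_visit(node)
            if isinstance(node, ast.stmt):
                self.seen += 1
                if self.seen == self.target:
                    return None          # returning None removes the statement
            return node

    broken = delete_line(SOURCE, 1)      # removes "def median(xs):", no longer parses
    intact = ast.unparse(DeleteStatement(1).visit(ast.parse(SOURCE)))  # drops "xs = sorted(xs)"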

Performance localisation

Profiling techniques highlight where performance issues manifest and provide a starting point for tracing cause back through a program. While people diagnose and understand the causes of performance issues to guide the formulation of a performance improvement, we seek automated techniques for highlighting performance improvement opportunities to guide search algorithms.

We investigate mutation-based approaches for highlighting where a performance improvement is likely to exist. For all modification locations in a program, we make all possible modifications and analyse how often modifications reduce execution count. We compare the resulting code location rankings against rankings derived using a profiler and find that mutation analysis provides the higher accuracy in highlighting performance improvement locations in a set of benchmark problems, though at a much higher execution cost.
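A minimal sketch of this ranking, assuming hypothetical helpers mutations_at (enumerating the variants possible at a location) and execution_count (running the variant's tests and counting executed instructions):

    def rank_locations(program, locations, mutations_at, execution_count):
        """Rank code locations by how often mutating them reduces execution count.

        mutations_at(program, loc) -- yields the program variants obtainable by
                                      modifying the given location (assumed helper)
        execution_count(variant)   -- instructions executed when running the tests
        """
        baseline = execution_count(program)
        score = {}
        for loc in locations:
            variants = list(mutations_at(program, loc))
            faster = sum(1 for v in variants if execution_count(v) < baseline)
            score[loc] = faster / len(variants) if variants else 0.0
        # locations whose mutations most often reduce execution count rank first
        return sorted(locations, key=score.get, reverse=True)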

We see both approaches as complementary and consider how they may be used to further guide Genetic Programming in finding performance improvements.

A spoonful of DevOps helps the GI go down

DevOps emphasizes a high degree of automation at all phases of the software development lifecycle. Meanwhile, Genetic Improvement (GI) focuses on the automatic improvement of software artifacts. In this paper, we discuss why we believe that DevOps offers an excellent technical context for easing the adoption of GI techniques by software developers. We also discuss A/B testing as a prominent and clear example of GI taking place in the wild today, albeit one with human-supervised fitness and mutation operators.

Learning to synthesize

In many scenarios we need to find the most likely program under a local context, where the local context can be an incomplete program, a partial specification, a natural language description, etc. We call such problems program estimation. In this paper we propose an abstract framework, learning to synthesize, or L2S for short, to address this problem. L2S combines four tools to achieve this: rewriting rules are used to define the search space and search steps, constraint solving is used to prune invalid candidates at each search step, machine learning is used to estimate conditional probabilities for the candidates at each search step, and search algorithms are used to find the best possible solution. The main goal of L2S is to lay out the design space to motivate research on program estimation.
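As an abstract illustration of how the four ingredients combine, the sketch below runs a best-first search in which hypothetical expand, is_valid, score, and is_complete functions stand in for the rewriting rules, the constraint solver, the learned probability model, and the completion check respectively:

    import heapq
    import itertools

    def l2s_search(initial, expand, is_valid, score, is_complete, max_steps=10000):
        """Best-first search over candidate programs in the spirit of L2S.

        expand(c)      -- apply rewriting rules to yield successor candidates
        is_valid(c)    -- constraint check that prunes invalid candidates
        score(c)       -- learned estimate of how likely the candidate is
        is_complete(c) -- whether the candidate is a finished program
        """
        ties = itertools.count()    # tie-breaker so the heap never compares candidates
        frontier = [(-score(initial), next(ties), initial)]
        for _ in range(max_steps):
            if not frontier:
                break
            _, _, candidate = heapq.heappop(frontier)
            if is_complete(candidate):
                return candidate    # highest-scoring complete program found so far
            for successor in expand(candidate):          # rewriting rules define the steps
                if is_valid(successor):                  # constraint solving prunes early
                    heapq.heappush(frontier, (-score(successor), next(ties), successor))
        return None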

We have performed a preliminary evaluation by instantiating this framework to synthesize conditions for an automated program repair (APR) system. The training data come from the project itself and related JDK packages. Compared to ACS, a state-of-the-art condition synthesis system for program repair, our approach handles a larger search space, fixing 4 additional bugs that lie outside the search space of ACS, while relying only on the source code of the current project.

Evolutionary fuzzing for genetic improvement: toward adaptive software defense

As fuzz testing strategies have become more and more sophisticated, we see a natural application of fuzz testing to Genetic Improvement techniques. In particular, the ability to generate high quality, high coverage tests with advanced fuzzers can greatly enhance the effectiveness of Genetic Improvement algorithms, especially when the algorithms are applied to bug fixing or similar kinds of software improvement that target qualities such as security.