AST '18 - Proceedings of the 13th International Workshop on Automation of Software Test


SESSION: Keynote 1

Software testing as a problem of machine learning: towards a foundation on computational learning theory (extended abstract of keynote speech)

In recent years, the application of machine learning techniques to software testing has been an active research area. Among the most notable work reported in the literature are experiments on the use of supervised and semi-supervised learning techniques to develop test oracles, so that the correctness of software outputs and behaviours on new test cases can be predicted [1]. Experimental data suggest that this is a promising approach to the test oracle automation problem. In general, software testing is a process of inductive inference in which the tester attempts to infer general properties of a software system by observing its behaviour on a finite number of test cases [2]. Thus, there is great potential for the application of machine learning to software testing.
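
As a hedged illustration of the general idea only (not the keynote's own material), the following sketch trains a classifier on invented, labelled (input, output) observations and uses it as a test oracle for new test cases; every feature name and value is a placeholder.

```python
# Minimal sketch of a learned test oracle: a classifier trained on labelled
# (test input, observed output) examples predicts whether the behaviour
# observed on a new test case is likely correct. Data are invented.
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature vectors, e.g. input size, output size, response time.
X_train = [
    [1.0, 3.0, 4.0],
    [2.0, 5.0, 7.0],
    [1.5, 9.0, 2.0],
    [3.0, 1.0, 8.0],
]
y_train = [1, 1, 0, 0]  # 1 = behaviour judged correct, 0 = failure

oracle = RandomForestClassifier(n_estimators=50, random_state=0)
oracle.fit(X_train, y_train)

# For a new test case, the learned oracle predicts a verdict instead of a
# human inspecting the output.
new_observation = [[1.2, 3.5, 4.1]]
print("predicted verdict:", "pass" if oracle.predict(new_observation)[0] else "fail")
```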

SESSION: Test models

An automated model-based test oracle for access control systems

In the context of XACML-based access control systems, intensive testing is among the most widely adopted means of assuring that sensitive information and resources are accessed correctly. Unfortunately, it requires substantial effort for manual inspection of results; automated verdict derivation is therefore key to improving the cost-effectiveness of testing. To this purpose, we introduce XACMET, a novel approach for automated model-based oracle definition. XACMET defines a typed graph, called the XAC-Graph, that models XACML policy evaluation. The expected verdict for a specific request execution can thus be derived automatically by executing the corresponding path in this graph. Our validation of the XACMET prototype implementation confirms the effectiveness of the proposed approach.
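
As a hedged illustration of how executing a path in an evaluation graph can yield an expected verdict (not the actual XACMET implementation or XAC-Graph format), consider the following sketch; the policy, request attributes, and node names are invented.

```python
# Toy evaluation graph: each node lists (condition on the request, next node)
# edges; leaf nodes encode verdicts. Walking the path selected by the request
# attributes yields the expected verdict for that request.
POLICY_GRAPH = {
    "root": [
        (lambda r: r["role"] == "doctor", "rule_read_record"),
        (lambda r: True, "deny_all"),
    ],
    "rule_read_record": [
        (lambda r: r["action"] == "read", "permit"),
        (lambda r: True, "deny_all"),
    ],
    "permit": [],
    "deny_all": [],
}

def expected_verdict(request, node="root"):
    """Follow the first matching edge until a leaf (verdict) node is reached."""
    edges = POLICY_GRAPH[node]
    if not edges:
        return "Permit" if node == "permit" else "Deny"
    for condition, successor in edges:
        if condition(request):
            return expected_verdict(request, successor)
    return "NotApplicable"

print(expected_verdict({"role": "doctor", "action": "read"}))   # Permit
print(expected_verdict({"role": "nurse", "action": "read"}))    # Deny
```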

Testing service oriented architectures using stateful service virtualization via machine learning

Today's enterprise software systems are far more complicated than in the past. The increasing number of dependent applications, heterogeneous technologies, and the wide use of Service Oriented Architectures (SOA), in which numerous services communicate with each other, make testing such systems challenging. For testing these software systems, the concept of service virtualization is gaining popularity. Service virtualization is an automated technique for mimicking the behavior of a given real service. Services can be classified as stateless or stateful, and many services are stateful in nature. Although the literature contains work on the virtualization of stateless services, no such solution exists for stateful services. To the best of our knowledge, this is the first work on stateful service virtualization. We employ classification-based and sequence-to-sequence-based machine learning algorithms in developing our solutions. We demonstrate the validity of our approach on two data sets collected from real-life services and obtain promising results.
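
The following sketch is a toy illustration of stateful virtualization, not the paper's classification or sequence-to-sequence models: it memorises, from invented recorded traffic, which response follows a given (recent request history, request) pair and replays it as a mock.

```python
# Toy stateful service mock learned from recorded traffic (invented data).
from collections import defaultdict, deque

recorded_traffic = [
    ("LOGIN",  "OK"),
    ("QUERY",  "RESULTS"),
    ("LOGOUT", "BYE"),
    ("QUERY",  "NOT_AUTHENTICATED"),   # same request, different state
]

HISTORY_LEN = 1
model = defaultdict(dict)
history = deque(maxlen=HISTORY_LEN)
for request, response in recorded_traffic:
    model[tuple(history)][request] = response   # "train": memorise by state
    history.append(request)

def virtual_service(requests):
    """Replay learned responses, tracking state as the recent request history."""
    state = deque(maxlen=HISTORY_LEN)
    for request in requests:
        response = model.get(tuple(state), {}).get(request, "UNKNOWN")
        print(f"{request} -> {response}")
        state.append(request)

# The same QUERY request yields different responses depending on prior requests.
virtual_service(["LOGIN", "QUERY", "LOGOUT", "QUERY"])
```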

Revisiting AI and testing methods to infer FSM models of black-box systems

Machine learning in the form of inference of state machine models has gained popularity in model-based testing as a means of retrieving models from software systems. By combining an old idea from machine inference with methods from automata testing in a heuristic approach, we propose a promising new direction for inferring black-box systems that cannot be reset. Preliminary experiments show that this heuristic approach scales up well and outperforms more systematic approaches.
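
As a hedged sketch of the general flavour of such inference (not the paper's algorithm), the fragment below builds a candidate state machine from a single, reset-free I/O trace by identifying states with the last k observed inputs; the trace is invented.

```python
# Simplistic state-merging over one I/O trace of a system that cannot be reset.
K = 1
trace = [("coin", "ok"), ("button", "coffee"), ("coin", "ok"),
         ("button", "coffee"), ("button", "error")]

transitions = {}                 # (state, input) -> (output, next state)
state = ("<init>",) * K
for inp, out in trace:
    next_state = (state + (inp,))[-K:]          # state = last K inputs seen
    observed = transitions.setdefault((state, inp), (out, next_state))
    if observed[0] != out:
        print(f"non-determinism at {state}/{inp}: a larger k would be needed")
    state = next_state

for (src, inp), (out, dst) in transitions.items():
    print(f"{src} --{inp}/{out}--> {dst}")
```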

SESSION: Mobile app testing

Planning-based security testing of web applications

Web applications are deployed on machines around the globe and offer almost universal accessibility. These systems ensure functional interconnectivity between different components on a 24/7 basis. Among the most important requirements are data confidentiality and secure authentication. However, implementation flaws and unfulfilled requirements can result in security leaks that may eventually be exploited by a malicious user. Different testing methods are therefore applied in order to detect software defects and prevent unauthorized access in advance.

Automated planning and scheduling makes it possible to specify a problem and to generate plans, which in turn guide the execution of a program. In this paper, a planning-based approach for modeling and testing web applications is introduced. The specification offers a high degree of extensibility and configurability while also overcoming the limits of traditional graphical representations. In this way, new testing possibilities emerge that eventually lead to better vulnerability detection, thereby ensuring more secure services.
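
The sketch below is an invented illustration of the planning idea, not the paper's planner or model: actions on a web application are encoded with preconditions and effects, and a breadth-first search produces a plan reaching a state that represents a potential security violation.

```python
# Simple planner over invented web-application actions; the resulting plan
# would guide test execution against the real application.
from collections import deque

ACTIONS = {
    # name: (precondition facts, fact added by the action)
    "open_login_page":     (set(),              "on_login_page"),
    "submit_sqli_payload": ({"on_login_page"},  "authenticated"),  # models a flaw
    "open_admin_page":     ({"authenticated"},  "admin_access"),
}

def plan(goal):
    """Breadth-first search over sets of state facts; returns an action list."""
    frontier = deque([(frozenset(), [])])
    seen = {frozenset()}
    while frontier:
        state, actions = frontier.popleft()
        if goal in state:
            return actions
        for name, (preconditions, effect) in ACTIONS.items():
            if preconditions <= state:
                successor = frozenset(state | {effect})
                if successor not in seen:
                    seen.add(successor)
                    frontier.append((successor, actions + [name]))
    return None

print(plan("admin_access"))
# ['open_login_page', 'submit_sqli_payload', 'open_admin_page']
```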

Sentinel: generating GUI tests for Android sensor leaks

Due to the widespread use of Android devices and apps, it is important to develop tools and techniques to improve app quality and performance. Our work focuses on a problem related to hardware sensors on Android devices: the failure to disable unneeded sensors, which leads to sensor leaks and thus battery drain. We propose the Sentinel testing tool to uncover such leaks. The tool performs static analysis of app code and produces a model which maps GUI events to callback methods that affect sensor behavior. The model is traversed to identify paths that are likely to exhibit sensor leaks during run-time execution. The reported paths are then used to generate test cases. The execution of each test case tracks the run-time behavior of sensors and reports observed leaks. Our experimental results indicate that Sentinel effectively detects sensor leaks, while focusing the testing efforts on a very small subset of possible GUI event sequences.
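
As a hedged sketch of the core idea (not Sentinel itself), the fragment below walks an invented event-to-callback model and reports event paths that register a sensor without a matching unregister; each reported path would then become a GUI test case.

```python
# Invented model: GUI event -> (sensor operations in its callbacks,
#                               follow-up events possible afterwards).
MODEL = {
    "open_map_screen": (["register:GPS"],   ["press_back", "press_home"]),
    "press_back":      (["unregister:GPS"], []),
    "press_home":      ([],                 []),  # leaves app, GPS still active
}

def find_leaky_paths(event, active=frozenset(), path=()):
    """Enumerate event paths and report those ending with active sensors."""
    ops, successors = MODEL[event]
    for op in ops:
        action, sensor = op.split(":")
        active = active | {sensor} if action == "register" else active - {sensor}
    path = path + (event,)
    if not successors:
        if active:
            print("potential leak of", list(active), "via", " -> ".join(path))
        return
    for nxt in successors:
        find_leaky_paths(nxt, active, path)

find_leaky_paths("open_map_screen")
# potential leak of ['GPS'] via open_map_screen -> press_home
```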

On the effectiveness of random testing for Android: or how I learned to stop worrying and love the monkey

Random testing of Android apps is attractive due to its ease of use and scalability, but its effectiveness can be questioned. Prior studies have shown that Monkey - a simple approach and tool for random testing of Android apps - is surprisingly effective, "beating" much more sophisticated tools by achieving higher coverage. We study how Monkey's parameters affect code coverage (at the class, method, block, and line levels) and set out to answer several research questions centered around improving the effectiveness of Monkey-based random testing in Android and how it compares with manual exploration. First, we show that random stress testing via Monkey is extremely efficient (85 seconds on average) and effective at crashing apps, including 15 widely used apps that have millions (or even billions) of installs. Second, we vary Monkey's event distribution to change app behavior and measure the resulting coverage; we find that, except for isolated cases, altering Monkey's default event distribution is unlikely to lead to higher coverage. Third, we manually explore 62 apps and compare the resulting coverage; we find that coverage achieved via manual exploration is just 2--3% higher than that achieved via Monkey exploration. Finally, our analysis shows that coarse-grained coverage is highly indicative of fine-grained coverage, hence coarse-grained coverage (which imposes low collection overhead) hits a performance-versus-accuracy sweet spot.
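
For readers who want to reproduce this style of experiment, the sketch below drives Monkey with a custom event distribution via adb; it assumes adb is on the PATH and a device or emulator is connected, and the package name, percentages, and counts are placeholders rather than values from the paper.

```python
# Run Monkey against an app with a tweaked event distribution and a fixed seed.
import subprocess

PACKAGE = "com.example.app"   # hypothetical app under test
EVENT_COUNT = 5000
SEED = 42                     # fixed seed makes the random run reproducible

cmd = [
    "adb", "shell", "monkey",
    "-p", PACKAGE,
    "-s", str(SEED),
    "--pct-touch", "60",      # override parts of Monkey's default distribution
    "--pct-motion", "20",
    "--pct-appswitch", "5",
    "--throttle", "100",      # milliseconds between injected events
    "-v",                     # verbose log output
    str(EVENT_COUNT),         # number of pseudo-random events to inject
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout[-2000:])  # tail of the log; crashes appear as "// CRASH" lines
```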

SESSION: Keynote 2

Towards software-defined and self-driving cloud infrastructure: extended abstract

Traditionally, infrastructure operation, such as power management, network traffic engineering, and even the "cloud computing" layers, has been abstracted away from software developers. This abstraction makes application development and maintenance easier, but it leads to complexity and inefficiency in infrastructure operation. The data center community has now adopted the software-defined paradigm, hoping to bring more flexibility to data center infrastructure and to improve the performance, efficiency, and reliability of resource-demanding applications.

SESSION: System testing

Improving continuous integration with similarity-based test case selection

Automated testing is an essential component of Continuous Integration (CI) and Continuous Delivery (CD), for example through automated test sessions scheduled on overnight builds. This allows stakeholders to execute entire test suites and achieve exhaustive test coverage, since running all tests is often infeasible during work hours, i.e., in parallel to development activities. On the other hand, developers also need test feedback from CI servers when pushing changes, even if not all test cases are executed. In this paper we evaluate similarity-based test case selection (SBTCS) on integration-level tests executed in the continuous integration pipelines of two companies. We select test cases that maximise the diversity of test coverage and reduce feedback time to developers. Our results confirm existing evidence that SBTCS is a strong candidate for test optimisation, reducing feedback time (up to 92% faster in our case studies) while achieving full test coverage using only information from the test artefacts themselves.
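
As a hedged illustration of similarity-based selection (not the paper's tool or data), the sketch below greedily picks the test case most dissimilar, by Jaccard distance on covered items, from those already selected, until the budget is spent.

```python
# Greedy diversity-based test selection over invented coverage sets.
TESTS = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"d", "e"},
    "t4": {"c", "d"},
}

def jaccard_distance(x, y):
    return 1.0 - len(x & y) / len(x | y)

def select(tests, budget):
    selected = [max(tests, key=lambda t: len(tests[t]))]   # seed: largest coverage
    while len(selected) < budget:
        def min_dist(candidate):
            return min(jaccard_distance(tests[candidate], tests[s]) for s in selected)
        remaining = [t for t in tests if t not in selected]
        selected.append(max(remaining, key=min_dist))       # most dissimilar next
    return selected

print(select(TESTS, budget=2))   # ['t1', 't3'] -- a maximally diverse pair
```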

Memory corruption detecting method using static variables and dynamic memory usage

Memory fault detection has been studied continuously and various detection methods exist. However, many memory defects remain difficult to debug. Memory corruption is one such defect and often causes a system crash. In many cases, however, the location of the crash differs from the location that actually causes the memory corruption. These defects are difficult to resolve with existing methods.

In this paper, we propose a method to detect memory defects in real time by using static global variables derived from the executable binary and dynamic memory usage obtained by tracing memory-related functions. We implemented the proposed method as a tool and applied it to an application running on the IoTivity platform. Our tool detects defects very accurately with low overhead, even for defects whose detected location differs from the location of their cause.
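
The sketch below illustrates the general idea on invented data, not the paper's tool: static variable address ranges taken from the binary are combined with a trace of memory-allocation calls, and writes outside any known valid region are flagged.

```python
# Flag writes that fall outside static-variable ranges and live heap allocations.
STATIC_VARS = {"g_config": (0x1000, 0x10FF), "g_buffer": (0x1100, 0x11FF)}

trace = [
    ("malloc", 0x5000, 64),      # (event, address, size)
    ("write",  0x5010, 4),       # inside the allocation: fine
    ("write",  0x5040, 4),       # past the 64-byte allocation: corruption
    ("free",   0x5000, 0),
    ("write",  0x1100, 4),       # static variable g_buffer: fine
]

live = {}   # base address -> size of live heap allocations

def valid(addr, size):
    if any(lo <= addr and addr + size - 1 <= hi for lo, hi in STATIC_VARS.values()):
        return True
    return any(base <= addr and addr + size <= base + length
               for base, length in live.items())

for event, addr, size in trace:
    if event == "malloc":
        live[addr] = size
    elif event == "free":
        live.pop(addr, None)
    elif event == "write" and not valid(addr, size):
        print(f"possible memory corruption: write of {size} bytes at {hex(addr)}")
```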

Guided test case generation through AI enabled output space exploration

Black-box software testing is a crucial part of quality assurance for industrial products. To verify the reliable behavior of software-intensive systems, testing needs to ensure that the system produces the correct outputs for a variety of inputs. Even more critically, it needs to ensure that unexpected corner cases are tested. Existing approaches attempt to address this problem by generating input data for known outputs based on the domain knowledge of an expert. Such input space exploration, however, does not guarantee adequate coverage of the output space, as test input data generation is done independently of the system output. This paper discusses a novel test case generation approach, enabled by neural networks, which promises a higher probability of exposing system faults by systematically exploring the output space of the system under test. As such, the approach potentially improves defect detection capability by identifying gaps in the test suite, i.e., uncovered system outputs. These gaps are closed by automatically determining inputs that lead to specific outputs through backward reasoning on an artificial neural network. The approach is demonstrated on an industrial train control system.
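
As a hedged sketch of the underlying idea (not the paper's method or system), the fragment below uses gradient descent on the input of a tiny surrogate neural network to search for an input that drives the network's output towards an uncovered target value; the weights and target are invented.

```python
# Backward search on a toy surrogate network: adjust the input, not the weights,
# until the predicted output approaches a target output not yet covered by tests.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # toy surrogate: 2 inputs -> 1 output
w2, b2 = rng.normal(size=4), 0.0

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return w2 @ h + b2, h

target = 1.5                      # an output value not yet covered by the suite
x = np.zeros(2)                   # start the backward search from some input
for _ in range(500):
    y, h = forward(x)
    # d(loss)/dx for loss = (y - target)^2, via the chain rule
    grad_x = 2 * (y - target) * (W1.T @ (w2 * (1 - h ** 2)))
    x -= 0.05 * grad_x

print("candidate test input:", x, "predicted output:", forward(x)[0])
```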

SESSION: Mutation-based testing

Using controlled numbers of real faults and mutants to empirically evaluate coverage-based test case prioritization

Used to establish confidence in the correctness of evolving software, regression testing is an important, yet costly, task. Test case prioritization enables the rapid detection of faults during regression testing by reordering the test suite so that effective tests are run as early as possible. However, a distinct lack of information about the regression faults found in complex real-world software forced prior experimental studies of these methods to use artificial faults called mutants. Using the Defects4J database of real faults, this paper presents the results of experiments evaluating the effectiveness of four representative test prioritization techniques. Since this paper's results show that prioritization is susceptible to high variance when only one fault is present, our experiments also control the number of real faults and mutants in the program subject to regression testing. Our overall finding is that, in comparison to mutants, real faults are harder for reordered test suites to detect quickly, suggesting that mutants are not a surrogate for real faults.
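
For illustration only (not the paper's experimental setup), the sketch below applies the standard "additional greedy" coverage-based prioritization and computes APFD on invented coverage and fault data.

```python
# Additional-greedy prioritization: repeatedly pick the test covering the most
# not-yet-covered items; APFD summarises how early the faults are detected.
COVERAGE = {"t1": {"a", "b"}, "t2": {"b", "c", "d"}, "t3": {"e"}, "t4": {"a", "e"}}
FAULTS = {"f1": {"t3", "t4"}, "f2": {"t2"}}   # fault -> tests that detect it

def additional_greedy(coverage):
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        covered |= remaining.pop(best)
        order.append(best)
    return order

def apfd(order, faults):
    n, m = len(order), len(faults)
    positions = [min(order.index(t) + 1 for t in detecting)
                 for detecting in faults.values()]
    return 1 - sum(positions) / (n * m) + 1 / (2 * n)

order = additional_greedy(COVERAGE)
print(order, "APFD =", round(apfd(order, FAULTS), 3))   # ['t2', 't4', 't1', 't3'] APFD = 0.75
```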

Test suite reduction for self-organizing systems: a mutation-based approach

We study regression testing and test suite reduction for self-organizing (SO) systems. The complex environments of SO systems typically require large test suites. The physical distribution of their components and their history-dependent behavior, however, make test execution very expensive. Consequently, an efficient test suite reduction mechanism is needed. The fundamental characteristic of SO systems is their ability to reconfigure themselves. We therefore investigate a mutation-based approach concentrating on reconfigurations, more specifically on the communication between the distributed components during reconfigurations. Due to the distribution, we argue for an explicit consideration of higher-order mutants and find a shortcut that keeps the number of test cases to execute before reduction feasible. For the reduction task, we evaluate the applicability of two existing clustering techniques, Affinity Propagation and Dissimilarity-based Sparse Subset Selection. These techniques turn out to drastically reduce the original test suite while retaining a good mutation score. We illustrate the approach with a test suite for a self-organizing production cell as a running example.
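
As a hedged sketch of the reduction step only (not the paper's pipeline), the fragment below represents each test case by the vector of mutants it kills, clusters the tests with scikit-learn's Affinity Propagation, and keeps one representative per cluster; the kill matrix is invented, and the clustering parameters may need tuning on real data.

```python
# Cluster tests by their mutant-kill vectors and keep cluster exemplars.
import numpy as np
from sklearn.cluster import AffinityPropagation

TESTS = ["t1", "t2", "t3", "t4", "t5"]
# rows: tests, columns: mutants; 1 means the test kills that mutant
KILL_MATRIX = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
])

# damping/preference may need tuning for convergence on realistic data
clustering = AffinityPropagation(damping=0.9, random_state=0).fit(KILL_MATRIX)
reduced_indices = clustering.cluster_centers_indices_
print("reduced suite:", [TESTS[i] for i in reduced_indices])

# Mutation score retained by the reduced suite
killed = KILL_MATRIX[reduced_indices].max(axis=0).sum()
print("mutation score:", killed / KILL_MATRIX.shape[1])
```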