Metamorphic testing is a popular testing technique that has been shown to be effective at detecting faults in numerous domains, such as web services and autonomous vehicles. Despite the many advances made in the last two decades, however, metamorphic testing is still fertile ground for new contributions. This talk will provide an overview of the current state of the discipline and some of the key challenges to be addressed from three different perspectives: the technique, its applications, and the research community. The talk and the subsequent discussion aim to give the audience a common view of the field and of the work still to be done, paving the way for promising new contributions.
Metamorphic testing is a well-known approach to tackling the oracle problem in software testing. The technique requires source test cases that serve as seeds for generating follow-up test cases. Because systematic test case design is crucial to test quality, the source test case generation strategy can have a large impact on the fault detection effectiveness of metamorphic testing. Most previous studies on metamorphic testing have used either random test data or existing test cases as source test cases, and there has been limited research on systematic source test case generation for metamorphic testing. This paper provides a comprehensive evaluation of the impact of source test case generation techniques on the fault-finding effectiveness of metamorphic testing. We evaluated line coverage, branch coverage, weak mutation, and random test generation as strategies for generating source test cases. The experiments were conducted with 77 methods from 4 open source code repositories. Our results show that systematically creating source test cases can significantly increase the fault-finding effectiveness of metamorphic testing. We also introduce "METtester", a simple metamorphic testing tool that we used to conduct metamorphic testing on these methods.
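To make the role of source test cases concrete, here is a minimal Python sketch; it is not METtester, whose interface is not described here. It assumes a hypothetical method `abs_sum`, a hand-seeded fault in a rarely taken branch (`abs_sum_faulty`), and the MR "negating every input element leaves the output unchanged". Random source tests drawn from a narrow range never reach the faulty branch, whereas a small source set chosen by hand to exercise every branch (standing in for a coverage-driven generator) exposes the fault through the MR.

```python
import random

def abs_sum(xs):
    """Hypothetical method under test: sum of absolute values."""
    return sum(abs(x) for x in xs)

def abs_sum_faulty(xs):
    """Same method with a seeded fault in a rarely taken branch."""
    total = 0
    for x in xs:
        if x < -100:
            total += x   # seeded fault: should be -x
        elif x < 0:
            total += -x
        else:
            total += x
    return total

def follow_up(xs):
    """MR transformation: negate every element of the source input."""
    return [-x for x in xs]

def mr_detects_fault(program, sources):
    """True if any source/follow-up pair violates f(xs) == f(-xs)."""
    return any(program(xs) != program(follow_up(xs)) for xs in sources)

def random_sources(n, seed=0):
    """Random source test cases: short integer lists with values in [-10, 10]."""
    rng = random.Random(seed)
    return [[rng.randint(-10, 10) for _ in range(rng.randint(1, 5))]
            for _ in range(n)]

# Hand-picked sources that take every branch of abs_sum_faulty.
branch_sources = [[5], [-5], [-150]]

print(mr_detects_fault(abs_sum_faulty, random_sources(100)))  # False
print(mr_detects_fault(abs_sum_faulty, branch_sources))       # True
```

The MR can only reveal a fault if some source test case drives execution into the faulty code, which is why the strategy used to generate source test cases matters.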
Matrices often represent important information in scientific applications and are involved in complex calculations. Systematically testing these applications is hard, however, because of the oracle problem. Metamorphic testing is an effective approach for such applications because it uses metamorphic relations to determine whether test cases pass or fail. Metamorphic relations are typically identified with the help of a domain expert, which is a labor-intensive task. In this work, we use a graph-kernel-based machine learning approach to predict metamorphic relations for matrix calculation programs. This approach was previously used successfully to predict metamorphic relations for programs that perform numerical calculations. The results of this study show that it can predict metamorphic relations for matrix calculation programs as well.
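For readers unfamiliar with MRs for matrix programs, the Python sketch below checks a few relations of the kind such a predictor might label as applicable. The program under test (`matrix_sum`) and the three relations (transposition, row reversal, and scalar multiplication) are illustrative assumptions, not the MR set used in this study, and the sketch shows only how a predicted MR would be checked, not the graph-kernel prediction itself.

```python
def matrix_sum(m):
    """Hypothetical program under test: the sum of all entries."""
    return sum(sum(row) for row in m)

def transpose(m):
    return [list(col) for col in zip(*m)]

def check_mrs(program, m, k=3):
    """Report which of three candidate MRs hold for one source matrix m."""
    source = program(m)
    return {
        # Transposing the input should leave the output unchanged.
        "transpose": program(transpose(m)) == source,
        # Reordering the rows (here: reversing them) should leave it unchanged.
        "reverse_rows": program(m[::-1]) == source,
        # Multiplying every entry by k should multiply the output by k.
        "scale": program([[k * v for v in row] for row in m]) == k * source,
    }

m = [[1, 2], [3, 4]]
print(check_mrs(matrix_sum, m))
# {'transpose': True, 'reverse_rows': True, 'scale': True}
print(check_mrs(lambda a: sum(a[0]), m))  # sums only the first row
# {'transpose': False, 'reverse_rows': False, 'scale': True}
```

A program that sums only the first row satisfies the scaling relation but not the other two; distinguishing such cases from program structure alone is what an MR predictor has to learn.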
Software testing is difficult to automate, especially for programs that have no oracle, that is, no method of determining whether an output is correct. Metamorphic testing is one solution to this problem. It uses metamorphic relations (MRs) to generate follow-up test cases and to check their outputs against those of the source test cases. A domain expert needs a large amount of time to determine which MRs can be used to test a given program; MR prediction removes the need for such an expert. We propose a method that uses semi-supervised machine learning to detect which MRs are applicable to a given code base. We compare this semi-supervised model with a supervised model and show that adding unlabeled data improves the classification accuracy of the MR prediction model.
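The Python sketch below illustrates self-training, one common semi-supervised scheme; the abstract does not say which semi-supervised algorithm was used, so this is a stand-in. Each program is assumed to be summarized as a numeric feature vector (the two features in the demo are hypothetical), labeled `True` if a given MR applies. A nearest-centroid classifier is fit on the labeled programs, unlabeled programs that it classifies with a large enough margin are added to the training set with their predicted labels, and the loop repeats until no confident predictions remain.

```python
import math

def fit_centroids(points, labels):
    """Nearest-centroid model: the mean feature vector for each label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, p):
    """Return (label, margin); margin = how much closer the best centroid is."""
    dists = sorted((math.dist(c, p), y) for y, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin

def self_train(labeled, labels, unlabeled, threshold=2.0, max_rounds=10):
    """Self-training: repeatedly absorb confidently pseudo-labeled programs."""
    X, Y, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(X, Y)
        scored = [(p, *predict(model, p)) for p in pool]
        confident = [(p, y) for p, y, m in scored if m >= threshold]
        if not confident:
            break
        X += [p for p, _ in confident]
        Y += [y for _, y in confident]
        pool = [p for p, _, m in scored if m < threshold]
    return fit_centroids(X, Y)

# Hypothetical features per program, e.g. (loop count, arithmetic-op count).
labeled = [(0.0, 0.0), (10.0, 10.0)]
labels = [False, True]  # does the MR apply to this program?
unlabeled = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (8.0, 9.0), (5.5, 5.5)]
model = self_train(labeled, labels, unlabeled)
print(predict(model, (3.0, 3.0))[0], predict(model, (7.0, 7.0))[0])  # False True
```

The margin threshold controls how aggressively unlabeled programs are pseudo-labeled: here the ambiguous point (5.5, 5.5) is never absorbed, while a much lower threshold would let early mistakes reinforce themselves.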
Today's software systems are increasingly required to be flexible, which is achieved by providing various forms of loose coupling and configuration options. While loose coupling and configuration options facilitate quick adaptation to changing requirements, such flexibility makes system testing more difficult. It is often relatively straightforward to create different configuration options as test cases, but it is typically much harder to formulate the expected system behavior; this is known as the oracle problem. NASA's GMSEC software bus is such a flexible system: it serves as a central communication channel for software components based on a publish-and-subscribe architecture in which several software components can be dynamically connected to the system. To cope with the difficulty of testing such a flexible system, we present a metamorphic testing approach that explicitly addresses the test oracle problem. In this paper, we focus on testing the publish and subscribe functionality of GMSEC, motivated by the fact that its middleware-based architecture is the foundation of many of NASA's missions.
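The sketch below shows what a metamorphic test of publish-subscribe delivery can look like. `ToyBus` is a hypothetical in-memory stand-in, not the GMSEC API, and its subject syntax (dot-separated segments, with `*` matching exactly one segment) is an assumption. The two MRs are also illustrative: adding a subscriber whose pattern matches none of the published subjects, or registering the same subscribers in the reverse order, should not change what any subscriber receives. Neither check needs the exact expected delivery, which is what makes MT useful here.

```python
from collections import defaultdict

def matches(pattern, subject):
    """'*' matches exactly one dot-separated segment (assumed syntax)."""
    p, s = pattern.split("."), subject.split(".")
    return len(p) == len(s) and all(a in ("*", b) for a, b in zip(p, s))

class ToyBus:
    """Hypothetical in-memory publish/subscribe bus (not the GMSEC API)."""

    def __init__(self):
        self.subscriptions = []         # (subscriber name, subject pattern)
        self.inbox = defaultdict(list)  # subscriber name -> delivered messages

    def subscribe(self, name, pattern):
        self.subscriptions.append((name, pattern))

    def publish(self, subject, payload):
        for name, pattern in self.subscriptions:
            if matches(pattern, subject):
                self.inbox[name].append((subject, payload))

class FirstMatchBus(ToyBus):
    """Seeded fault: each message reaches only the first matching subscriber."""

    def publish(self, subject, payload):
        for name, pattern in self.subscriptions:
            if matches(pattern, subject):
                self.inbox[name].append((subject, payload))
                break

def run(subs, msgs, bus_cls=ToyBus):
    """Execute one test case and return what each subscriber received."""
    bus = bus_cls()
    for name, pattern in subs:
        bus.subscribe(name, pattern)
    for subject, payload in msgs:
        bus.publish(subject, payload)
    return dict(bus.inbox)

def mr_unrelated_subscriber(subs, msgs, bus_cls=ToyBus):
    """Precondition: no message is published on NO.SUCH.SUBJECT."""
    extra = subs + [("extra", "NO.SUCH.SUBJECT")]
    return run(subs, msgs, bus_cls) == run(extra, msgs, bus_cls)

def mr_reordered_subscriptions(subs, msgs, bus_cls=ToyBus):
    return run(subs, msgs, bus_cls) == run(subs[::-1], msgs, bus_cls)

subs = [("nav", "MISSION.SAT1.*"), ("log", "MISSION.*.STATUS"), ("all", "*.*.*")]
msgs = [("MISSION.SAT1.STATUS", "ok"), ("MISSION.SAT2.STATUS", "ok"),
        ("MISSION.SAT1.CMD", "burn")]
print(mr_reordered_subscriptions(subs, msgs))                 # True
print(mr_reordered_subscriptions(subs, msgs, FirstMatchBus))  # False
```

The reordering MR exposes the seeded `FirstMatchBus` fault, whereas the unrelated-subscriber MR does not, because the extra subscriber is registered last and never matches.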
Bioinformatics software plays a very important role in critical decision-making in many areas, including medicine and health care. However, most research effort is directed toward developing tools, and little time and effort is spent on testing the software to assure its quality. A test oracle determines whether a test has passed or failed, but for much bioinformatics software the exact expected outcomes are not well defined. Thus, the main challenge in systematically testing bioinformatics software is the oracle problem.
Metamorphic testing (MT) is a technique for testing programs that face the oracle problem. MT uses metamorphic relations (MRs) to determine whether a test has passed or failed; each MR specifies how the output should change in response to a specific change made to the input. In this work, we use MT to test LingPipe, a tool for processing text using computational linguistics that is often used in bioinformatics for bio-entity recognition from the biomedical literature.
First, we identify a set of MRs for testing any bio-entity recognition program. Then we develop a set of test cases that use these MRs to test LingPipe's bio-entity recognition functionality. To evaluate the effectiveness of this testing process, we automatically generate a set of faulty versions of LingPipe. Our analysis of the experimental results shows that our MRs detect the majority of these faulty versions, which demonstrates the utility of this testing technique for the quality assurance of bioinformatics software.
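The workflow described for LingPipe can be sketched in Python with a toy dictionary-based recognizer standing in for it; LingPipe's own API is not used. The gene dictionary, the three hand-seeded faulty versions, and the two MRs (prepending a sentence that mentions no entity, and reversing the sentence order, both of which should leave the set of recognized entities unchanged) are hypothetical illustrations, not the MRs or faulty versions from the study. A faulty version counts as detected (killed) if any MR is violated.

```python
GENES = {"BRCA1", "TP53", "EGFR"}  # hypothetical entity dictionary

def tokens(text, strip=".,;()"):
    return [t.strip(strip) for t in text.split()]

def recognize(text):
    """Toy recognizer: the set of dictionary entries mentioned in text."""
    return {t for t in tokens(text) if t in GENES}

# Hand-seeded faulty versions (stand-ins for automatically generated ones).
MUTANTS = {
    "window": lambda text: {t for t in tokens(text)[:10] if t in GENES},
    "drop_last": lambda text: {t for t in tokens(text)[:-1] if t in GENES},
    "no_period": lambda text: {t for t in tokens(text, ",;()") if t in GENES},
}

def mr_prepend_neutral(sentences):
    """Follow-up: prepend a sentence that mentions no entity."""
    return ["The samples were collected in 2019."] + sentences

def mr_reverse(sentences):
    """Follow-up: the same sentences in reverse order."""
    return sentences[::-1]

MRS = [mr_prepend_neutral, mr_reverse]

def violates(program, sentences, mr):
    """Both MRs require the recognized entity set to stay the same."""
    return program(" ".join(sentences)) != program(" ".join(mr(sentences)))

source = ["Mutations in BRCA1 were observed.", "We also sequenced TP53."]
killed = {name for name, m in MUTANTS.items()
          if any(violates(m, source, mr) for mr in MRS)}
print(f"killed {len(killed)} of {len(MUTANTS)}: {sorted(killed)}")
# killed 2 of 3: ['drop_last', 'window']
```

The surviving `no_period` fault misses TP53 in the source and in both follow-up inputs, so no relation between them exposes it; catching such faults needs other MRs or other source test cases.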
With the growing popularity of machine translation services, it has become increasingly important to be able to assess their quality. However, the test oracle problem makes automated testing difficult. In this paper, we propose a Monte Carlo method, combined with metamorphic testing, to overcome the oracle problem. Using this method, we assessed the quality of three popular machine translation services: Google Translate, Microsoft Translator, and Youdao Translate. The source language was English, and the target languages were Chinese, French, Japanese, Korean, Portuguese, Russian, Spanish, and Swedish. A sample of 33,600 observations (involving a total of 100,800 actual translations) was collected and analyzed using a 3 × 56 factorial design. Based on these data, our model found Google Translate to be the best service (in terms of the metamorphic relation used) for every target language considered. A trend of better results for Indo-European target languages was also identified.
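The abstract does not spell out the MR, so the Python sketch below assumes one purely for illustration: translating an English sentence directly into a target language should agree with translating it into that language via an intermediate language, with agreement scored by token-set Jaccard similarity. The Monte Carlo part is the random sampling of source sentences to estimate the mean MR score with an approximate 95% confidence interval. The interface `translate(text, src, dst)` and the toy `lossy_translate` service are hypothetical; real services would be called through their own client libraries.

```python
import random
import statistics

def jaccard(a, b):
    """Token-set similarity in [0, 1]; 1.0 means identical token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def mr_score(translate, sentence, via, target):
    """Assumed MR: en->target should agree with en->via->target."""
    direct = translate(sentence, "en", target)
    indirect = translate(translate(sentence, "en", via), via, target)
    return jaccard(direct, indirect)

def monte_carlo(translate, corpus, via, target, n=200, seed=0):
    """Estimate the mean MR score from n randomly sampled source sentences."""
    rng = random.Random(seed)
    scores = [mr_score(translate, rng.choice(corpus), via, target)
              for _ in range(n)]
    mean = statistics.fmean(scores)
    half = 1.96 * statistics.stdev(scores) / n ** 0.5  # normal approximation
    return mean, (mean - half, mean + half)

# Toy "service" for demonstration only: it drops the last word whenever
# it translates from a non-English source.
def lossy_translate(text, src, dst):
    return text if src == "en" else " ".join(text.split()[:-1])

corpus = ["the cat sat down", "a dog ran home", "we read old books"]
mean, ci = monte_carlo(lossy_translate, corpus, via="fr", target="es", n=50)
print(mean, ci)  # 0.75 (0.75, 0.75)
```

Comparing services would then amount to running the same estimate for each service and language pair and analyzing the resulting scores, for example with a factorial design.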
We report on a novel use of metamorphic relations (MRs) in machine learning: instead of conducting metamorphic testing, we use MRs to augment the machine learning algorithms themselves. In particular, we report on how MRs can enhance the classification of images containing hidden visual markers ("Artcodes").
Starting from an original classifier and using the characteristics of two different categories of images, we used two MRs, based on separation and occlusion, to improve the performance of the classifier. Our experimental results show that the MR-augmented classifier achieves better performance than the original classifier. This suggests a new role for MRs in improving machine learning algorithms, extending the use of MRs beyond the context of software testing.