RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering


SESSION: Natural language and text data

Integrating a dialog component into a framework for spoken language understanding

Spoken language interfaces are the latest trend in human-computer interaction. Users enjoy the newfound freedom, but developers face an unfamiliar and daunting task: creating reactive spoken language interfaces requires skills in natural language processing. We show how a developer can integrate a dialog component into a natural language processing system by means of software engineering methods. Our research project PARSE, which aims at naturalistic end-user programming in spoken natural language, serves as an example. We integrate a dialog component with PARSE without affecting its other components: we modularize the dialog management and introduce dialog acts that bundle a trigger for the dialog with the reaction of the system. We implemented three dialog acts to address the following issues: speech recognition uncertainties, coreference ambiguities, and incomplete conditionals.

We conducted a user study with ten subjects to evaluate our approach. The dialog component achieved resolution rates from 23% to 50% (depending on the dialog act) and introduced only a negligible number of errors. We expect the overall performance to increase further with the implementation of additional dialog acts.
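As a hedged illustration of the dialog-act idea described above (a trigger for a dialog bundled with the system's reaction), the following Python sketch shows one possible shape. The class names, the confidence-based trigger, and the manager are assumptions for illustration and not PARSE's actual API.

```python
# Minimal sketch of a "dialog act": a trigger condition bundled with the
# reaction it causes. Names and the confidence-based trigger are illustrative.
from abc import ABC, abstractmethod


class DialogAct(ABC):
    """Bundles a trigger for a dialog with the system's reaction."""

    @abstractmethod
    def triggered_by(self, utterance) -> bool:
        ...

    @abstractmethod
    def react(self, utterance) -> str:
        ...


class LowConfidenceWordAct(DialogAct):
    """Asks the user to repeat words the speech recognizer was unsure about."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def triggered_by(self, utterance) -> bool:
        return any(conf < self.threshold for _, conf in utterance)

    def react(self, utterance) -> str:
        unclear = [word for word, conf in utterance if conf < self.threshold]
        return f"I did not understand {', '.join(unclear)}. Could you repeat that?"


class DialogManager:
    """Runs registered dialog acts without touching other components."""

    def __init__(self, acts):
        self.acts = acts

    def handle(self, utterance):
        return [act.react(utterance) for act in self.acts if act.triggered_by(utterance)]


# Usage: an utterance represented as (word, recognizer confidence) pairs.
manager = DialogManager([LowConfidenceWordAct()])
print(manager.handle([("open", 0.9), ("the", 0.95), ("vault", 0.3)]))
```

Other acts (e.g. for coreference ambiguities or incomplete conditionals) would plug into the same manager without changes to the rest of the pipeline.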

Exploring the benefits of utilizing conceptual information in test-to-code traceability

Striving for reliability of software systems often results in immense numbers of tests. Due to the lack of a generally used annotation, finding the parts of code these tests were meant to assess can be a demanding task. This is a well-known software engineering problem called test-to-code traceability. Recent research on the subject has attempted to cope with this problem by applying various approaches and their combinations, achieving notable results. These approaches have involved the use of naming conventions during development and have also utilized various information retrieval (IR) methods, often referred to as conceptual information. In this work we investigate the benefits of the textual information located in software code and its value for aiding traceability. We evaluated the capabilities of the natural language processing technique called Latent Semantic Indexing (LSI) against the results of the naming-conventions technique on five real, medium-sized software systems. Although LSI is already used for this purpose, we extend the viewpoint from a one-to-one traceability approach to the more versatile view of LSI as a recommendation system. We found that considering the top 5 elements in the ranked list improves the results by 30% on average and makes LSI a viable alternative in projects where naming conventions are not followed systematically.
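A minimal sketch of the LSI-as-recommendation-system view, assuming scikit-learn and a toy corpus; the code units, the number of latent dimensions, and the top-5 cut-off shown here are illustrative, not the study's setup.

```python
# Illustrative sketch: rank code units by latent-semantic similarity to a
# test and report the top candidates as traceability recommendations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

code_units = {                      # identifiers/comments extracted from code
    "OrderService": "create order submit payment total price",
    "UserRepository": "find user by id save delete account",
    "InvoiceFormatter": "format invoice pdf total tax line items",
}
test_text = "test that submitting a payment updates the order total"

corpus = list(code_units.values()) + [test_text]
tfidf = TfidfVectorizer().fit_transform(corpus)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Similarity of the test (last row) to every code unit, highest first.
similarities = cosine_similarity(lsi[-1:], lsi[:-1])[0]
ranking = sorted(zip(code_units, similarities), key=lambda x: -x[1])
print(ranking[:5])                  # top-5 recommended code units for the test
```

Treating the ranked list as a recommendation (rather than taking only the single best match) is what the abstract reports as the 30% average improvement.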

Complementing machine learning classifiers via dynamic symbolic execution: "human vs. bot generated" tweets

Recent machine learning approaches for classifying text as human-written or bot-generated rely on training sets that are large, labeled diligently, and representative of the underlying domain. While valuable, these machine learning approaches ignore programs as an additional source of such training sets. To address this problem of incomplete training sets, this paper proposes to systematically supplement existing training sets with samples inferred via program analysis. In our preliminary evaluation, training sets enriched with samples inferred via dynamic symbolic execution were able to improve machine learning classifier accuracy for simple string-generating programs.
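The sketch below only illustrates the enrichment step under strong simplifications: the paths of a toy string-generating program are enumerated by brute force (standing in for dynamic symbolic execution, which the paper actually uses), and the resulting strings are added as bot-labeled samples before training a classifier.

```python
# Hedged sketch: program-derived strings supplement a labeled training set.
# Real path exploration would use dynamic symbolic execution; the brute-force
# enumeration of branch combinations below only stands in for it.
from itertools import product
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def bot_template(greeting: bool, promo: bool) -> str:
    """A toy string-generating program with two branches."""
    parts = []
    if greeting:
        parts.append("hello follower")
    if promo:
        parts.append("click this link to win")
    return " ".join(parts) or "have a nice day"


# Enumerate the program's paths to obtain additional "bot" samples.
bot_samples = [bot_template(g, p) for g, p in product([True, False], repeat=2)]
human_samples = ["just saw a great movie with friends", "my flight got delayed again"]

texts = human_samples + bot_samples
labels = [0] * len(human_samples) + [1] * len(bot_samples)  # 1 = bot-generated

clf = MultinomialNB().fit(CountVectorizer().fit_transform(texts), labels)
```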

SESSION: Web data and taxonomy

CodeCatch: extracting source code snippets from online sources

Nowadays, developers rely on online sources to find example snippets that address the programming problems they are trying to solve. However, contemporary API usage mining methods are not suitable for locating easily reusable snippets, as they provide usage examples for specific APIs, thus requiring the developer to know beforehand which library to use. On the other hand, the approaches that retrieve snippets from online sources usually output a list of examples, without helping the developer distinguish among different implementations and without offering any insight into the quality and reusability of the proposed snippets. In this work, we present CodeCatch, a system that receives queries in natural language and extracts snippets from multiple online sources. The snippets are assessed both for their quality and for their usefulness/preference by developers, and they are also clustered according to their API calls to allow the developer to select among the different implementations. Preliminary evaluation of CodeCatch on a set of indicative programming problems indicates that it can be a useful tool for the developer.
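One way to picture the clustering step, assuming SciPy and hand-written API-call sets; this is an illustrative sketch, not CodeCatch's implementation or its quality/preference scoring.

```python
# Sketch: group candidate snippets by the API calls they make, so that
# different implementations end up in different clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

snippets = {                        # snippet id -> set of API calls it uses
    "snippet_1": {"java.io.BufferedReader", "java.io.FileReader"},
    "snippet_2": {"java.nio.file.Files.readAllLines"},
    "snippet_3": {"java.io.BufferedReader", "java.io.InputStreamReader"},
}


def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)


names = list(snippets)
dist = np.array([[jaccard_distance(snippets[x], snippets[y]) for y in names] for x in names])

# Average-linkage hierarchical clustering over pairwise distances,
# cut into two groups of implementations.
labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(dict(zip(names, labels)))
```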

Semi-automatic generation of active ontologies from web forms for intelligent assistants

Intelligent assistants are becoming widespread. A popular method for creating intelligent assistants is modeling the domain (and thus the assistant's capabilities) as an Active Ontology. Adding new functionality requires extending the ontology or building new ones; as of today, this process is manual.

We describe an automated method for creating Active Ontologies for arbitrary web forms. Our approach leverages methods from natural language processing and data mining to synthesize the ontologies. Furthermore, our tool generates the code needed to process user input.

We evaluate the generated Active Ontologies in three case studies using web forms from the UIUC Web Integration Repository, namely from the domains airfare, automobile, and book search. First, we examine how much of the generation process can be automated and how well the approach identifies domain concepts and their relations. Second, we test how well the generated Active Ontologies handle end-user input to perform the desired actions. In our evaluation, Easier automatically generates 65% of the Active Ontologies' sensor nodes; the generated ontology for airfare search correctly answers 70% of the queries.
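A hedged sketch of the extraction idea, using only the Python standard library: form fields are pulled from HTML and treated as candidate sensor nodes. The field names, the dictionary-based "ontology", and the airfare example are assumptions for illustration; the real Active Ontology structure and the NLP/data-mining steps are not shown.

```python
# Sketch: derive candidate sensor nodes from a web form's input fields.
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select"):
            attrs = dict(attrs)
            name = attrs.get("name")
            if name and attrs.get("type") != "submit":
                self.fields.append(name)


form_html = """
<form action="/search">
  <input type="text" name="departure_city">
  <input type="text" name="arrival_city">
  <input type="date" name="departure_date">
  <input type="submit" value="Search">
</form>
"""

parser = FormFieldExtractor()
parser.feed(form_html)

# Each extracted form field becomes a candidate sensor node under a task node.
ontology = {"task": "airfare_search", "sensor_nodes": parser.fields}
print(ontology)
```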

Ways of applying artificial intelligence in software engineering

As Artificial Intelligence (AI) techniques become more powerful and easier to use, they are increasingly deployed as key components of modern software systems. While this enables new functionality and often allows better adaptation to user needs, it also creates additional problems for software engineers and exposes companies to new risks. Some work has been done to better understand the interaction between Software Engineering and AI, but we lack methods to classify ways of applying AI in software systems and to analyse and understand the risks this poses. Only by doing so can we devise tools and solutions to help mitigate them. This paper presents the AI in SE Application Levels (AI-SEAL) taxonomy, which categorises applications according to their point of application, the type of AI technology used, and the automation level allowed. We show the usefulness of this taxonomy by classifying 15 papers from previous editions of the RAISE workshop. Results show that the taxonomy allows classification of distinct AI applications and provides insights concerning the risks associated with them. We argue that this will be important for companies in deciding how to apply AI in their software applications and in creating strategies for its use.
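To make the three dimensions concrete, the following toy record classifies a single application along point of application, AI technology, and automation level; the enumerated values and the 1-10 scale are assumptions for illustration, not the levels defined by AI-SEAL.

```python
# Toy record for classifying an AI application along three dimensions.
# The dimension values shown are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum


class PointOfApplication(Enum):      # where AI is applied (assumed values)
    PROCESS = "software process"
    PRODUCT = "software product"
    RUNTIME = "deployed system at runtime"


@dataclass
class AISealClassification:
    point_of_application: PointOfApplication
    ai_technology: str               # e.g. "supervised learning" (free text here)
    automation_level: int            # degree of automation allowed, assumed 1-10


paper = AISealClassification(PointOfApplication.PRODUCT, "deep learning", 7)
print(paper)
```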

SESSION: Defect prediction

A replication study: just-in-time defect prediction with ensemble learning

Just-in-time defect prediction, also known as change-level defect prediction, can be used to efficiently allocate resources and manage project schedules in the software testing and debugging process. Just-in-time defect prediction can reduce the amount of code to review and simplify the assignment of developers to bug fixes. This paper reports a replicated experiment and an extension comparing the prediction of defect-prone changes using traditional machine learning techniques and ensemble learning. Using datasets from six open source projects, namely Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, we replicate the original approach to verify the results of the original experiment and use them as a basis of comparison for alternatives to the approach. Our results from the replicated experiment are consistent with the original. The original approach uses a combination of data preprocessing and a two-layer ensemble of decision trees. The first layer uses bagging to form multiple random forests. The second layer stacks the forests together with equal weights. Generalizing the approach to allow an arbitrary set of classifiers in the ensemble, optimizing the weights of the classifiers, and allowing additional layers, we apply a new deep ensemble approach, called deep super learner, to test the depth of the original study. The deep super learner achieves statistically significantly better results than the original approach on five of the six projects in predicting defects as measured by F1 score.
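The two-layer ensemble described above can be sketched as follows with scikit-learn; the synthetic data, the number of forests, and the 0.5 threshold are assumptions for illustration, not the replication's setup (and the deep super learner's weight optimization and extra layers are omitted).

```python
# Sketch of the two-layer ensemble: bagged random forests whose probability
# outputs are combined with equal weights. X and y stand in for a project's
# change-level metrics and defect labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Layer 1: bagging over the training data to build several random forests.
rng = np.random.default_rng(0)
forests = []
for seed in range(5):
    idx = rng.integers(0, len(X_train), len(X_train))      # bootstrap sample
    forests.append(RandomForestClassifier(random_state=seed).fit(X_train[idx], y_train[idx]))

# Layer 2: stack the forests with equal weights (average their probabilities).
probs = np.mean([f.predict_proba(X_test)[:, 1] for f in forests], axis=0)
print("F1:", f1_score(y_test, (probs >= 0.5).astype(int)))
```

The deep super learner generalizes this structure by admitting arbitrary base classifiers, learning the combination weights, and stacking additional layers.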

Evaluating the adaptive selection of classifiers for cross-project bug prediction

Bug prediction models are used to locate source code elements that are more likely to be defective. One of the key factors influencing their performance is the selection of the machine learning method (a.k.a. classifier) used to discriminate buggy from non-buggy classes. Given the high complementarity of stand-alone classifiers, a recent trend is the definition of ensemble techniques, which try to effectively combine the predictions of different stand-alone machine learners. In a recent work we proposed ASCI, a technique that dynamically selects the right classifier to use based on the characteristics of the class on which the prediction has to be made. We tested it in a within-project scenario, showing its higher accuracy with respect to the Validation and Voting strategy. In this paper, we continue this line of research by (i) evaluating ASCI in a global and local cross-project setting and (ii) comparing its performance with that achieved by a stand-alone baseline and an ensemble baseline, namely Naive Bayes and Validation and Voting, respectively. A key finding of our study is that ASCI performs better than the other techniques in the context of cross-project bug prediction. Moreover, although local learning does not improve the performance of the corresponding models in most cases, it does improve the robustness of the models relying on ASCI.
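In the spirit of ASCI (though not the authors' implementation), a per-instance classifier-selection scheme can be sketched as below: a decision tree learns which base classifier to trust for each instance based on its features; the base learners and the fallback rule are assumptions for illustration.

```python
# Sketch of per-instance classifier selection: a meta-model routes each
# class to the base classifier expected to predict it correctly.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

bases = [
    GaussianNB().fit(X_train, y_train),
    LogisticRegression(max_iter=1000).fit(X_train, y_train),
]

# Meta-target: index of a base classifier that gets each training instance
# right (falling back to classifier 0 when none does).
meta_y = np.zeros(len(X_train), dtype=int)
for i, (x, label) in enumerate(zip(X_train, y_train)):
    for j, clf in enumerate(bases):
        if clf.predict(x.reshape(1, -1))[0] == label:
            meta_y[i] = j
            break

selector = DecisionTreeClassifier(random_state=1).fit(X_train, meta_y)

# Prediction: route each instance to the classifier chosen by the meta-model.
chosen = selector.predict(X_test)
preds = np.array([bases[c].predict(x.reshape(1, -1))[0] for c, x in zip(chosen, X_test)])
print("accuracy:", (preds == y_test).mean())
```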