ICSE-NIER '18: Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results


SESSION: Security, safety, and quality

Generative secure design, defined

In software-intensive industries, companies face the constant challenge of not having enough security experts on staff to validate the designs of the high-complexity projects they run. Many of these companies are now realizing that increasing automation in their secure development process is the only way to cope with the ultra-large scale of modern systems. This paper embraces that viewpoint. We chart the roadmap to the development of a generative design tool that iteratively produces several design alternatives, each attempting to satisfy the security goals by incorporating security mechanisms. The tool explores the space of possible solutions by starting from well-known security techniques and creating variations via mutations and crossovers. By incorporating user feedback, the tool generates increasingly better design alternatives.
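As an illustration of the search strategy described above, the following Python sketch shows the kind of evolutionary loop such a tool could run: candidate designs are lists of security mechanisms, variations arise from mutation and crossover, and a placeholder fitness function stands in for user feedback. The mechanism catalogue, function names, and fitness heuristic are illustrative assumptions, not the authors' implementation.

    # Hypothetical evolutionary search over design alternatives.
    import random

    KNOWN_MECHANISMS = ["authentication", "authorization", "encryption",
                        "audit_logging", "input_validation"]

    def user_feedback_score(design):
        # Placeholder fitness: the envisioned tool would combine coverage of
        # the security goals with feedback elicited from the user.
        return len(set(design)) + random.random()

    def mutate(design):
        # Add or remove one security mechanism at random.
        design = list(design)
        if design and random.random() < 0.5:
            design.remove(random.choice(design))
        else:
            design.append(random.choice(KNOWN_MECHANISMS))
        return design

    def crossover(a, b):
        # Combine mechanisms from two parent designs.
        cut = len(a) // 2
        return a[:cut] + [m for m in b if m not in a[:cut]]

    def evolve(generations=20, population_size=8):
        population = [[random.choice(KNOWN_MECHANISMS)] for _ in range(population_size)]
        for _ in range(generations):
            population.sort(key=user_feedback_score, reverse=True)
            parents = population[:population_size // 2]
            children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                        for _ in range(population_size - len(parents))]
            population = parents + children
        return max(population, key=user_feedback_score)

    print(evolve())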

Towards secure dynamic product lines in the cloud

Cloud-based technologies play an increasing role in software engineering because of their scalability, availability, and cost efficiency. However, due to privacy issues, developers and organizations still hesitate to host applications that handle sensitive data on servers of external cloud providers. Modern hardware extensions, such as Intel's Software Guard Extensions (SGX), are an attempt to provide confidentiality and integrity for applications running on external hardware. Still, enabling SGX in cloud systems poses new challenges considering scalability and flexibility. In this paper, we propose an approach to address these issues by employing concepts from the domain of Dynamic Software Product Lines (DSPLs). We aim to enable applications running on SGX-based cloud systems to be securely reconfigurable and extendable during runtime. In particular, we describe properties that such an approach should fulfill and discuss corresponding challenges.

Towards forensic-ready software systems

As software becomes more ubiquitous, and the risk of cyber-crimes increases, ensuring that software systems are forensic-ready (i.e., capable of supporting potential digital investigations) is critical. However, little or no attention has been given to how well-suited existing software engineering methodologies and practices are for the systematic development of such systems. In this paper, we consider the meaning of forensic readiness of software, define forensic readiness requirements, and highlight some of the open software engineering challenges in the face of forensic readiness. We use a real software system developed to investigate online sharing of child abuse media to illustrate the presented concepts.

Measure confidence of assurance cases in safety-critical domains

Evaluation of assurance cases typically requires certifiers' domain knowledge and experience, and, as such, most software certification has been conducted manually. Given the advancements in uncertainty theories and software traceability, we envision that these technologies can be synergistically combined and leveraged to offer some degree of automation and improve certifiers' capability to perform software certification. To this end, we present a novel confidence calculation framework that 1) applies Dempster-Shafer theory as a mathematical model to calculate the confidence between a parent claim and its child claims; and 2) uses the vector space model to evaluate the confidence in evidence items using traceability information. A fragment of an assurance case (expressed in the Goal Structuring Notation, GSN) for the coupled tank system is used to illustrate our new framework.
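To make the first ingredient concrete, the Python sketch below applies Dempster's rule of combination to two pieces of evidence about a claim, over the frame {support, refute} plus the universal set for residual uncertainty. The mass values are made up for illustration; the paper's framework defines its own mass assignments and claim structure.

    # Dempster's rule of combination for two mass functions over
    # {'sup', 'ref', 'theta'}, where 'theta' is the universal set.
    def combine(m1, m2):
        focal = ["sup", "ref", "theta"]
        combined = {f: 0.0 for f in focal}
        conflict = 0.0
        for a in focal:
            for b in focal:
                mass = m1[a] * m2[b]
                if a == b:
                    combined[a] += mass
                elif "theta" in (a, b):
                    combined[a if b == "theta" else b] += mass
                else:
                    conflict += mass  # empty intersection of 'sup' and 'ref'
        k = 1.0 - conflict
        return {f: round(v / k, 3) for f, v in combined.items()}

    # Two child claims provide evidence about their parent claim:
    m_child1 = {"sup": 0.7, "ref": 0.1, "theta": 0.2}
    m_child2 = {"sup": 0.6, "ref": 0.2, "theta": 0.2}
    print(combine(m_child1, m_child2))  # {'sup': 0.85, 'ref': 0.1, 'theta': 0.05}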

A critical review of: "a practical guide to select quality indicators for assessing pareto-based search algorithms in search-based software engineering": essay on quality indicator selection for SBSE

This paper presents a critical review of the work published at ICSE'2016 on a practical guide to quality indicator selection for assessing multiobjective solution sets in search-based software engineering (SBSE). This review has two goals. First, we aim to explain why we disagree with the ICSE'2016 work and why the reasons behind this disagreement are important to the SBSE community. Second, we aim to provide a clearer guide to quality indicator selection, serving as a new direction on this particular topic for the SBSE community. In particular, we argue that it does matter which quality indicator is selected, whether within the same quality category or across different categories. This claim is based upon the fundamental goal of multiobjective optimisation: supplying the decision-maker with a set of solutions that are the most consistent with their preferences.

Enabling real-time feedback in software engineering

Modern software projects consist of more than just code: teams follow development processes, the code runs on servers or mobile phones and produces runtime logs, and users talk about the software in forums such as StackOverflow and Twitter and rate it on app stores. Insights stemming from the real-time analysis of combined software engineering data can help software practitioners make decisions faster. With the development of CodeFeedr, a real-time software analytics platform, we aim to make software analytics a core feedback loop for software engineering projects. CodeFeedr's vision entails: (1) the ability to unify archival and current software analytics data under a single query language, and (2) the ability to apply new techniques and methods for high-level aggregation and summarization of near real-time information on software development. In this paper, we outline three use cases where our platform is expected to have a significant impact on the quality and speed of decision making: dependency management, productivity analytics, and run-time error feedback.

SESSION: Programming and code analysis

Combining spreadsheet smells for improved fault prediction

Spreadsheets are commonly used in organizations as a programming tool for business-related calculations and decision making. Since faults in spreadsheets can have severe business impacts, a number of approaches from general software engineering have been applied to spreadsheets in recent years, among them the concept of code smells. Smells can, in particular, be used for fault prediction. An analysis of existing spreadsheet smells, however, revealed that the predictive power of individual smells can be limited. In this work we therefore propose a machine-learning-based approach which combines the predictions of individual smells by using an AdaBoost ensemble classifier. Experiments on two public datasets containing real-world spreadsheet faults show significant improvements in terms of fault prediction accuracy.
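A minimal Python sketch of the proposed combination step, assuming per-cell smell scores are already available as features; the synthetic data and feature names are illustrative, not the paper's datasets or smell catalogue.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Each row holds smell scores for one formula cell (hypothetical smells,
    # e.g. long calculation chain, multiple references, duplicated formula);
    # the label marks whether the cell is faulty.
    X = rng.random((500, 3))
    y = (X.sum(axis=1) + rng.normal(0, 0.3, 500) > 1.8).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
    print("fault prediction accuracy:", clf.score(X_test, y_test))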

Images of code: lossy compression for native instructions

Developers can use lossy compression on images and many other artifacts to reduce size and improve network transfer times. Native program instructions, however, are typically not considered candidates for lossy compression, since arbitrary losses in instructions may dramatically affect program output. In this paper we show that lossy compression of compiled native instructions is possible in certain circumstances. We demonstrate that the instruction sequence of a program can be lossily translated into a separate but equivalent program with instruction-wise differences, which still produces the same output. We contribute the novel insight that it is possible to exploit such instruction differences to design lossy compression schemes for native code. We support this idea with sound and unsound program transformations that improve the performance of compression techniques such as Run-Length Encoding (RLE), Huffman, and LZ77. We also show that large areas of code can endure tampered instructions with no impact on the output, a result consistent with previous work from various communities.
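One way to see why such transformations help: run-length encoding (RLE) benefits when equivalent instruction forms are rewritten into identical bytes. The Python sketch below uses x86 NOP padding as a small, sound example (a two-byte NOP, 66 90, rewritten as two single-byte NOPs, 90 90); the bytes are illustrative and not taken from the paper.

    from itertools import groupby

    def rle_encode(data):
        # Return (byte value, run length) pairs for consecutive equal bytes.
        return [(value, len(list(run))) for value, run in groupby(data)]

    original  = bytes([0x90, 0x90, 0x66, 0x90, 0x90, 0x90])  # nop, nop, 2-byte nop, nop, nop
    rewritten = bytes([0x90] * 6)                            # all single-byte nops

    print(rle_encode(original))   # [(144, 2), (102, 1), (144, 3)] -- three runs
    print(rle_encode(rewritten))  # [(144, 6)]                     -- one run, compresses better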

Hierarchical learning of cross-language mappings through distributed vector representations for code

Translating a program written in one programming language to another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, they may be either specific to the language grammars, or specific to certain kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is two-fold: first, we normalize and enrich code token streams with additional structural and semantic information, and train cross-language vector representations (a.k.a. shared embeddings) for the tokens based on word2vec, a neural-network-based technique for producing word embeddings; second, hierarchically from the bottom up, we construct shared embeddings for code elements at higher levels of granularity (e.g., expressions, statements, methods) from the embeddings of their constituents, and then build mappings among code elements across languages based on similarities among embeddings.

Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores. When compared with an existing tool for mapping library API methods, our approach identifies many more mappings accurately. The mapping results and code can be accessed at https://github.com/bdqnghi/hierarchical-programming-language-mapping. We believe that our idea for learning cross-language vector representations with code structural information can be a useful step towards automated program translation.
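A small Python sketch of the bottom-up composition and mapping step: token embeddings (assumed to be pre-trained on normalized token streams, e.g. with word2vec) are pooled into statement embeddings, and cross-language mappings are chosen by cosine similarity. The tiny vectors below are fabricated for illustration and do not come from the trained models.

    import numpy as np

    token_emb = {  # hypothetical shared embeddings for Java/C# tokens
        "System.out.println": np.array([0.90, 0.10, 0.00]),
        "Console.WriteLine":  np.array([0.85, 0.15, 0.05]),
        "for":                np.array([0.10, 0.90, 0.20]),
        "foreach":            np.array([0.12, 0.88, 0.25]),
    }

    def embed(tokens):
        # Compose a higher-level code element from its constituents (mean pooling).
        return np.mean([token_emb[t] for t in tokens], axis=0)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    java_stmt = embed(["for", "System.out.println"])
    cs_stmt   = embed(["foreach", "Console.WriteLine"])
    print("cross-language similarity:", round(cosine(java_stmt, cs_stmt), 3))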

Which library should I use?: a metric-based comparison of software libraries

Software libraries ease development tasks by allowing client developers to reuse code written by third parties. To perform a specific task, there is usually a large number of libraries that offer the desired functionality. Unfortunately, selecting the appropriate library to use is not straightforward, since developers are often unaware of the advantages and disadvantages of each library, and may also care about different characteristics in different situations. In this paper, we introduce the idea of using software metrics to help developers choose the libraries most suited to their needs. We propose creating library comparisons based on several metrics extracted from multiple sources such as software repositories, issue tracking systems, and Q&A websites. By consolidating all of this information in a single website, we enable developers to make informed decisions by comparing metric data belonging to libraries from several domains. Additionally, we will use this website to survey developers about which metrics are the most valuable to them, helping us answer the broader question of what determines library quality. In this short paper, we describe the metrics we propose in our work and present preliminary results, as well as the challenges we faced.
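As a sketch of the kind of comparison such a website could present, the Python snippet below normalizes a few per-library metrics and combines them into a single score. The metric names, values, and equal weighting are illustrative assumptions only.

    libraries = {
        "lib-a": {"release_frequency": 12, "open_issue_ratio": 0.30, "answered_questions": 0.75},
        "lib-b": {"release_frequency": 4,  "open_issue_ratio": 0.10, "answered_questions": 0.90},
    }
    higher_is_better = {"release_frequency": True, "open_issue_ratio": False,
                        "answered_questions": True}

    def normalise(metric, value):
        # Min-max normalisation across libraries, flipped for "lower is better" metrics.
        values = [m[metric] for m in libraries.values()]
        lo, hi = min(values), max(values)
        score = (value - lo) / (hi - lo) if hi > lo else 1.0
        return score if higher_is_better[metric] else 1.0 - score

    for name, metrics in libraries.items():
        score = sum(normalise(m, v) for m, v in metrics.items()) / len(metrics)
        print(name, round(score, 2))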

Unicomp: a semantics-aware model compiler for optimised predictable software

In Model-Driven Engineering, executables are generated from domain-specific modelling languages (DSMLs) through two steps: generation of program code in a third-generation programming language (3GL, such as C++ or Java) from a model, and compilation of the generated code to object code. 3GL code generation raises three issues. (1) Code generators are DSML- and 3GL-specific, hence they cannot be used for DSMLs or 3GLs other than those they were designed for. (2) Existing code generators do not exploit model semantics; hence, 3GL programs do not always semantically reflect their models. (3) Existing 3GL compilers are unable to exploit model semantics; hence, they cannot perform model-specific optimisations. Issues (2) and (3) seriously threaten the predictability of the generated executables.

We advocate the need for, and provide a solution proposal towards, an innovative model compilation framework based on model semantics that produces executables without translation to 3GLs. Model compilation will be based on a common semantics, the Semantics of a Foundational Subset for Executable UML Models (fUML), and will semantically underpin any DSML whose execution semantics can be specified with fUML.

Self-adaptive static analysis

Static code analysis is a powerful approach to detect quality deficiencies such as performance bottlenecks, safety violations, or security vulnerabilities during a software system's implementation. Yet, as software systems continue to grow, current static-analysis systems more frequently face the problem of insufficient scalability. We argue that this is mainly due to the fact that current static analyses are implemented fully manually, often in general-purpose programming languages such as Java or C, or in declarative languages such as Datalog. This design choice predefines the way in which the static analysis evaluates, and limits the optimizations and extensions that static-analysis designers can apply.

To boost scalability to a new level, we propose to fuse static analysis with just-in-time optimization technology, introducing for the first time static analyses that are managed and inherently self-adaptive. These analyses automatically adapt themselves to yield a performance/precision tradeoff that is optimal with respect to both the analyzed software system and the analysis itself.

Self-adaptivity is enabled by the novel idea of designing a dedicated intermediate representation, not for the analyzed program but for the analysis itself. This representation allows for an automatic optimization and adaptation of the analysis code, both ahead-of-time (through static analysis of the static analysis) as well as just-in-time during the analysis' execution, similar to just-in-time compilers.

SESSION: Mining, verifying, and learning

Mining container image repositories for software configuration and beyond

This paper introduces the idea of mining container image repositories for configuration and other deployment information of software systems. Unlike traditional software repositories (e.g., source code repositories and app stores), image repositories encapsulate the entire execution ecosystem for running target software, including its configurations, dependent libraries and components, and OS-level utilities, which contributes to a wealth of data and information. We showcase the opportunities based on concrete software engineering tasks that can benefit from mining image repositories. To facilitate future mining efforts, we summarize the challenges of analyzing image repositories and the approaches that can address these challenges. We hope that this paper will stimulate an exciting research agenda around mining this emerging type of software repository.
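As one example of a mining step, the Python sketch below pulls configuration-related directives (ENV, EXPOSE) out of a Dockerfile, the recipe behind many images in repositories such as Docker Hub; real mining would also inspect image layers and manifests. The Dockerfile content is fabricated for illustration.

    import re

    dockerfile = "\n".join([
        "FROM nginx:1.13",
        "ENV NGINX_WORKER_PROCESSES 4",
        "ENV LOG_LEVEL info",
        "EXPOSE 80 443",
    ])

    # Extract configuration settings and exposed ports from the build recipe.
    env_settings = dict(re.findall(r"^ENV\s+(\S+)\s+(.+)$", dockerfile, re.M))
    exposed_ports = re.findall(r"^EXPOSE\s+(.+)$", dockerfile, re.M)
    print("configuration:", env_settings)
    print("exposed ports:", exposed_ports)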

Explainable software analytics

Software analytics has been the subject of considerable recent attention but is yet to receive significant industry traction. One of the key reasons is that software practitioners are reluctant to trust predictions produced by the analytics machinery without understanding the rationale for those predictions. While complex models such as deep learning and ensemble methods improve predictive performance, they have limited explainability. In this paper, we argue that making software analytics models explainable to software practitioners is as important as achieving accurate predictions. Explainability should therefore be a key measure for evaluating software analytics models. We envision that explainability will be a key driver for developing software analytics models that are useful in practice. We outline a research roadmap for this space, building on social science, explainable artificial intelligence and software engineering.

Generalizing specific-instance interpolation proofs with SyGuS

Proving correctness of programs is a challenging task, and consequently it has been the focus of a lot of research. One way to break this problem down is to look at one execution path of the program, argue for its correctness, and see if the argument extends to the entire program. However, that is often not the case, i.e., the proof for a given instance can be overly specific. In this paper, we propose a technique to generalize from such specific-instance proofs to derive a correctness argument for the entire program. The individual proofs are obtained from an off-the-shelf interpolating prover, and we use Syntax-Guided Synthesis (SyGuS) to generalize the facts that constitute those proofs. Our initial experiment with a prototype tool shows that there is a lot of scope to guide the generalization engine to converge to a proof very quickly.

Efficient parametric model checking using domain-specific modelling patterns

We propose a parametric model checking (PMC) method that enables the efficient analysis of quality-of-service (QoS) properties of component-based systems. Our method builds on recent advances in PMC techniques and tools, and can handle large models by exploiting domain-specific modelling patterns for the software components. We precompute closed-form expressions for key QoS properties of such patterns, and handle system-level PMC by combining these expressions into easy-to-evaluate systems of equations.
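A minimal Python sketch of the idea, using sympy: closed-form expressions precomputed for two simple modelling patterns are combined into a system-level expression that can then be evaluated cheaply for any parameter values. The patterns and probabilities are illustrative assumptions, not the paper's domain-specific patterns.

    import sympy as sp

    p1, p2 = sp.symbols("p1 p2", positive=True)

    # Precomputed closed-form expression for the probability of success of a
    # hypothetical "retry once on failure" pattern:
    retry_once = p1 + (1 - p1) * p1
    # System-level property: the retried component followed by a second component.
    system_success = retry_once * p2

    print(sp.expand(system_success))                 # symbolic closed form
    print(system_success.subs({p1: 0.9, p2: 0.95}))  # cheap numeric evaluation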

Deep learning UI design patterns of mobile apps

User interface (UI) is one of the most important components of a mobile app and strongly influences users' perception of the app. However, UI design tasks are typically manual and time-consuming. This paper proposes a novel approach to (semi)-automate those tasks. Our key idea is to develop and deploy advanced deep learning models based on recurrent neural networks (RNN) and generative adversarial networks (GAN) to learn UI design patterns from millions of currently available mobile apps. Once trained, those models can be used to search for UI design samples given user-provided descriptions written in natural language and generate professional-looking UI designs from simpler, less elegant design drafts.

Code review comments: language matters

Recent research provides evidence that effective communication in collaborative software development has a significant impact on the software development lifecycle. Although related qualitative and quantitative studies point out textual characteristics of well-formed messages, the underlying semantics of the intertwined linguistic structures still remain largely misinterpreted or ignored. In particular, regarding the quality of code reviews, the importance of thorough feedback and explicit rationale is often mentioned but rarely linked with related linguistic features. As a first step towards addressing this shortcoming, we propose grounding these studies on theories of linguistics. We particularly focus on linguistic structures of coherent speech and explain how they can be exploited in practice. We reflect on related approaches and examine, through a preliminary study on four open source projects, possible links between existing findings and the directions we suggest for detecting textual features of useful code reviews.

SESSION: Empirical studies and requirements

Replication studies considered harmful

Context: There is growing interest in establishing software engineering as an evidence-based discipline. To that end, replication is often used to gain confidence in empirical findings, as opposed to reproduction, where the goal is to show the correctness or validity of the published results.

Objective: To consider what is required for a replication study to confirm the original experiment and apply this understanding in software engineering.

Method: Simulation is used to demonstrate why the prediction interval for confirmation can be surprisingly wide. This analysis is applied to three recent replications.

Results: It is shown that because the prediction intervals are wide, almost all replications are confirmatory, so in that sense there is no 'replication crisis'; however, the contributions to knowledge are negligible.

Conclusion: Replicating empirical software engineering experiments, particularly if they are under-powered or under-reported, is a waste of scientific resources. By contrast, meta-analysis is strongly advocated so that all relevant experiments are combined to estimate the population effect.
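A small Python simulation in the spirit of the Method above: drawing many replications of an under-powered two-group experiment from the same population shows how widely the observed effect sizes, and hence the prediction interval for a "confirming" replication, can spread. The sample size and true effect are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    true_effect, n = 0.3, 20  # small, under-powered experiments

    def cohens_d(a, b):
        # Standardised mean difference between two samples.
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        return (a.mean() - b.mean()) / pooled_sd

    effects = []
    for _ in range(5000):
        treatment = rng.normal(true_effect, 1, n)
        control = rng.normal(0, 1, n)
        effects.append(cohens_d(treatment, control))

    lo, hi = np.percentile(effects, [2.5, 97.5])
    print(f"95% of replication effect sizes fall in [{lo:.2f}, {hi:.2f}]")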

From craft to science: the road ahead for empirical software engineering research

Empirical software engineering (SE) research is often criticized for poorly designed and reported studies, a lack of replications to build up bodies of knowledge, and little practical relevance. In this paper, we discuss issues in empirical software architecture research as an illustration of these issues in one subfield of SE and as a step towards better understanding empirical research in SE in general. Based on feedback from software architecture researchers and practitioners, we explore why, despite persistent discussions in the SE research community, there are still disagreements about why and how to conduct empirical research. Then, we explore how empirical SE research can progress beyond "one-off" studies and endless "new and exciting" results toward SE research as a mature science. This would allow us to establish foundations for evaluating existing and future empirical research and help researchers design and publish better studies.

Towards saving money in using smart contracts

As a new kind of software that leverages blockchain to execute real contracts, smart contracts are in great demand due to their many advantages. Ethereum is the largest blockchain platform that supports smart contracts by running them in its virtual machine. To ensure that a smart contract will eventually terminate and to prevent abuse of resources, Ethereum charges developers for deploying smart contracts and users for executing them. Although our previous work shows that under-optimized smart contracts may cost more money than necessary, it only lists 7 anti-patterns and detection methods for 3 of them. In this paper, we conduct the first in-depth investigation of such under-optimized smart contracts. We first identify 24 anti-patterns from the execution traces of real smart contracts. Then, we design and develop GasReducer, the first tool to automatically detect all these anti-patterns in the bytecode of smart contracts and replace them with efficient code through bytecode-to-bytecode optimization. Using GasReducer to analyze all smart contracts and their execution traces, we detect 9,490,768 and 557,565,754 anti-pattern instances in deploying and invoking smart contracts, respectively.
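A minimal Python sketch of bytecode-level anti-pattern scanning in this spirit; the single pattern checked (a PUSH1 whose value is immediately discarded by POP, wasting gas) is purely illustrative and is not claimed to be one of the paper's 24 anti-patterns.

    PUSH1, POP = 0x60, 0x50

    def find_push_pop(bytecode):
        # Offsets where a PUSH1 <imm> is immediately followed by POP.
        hits, i = [], 0
        while i < len(bytecode):
            op = bytecode[i]
            if op == PUSH1 and i + 2 < len(bytecode) and bytecode[i + 2] == POP:
                hits.append(i)
            # PUSH1..PUSH32 (0x60..0x7f) carry 1..32 immediate bytes.
            i += 1 + (op - 0x5f if 0x60 <= op <= 0x7f else 0)
        return hits

    code = bytes([0x60, 0x2a, 0x50, 0x60, 0x01, 0x60, 0x02, 0x01])  # PUSH1 42, POP, PUSH1 1, PUSH1 2, ADD
    print("anti-pattern instances at offsets:", find_push_pop(code))  # [0]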

Understanding the impact of pair programming on the minds of developers

Software is mostly, if not entirely, a knowledge artifact. Software best practices are often thought to work because they induce more productive behaviour in software developers. In this paper we deployed a new-generation tool, portable multichannel EEG, to obtain direct physical insight into the mental processes of working software developers engaged in their standard activities. We have demonstrated the feasibility of this approach and obtained a glimpse of its potential power to distinguish the physical brain activity of developers working with different methodologies.

Retrospective based on data-driven persona significance in B-to-B software development

A Business-to-Business (B-to-B) software development company develops services to satisfy its customers' requirements. Developers should prioritize customer satisfaction because customers greatly influence agile software development. However, a B-to-B software development company may face the following issues: 1) failure to understand actual users, because the requirements are often not derived from actual users, and 2) failure to satisfy future customers' requirements when only satisfying current customers. Although many previous works proposed methods to elicit requirements based on actual quantitative data, these works did not consider customers and end-users simultaneously. Herein we propose Retrospective based on Data-Driven Persona Significance (ReD2PS) to help developers plan future releases. ReD2PS includes the Persona Significance Index (PerSil) to reflect the correspondence between target users, whom developers assume based on the requirements in releases, and end-users' personas. A case study involving a Japanese cloud application shows that PerSil reflects the relationship between target users and end-users, and we use it to discuss the validity and effectiveness of ReD2PS.

Dazed: measuring the cognitive load of solving technical interview problems at the whiteboard

Problem-solving on a whiteboard is a popular technical interview technique used in industry. However, several critics have raised concerns that whiteboard interviews can cause excessive stress and cognitive load on candidates, ultimately reinforcing bias in hiring practices. Unfortunately, many sensors used for measuring cognitive state are not robust to movement. In this paper, we describe an approach in which we use a head-mounted eye-tracker and computer vision algorithms to collect robust metrics of cognitive state. To demonstrate the feasibility of the approach, we study two proposed interview settings, on the whiteboard and on paper, with 11 participants. Our preliminary results suggest that the whiteboard setting pressures candidates into maintaining shorter attention spans and experiencing higher levels of cognitive load compared to solving the same problems on paper. For instance, we observed 60ms shorter fixation durations and 3x more regressions when solving problems on the whiteboard. Finally, we describe a vision for creating a more inclusive technical interview process through future studies of interventions that lower cognitive load and stress.
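A small Python sketch of how two of the reported gaze metrics could be computed from fixation data; the toy fixations and the simple "regression = leftward jump" rule are assumptions for illustration only.

    # Each fixation: (horizontal position, duration in ms).
    fixations = [(10, 240), (60, 210), (120, 180), (40, 150), (160, 200)]

    durations = [d for _, d in fixations]
    mean_fixation_ms = sum(durations) / len(durations)

    # A regression: the gaze jumping back (leftwards) relative to the previous fixation.
    regressions = sum(1 for (x_prev, _), (x_next, _) in zip(fixations, fixations[1:])
                      if x_next < x_prev)

    print(f"mean fixation duration: {mean_fixation_ms:.0f} ms, regressions: {regressions}")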

SESSION: Software engineering in other domains

Deep customization of multi-tenant SaaS using intrusive microservices

Enterprise software needs to be customizable, and the customization needs of a customer are often beyond what the software vendor can predict in advance. In the on-premises era, customers made deep customizations beyond the vendor's predictions by directly modifying the vendor's source code and then building and operating it on their own premises. As enterprise software moves to cloud-based multi-tenant SaaS (Software as a Service), it is no longer possible for customers to directly modify the vendor's source code, because the same instance of the code is shared by multiple customers at runtime. The question, therefore, is whether it is still possible to do deep customization of multi-tenant SaaS. In this paper, we answer this question with a novel architectural style that realizes deep customization of SaaS using intrusive microservices. We evaluate the approach on an open source online commercial system, and discuss further research questions for making deep customization applicable in practice.

Software ecosystem call graph for dependency management

A popular form of software reuse is the use of open source software libraries hosted on centralized code repositories, such as Maven or npm. Developers only need to declare dependencies on external libraries, and automated tools make them available to the project's workspace. Recent incidents, such as the Equifax data breach and the leftpad package removal, demonstrate the difficulty of assessing the severity, impact, and spread of bugs in dependency networks. While dependency checkers are being adopted as a countermeasure, they only provide indicative information. To remedy this situation, we propose a fine-grained dependency network that goes beyond packages and into call graphs. The result is a versioned, ecosystem-level call graph. In this paper, we outline the process to construct the proposed graph and present a preliminary evaluation that traces a security issue from a core package to an affected client application.
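A minimal Python sketch of the proposed structure, using networkx: nodes are versioned package functions, edges are resolved calls (including those crossing package boundaries), and an impact query finds every function that can reach a vulnerable one. All package, version, and function names are made up.

    import networkx as nx

    g = nx.DiGraph()
    # Nodes are (package, version, function); edges are resolved call relations.
    g.add_edge(("webapp", "2.1.0", "handle_login"), ("authlib", "1.4.2", "verify_token"))
    g.add_edge(("authlib", "1.4.2", "verify_token"), ("cryptolib", "0.9.0", "parse_cert"))
    g.add_edge(("reportgen", "3.0.1", "render"), ("cryptolib", "0.9.0", "parse_cert"))

    vulnerable = ("cryptolib", "0.9.0", "parse_cert")
    # Every function with a path to the vulnerable one is potentially affected.
    print("potentially affected:", nx.ancestors(g, vulnerable))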

An immersive future for software engineering: avenues and approaches

Software systems are becoming increasingly intricate and complex, necessitating new ways to comprehend and visualize them. At the same time, the nature of software engineering teams itself is changing, with people playing more fluid roles and often needing seamless, contextual intelligence for faster and better decisions. Moreover, the next generation of software engineers will all be post-millennials, who may have totally different expectations of their software engineering workplace. Thus, we believe that it is important to take a fresh look at the way we traditionally do software engineering; immersive technologies have huge potential here to help with such challenges. However, while immersive technologies, devices, and platforms have matured in the past few years, there has been very little research on how these technologies can influence software engineering. In this paper, we introduce how traditional software engineering can leverage immersive approaches for building, delivering, and maintaining next-generation software applications. As part of our initial research, we present an augmented-reality-based prototype for project managers, which provides contextual and immersive insights. Finally, we also discuss important research questions that we are investigating further as part of our immersive software engineering research.

Dronology: an incubator for cyber-physical systems research

Research in the area of Cyber-Physical Systems (CPS) is hampered by the lack of available project environments in which to explore open challenges and to propose and rigorously evaluate solutions. In this "New Ideas and Emerging Results" paper we introduce a CPS research incubator - based upon a system, and its associated project environment, for managing and coordinating the flight of small Unmanned Aerial Systems (sUAS). The research incubator provides a new community resource, making available diverse, high-quality project artifacts produced across multiple releases of a safety-critical CPS. It enables researchers to experiment with their own novel solutions within a fully-executable runtime environment that supports both high-fidelity sUAS simulations as well as physical sUAS. Early collaborators from the software engineering community have shown broad and enthusiastic support for the project and its role as a research incubator, and have indicated their intention to leverage the environment to address their own research areas of goal modeling, runtime adaptation, safety-assurance, and software evolution.