ICSEW'20: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops


WORKSHOP SESSION: 1st International Workshop on Automated Program Repair (APR)

Using API-Embedding for API-Misuse Repair

Application Programming Interfaces (APIs) are a way to reuse existing functionality of one application in another. However, due to a lack of knowledge about the correct usage of a particular API, developers sometimes commit misuses, causing unintended or faulty behavior. To detect and eventually repair such misuses automatically, the state of the art is to infer API usage patterns from real-world code. A contradiction of an identified usage pattern denotes a misuse, while applying the pattern fixes the respective misuse. The success of this process depends heavily on the quality of the usage patterns and on the code from which they are inferred. Thus, a lack of code demonstrating the correct usage makes it impossible to detect and fix a misuse. In this paper, we discuss the potential of using machine-learning vector embeddings to improve automatic program repair and to extend it towards cross-API and cross-language repair. We illustrate our ideas using one particular technique for API-embedding (i.e., API2Vec) and describe the resulting possibilities and challenges.
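
Since the abstract leaves the embedding step at a high level, a minimal sketch of the general idea may help: learning vector embeddings over API call sequences, in the spirit of API2Vec, using gensim's Word2Vec. The sequences and qualified names below are hypothetical stand-ins, not the authors' implementation.

```python
# A minimal sketch of learning API embeddings from usage sequences with
# gensim's Word2Vec; sequences and API names are hypothetical stand-ins,
# not the authors' API2Vec implementation.
from gensim.models import Word2Vec

# Each "sentence" is one method body flattened to its API call sequence.
usage_sequences = [
    ["java.io.FileReader.<init>", "java.io.BufferedReader.<init>",
     "java.io.BufferedReader.readLine", "java.io.BufferedReader.close"],
    ["java.util.Iterator.hasNext", "java.util.Iterator.next"],
    # ... thousands more sequences mined from real-world code
]

model = Word2Vec(usage_sequences, vector_size=100, window=5, min_count=1)

# APIs used in similar contexts end up close in the vector space; the
# neighbours of the calls present in a snippet can hint at a missing call.
print(model.wv.most_similar("java.util.Iterator.next", topn=2))
```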

Selective Symbolic Type-Guided Checkpointing and Restoration for Autonomous Vehicle Repair

Fault tolerant design can help autonomous vehicle systems address defects, environmental changes and security attacks. Checkpoint and restoration fault tolerance techniques save a copy of an application's state before a problem occurs and restore that state afterwards. However, traditional Checkpoint/Restore techniques still admit high overhead, may carry along tainted data, and rarely operate in tandem with human-written or automated repairs that modify source code or alter data layout. Thus, it can be difficult to apply traditional Checkpoint/Restore techniques to solve the issues of non-environmental defects, security attacks or software bugs. To address such challenges, in this paper, we propose and evaluate a selective checkpoint and restore (SCR) technique that records only critical system state based on types and minimal symbolic annotations to deploy repaired programs. In our evaluation, we found that using source-level symbolic information allows an application to be resumed even after its code is modified. We evaluate our approach using a commodity autonomous vehicle system and demonstrate that it admits manual and automated software repairs, does not carry tainted data, and has low overhead.

Flake It 'Till You Make It: Using Automated Repair to Induce and Fix Latent Test Flakiness

Since flaky tests pass or fail nondeterministically, without any code changes, they are an unreliable indicator of program quality. Developers may quarantine or delete flaky tests because it is often too time consuming to repair them. Yet, since decommissioning too many tests may ultimately degrade a test suite's effectiveness, developers may eventually want to fix them, a process that is challenging because the nondeterminism may have been introduced previously. We contend that the best time to discover and repair a flaky test is when a developer first creates and best understands it. We refer to tests that are not currently flaky, but that could become so, as having latent flakiness. We further argue that efforts to expose and repair latent flakiness are valuable in ensuring the future-reliability of the test suite, and that the testing cost is greater if latent flakiness is left to manifest itself later. Using concrete examples from a real-world program, this paper posits that automated program repair techniques will prove useful for surfacing latent flakiness.

Refining Fitness Functions in Test-Based Program Repair

Genetic improvement has proved to be a successful technique in optimising various software properties, such as bug fixing and runtime improvement. It uses automated search to find improved program variants. Usually the evaluation of each mutated program involves running a test suite and then calculating the fitness based on Boolean test case results. This, however, creates plateaus in the fitness landscape that are hard for search to efficiently traverse. Therefore, we propose a more fine-grained fitness function that takes the output of test case assertions into account.
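
To make the proposal concrete, here is a minimal sketch contrasting the coarse Boolean fitness with an assertion-level fitness; TestResult is a hypothetical record of a single test case run, not an artifact of the paper.

```python
# A minimal sketch of the two fitness granularities; TestResult is a
# hypothetical record, invented here for illustration.
from dataclasses import dataclass

@dataclass
class TestResult:
    passed: bool              # did the whole test case pass?
    assertions_passed: int    # how many individual assertions held?
    assertions_total: int

def boolean_fitness(results):
    # Coarse-grained: a variant that fails one assertion scores the same
    # as one that fails all of them, creating plateaus in the landscape.
    return sum(r.passed for r in results) / len(results)

def assertion_fitness(results):
    # Fine-grained: partial progress inside a failing test is rewarded,
    # giving the search a gradient across the plateau.
    total = sum(r.assertions_total for r in results)
    return sum(r.assertions_passed for r in results) / total

results = [TestResult(passed=False, assertions_passed=3, assertions_total=4),
           TestResult(passed=True, assertions_passed=2, assertions_total=2)]
print(boolean_fitness(results), assertion_fitness(results))  # 0.5 vs ~0.83
```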

Program Repairing History as Git Repository

This paper proposes the concept of using a Git repository to record the history of program evolution under automated program repair techniques. In contrast to the general usage of Git by actual developers, the Git repository is generated by an APR system. The paper shows that it is feasible to store the history of program repair efficiently and comprehensively by using Git. Moreover, the proposed concept makes it possible to share the details of an APR execution and to compare various APR executions.
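
As an illustration only, the following sketch records a hypothetical APR run as a Git history using GitPython; the actual system may structure commits, files, and metadata quite differently.

```python
# A minimal sketch of recording an APR run as a Git history, assuming a
# hypothetical APR loop that yields (variant_source, fitness) pairs.
# Uses GitPython; not the paper's actual system.
from pathlib import Path
import git  # pip install GitPython

def record_repair_history(variants, workdir="apr-history"):
    repo = git.Repo.init(workdir)
    target = Path(workdir) / "Target.java"
    for generation, (source, fitness) in enumerate(variants):
        target.write_text(source)
        repo.index.add([str(target.relative_to(workdir))])
        # Encoding the fitness in the message lets later tooling compare
        # APR executions by walking the commit graph.
        repo.index.commit(f"gen {generation}: fitness={fitness:.3f}")
    return repo

record_repair_history([("class Target {}", 0.42), ("class Target { }", 1.0)])
```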

Interactive Patch Generation and Suggestion

Automated program repair (APR) is an emerging technique that can automatically generate patches for fixing bugs or vulnerabilities. To ensure correctness, the auto-generated patches are usually sent to developers for verification before being applied to the program. To review patches, developers must figure out the root cause of a bug and understand the semantic impact of the patch, which is neither straightforward nor easy, even for expert programmers. In this position paper, we envision an interactive patch suggestion approach that avoids such complex reasoning by instead enabling developers to review patches with a few clicks. We first automatically translate patch semantics into a set of what and how questions. Basically, the what questions formulate the expected program behaviors, while the how questions represent how to modify the program to realize the expected behaviors. We can leverage existing APR techniques to generate those questions and their corresponding answers. Then, to evaluate the correctness of patches, developers just need to ask questions and click the corresponding answers.

Learning to Fix Build Errors with Graph2Diff Neural Networks

Impact of Similarity on Repairing Small Programs: A Case Study on QuixBugs Benchmark

Similarity analysis plays an important role in automated program repair by helping find the correct solution earlier. However, the effectiveness of similarity has mostly been validated using the common benchmark Defects4J, which consists of six large projects. To mitigate this threat to generalizability, this study examines the performance of similarity in repairing small programs. For this purpose, existing syntactic and semantic similarity based approaches, as well as a new technique combining both similarities, are used. These approaches are evaluated using QuixBugs, a dataset of diverse bug types from 40 small programs. The techniques fix bugs faster by validating fewer patches than a random patch selection approach, demonstrating the effectiveness of similarity in repairing small programs.

Automatic repair of OWASP Top 10 security vulnerabilities: A survey

Current work on automatic program repair has not focused on the vulnerabilities actually prevalent in web applications, such as those described in the OWASP Top 10 categories, leaving the field scarcely explored and creating a gap between industry needs and research efforts. In order to assess the extent of this gap, we have surveyed and analyzed the literature on fully automatic, source-code-manipulating program repair of OWASP Top 10 vulnerabilities, as well as the corresponding test suites. We find that there is a significant gap in the coverage of the OWASP Top 10 vulnerabilities and that the test suites used to evaluate the analyzed approaches are highly inadequate. Few approaches cover multiple OWASP Top 10 vulnerabilities, and no combination of existing test suites achieves total coverage of the OWASP Top 10.

WORKSHOP SESSION: 2nd International Workshop on Bots in Software Engineering (BotSE)

Bot or not?: Detecting bots in GitHub pull request activity based on comment similarity

Many empirical studies focus on socio-technical activity in social coding platforms such as GitHub, for example to study onboarding, abandonment, productivity, and collaboration among team members. Such studies face the difficulty that GitHub activity can also be generated automatically by bots of various kinds. It therefore becomes imperative to distinguish such bots from human users. We propose an automated approach to detect bots in GitHub pull request (PR) activity. Relying on the assumption that bots produce repetitive message patterns in their PR comments, we analyse the similarity between multiple messages from the same GitHub identity, using a clustering method that combines the Jaccard and Levenshtein distances. We empirically evaluate our approach by analysing 20,090 PR comments of 250 users and 42 bots in 1,262 GitHub repositories. Our results show that the method is able to clearly separate bots from human users.
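
A minimal sketch of the distance measure described above, assuming the Jaccard distance over word sets is averaged with a normalized Levenshtein distance; the paper's exact combination and clustering procedure may differ.

```python
# A minimal sketch of a combined Jaccard + Levenshtein comment distance;
# how the paper actually combines and clusters may differ.
def jaccard_distance(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)

def levenshtein_distance(a: str, b: str) -> float:
    # Classic dynamic-programming edit distance, normalized to [0, 1].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)

def comment_distance(a: str, b: str) -> float:
    # Average the two views: token overlap and character-level edits.
    return (jaccard_distance(a, b) + levenshtein_distance(a, b)) / 2

# A user whose comments cluster tightly (low pairwise distance) is likely
# a bot emitting template messages.
print(comment_distance("Build failed, see logs.", "Build failed, see the logs."))
```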

MSABot: A Chatbot Framework for Assisting in the Development and Operation of Microservice-Based Systems

Microservice architecture (MSA) has become a popular architectural style. The main advantages of MSA include modularization and scalability. However, the development and maintenance of Microservice-based systems are more complex than for traditional monolithic architectures. This research develops a novel Chatbot system, referred to as MSABot (Microservice Architecture Bot), to assist in the development and operation of Microservice-based systems. MSABot integrates a variety of tools to allow users to understand the current status of Microservice development and operation and to push information about system errors or risks to users. For operators who take over the maintenance of Microservices, MSABot also allows them to quickly understand the overall service architecture and the operational status of each service. In addition, we invited multiple users familiar with Microservice or ChatOps technology to evaluate MSABot. The results of the survey show that more than 90% of the respondents believe that MSABot can adequately support the development and maintenance of Microservice-based systems.

Challenges and guidelines on designing test cases for test bots

Test bots are automated testing tools that autonomously and periodically run a set of test cases checking whether the system under test meets the requirements set forth by the customer. The automation decreases the amount of time a development team spends on testing. As development projects become larger, it is important to improve test bots by designing more effective test cases; otherwise, time and usage costs can increase greatly and misleading conclusions might be drawn from test results, such as false positives in the test execution. However, the literature currently lacks insights into how test case design affects the effectiveness of test bots. This paper uses a case study approach to investigate those effects by identifying challenges in designing tests for test bots. Our results include guidelines for a test design schema for such bots, supporting practitioners in overcoming the challenges mentioned by participants during our study.

Conversational Bot for Newcomers Onboarding to Open Source Projects

This paper targets the problems newcomers face when onboarding to open source projects and the low retention rate of newcomers. Open source software projects are becoming increasingly popular, and many major companies have started building open source software. Unfortunately, many newcomers commit only once to an open source project before moving on to another. Even worse, many novices struggle to join open source communities and end up leaving quickly, sometimes before their first successful contribution. In this paper, we propose a conversational bot that would recommend projects to newcomers and assist in onboarding to the open source community. The bot would be able to provide helpful resources, such as related Stack Overflow content, and to recommend human mentors. We believe that this bot would improve newcomers' experience by providing support not only during their first contribution, but also by acting as an agent that engages them with the project.

The Inconvenient Side of Software Bots on Pull Requests

Software bots are applications that integrate their work with humans' tasks, serving as conduits between users and other tools. Due to their ability to automate tasks, bots have been widely adopted by Open Source Software (OSS) projects hosted on GitHub. Commonly, OSS projects use bots to automate a variety of routine tasks to save maintainers' and contributors' time. Although bots can be useful for supporting maintainers' work, their comments are sometimes seen as spam and quickly ignored by contributors. In fact, the way these bots interact on pull requests can be disruptive and perceived as unwelcoming. In this paper, we propose the concept of a meta-bot to deal with current problems in human-bot interaction on pull requests. Besides providing additional value to this interaction, the meta-bot will reduce interruptions and help maintainers and contributors stay aware of important information.

Sorry to Bother You Again: Developer Recommendation Choice Architectures for Designing Effective Bots

Software robots, or bots, are useful for automating a wide variety of programming and software development tasks. Despite the advantages of using bots throughout the software engineering process, research shows that developers often face challenges interacting with these systems. To improve automated developer recommendations from bots, this work introduces developer recommendation choice architectures. Choice architecture is a behavioral science concept that suggests the presentation of options impacts the decisions humans make. To evaluate the impact of framing recommendations for software engineers, we examine the impact of one choice architecture, actionability, for improving the design of bot recommendations. We present the results of a preliminary study evaluating this choice architecture in a bot and provide implications for integrating choice architecture into the design of future software engineering bots.

An Exploratory Study of Bot Commits

Background: Bots help automate many of the tasks performed by software developers and are widely used to commit code in various social coding platforms. At present, it is not clear what types of activities these bots perform, and understanding this may help design better bots and find application areas that might benefit from bot adoption. Aim: We aim to categorize bot commits by the type of change (files added, deleted, or modified), find the most commonly changed file types, and identify the groups of file types that tend to be updated together. Method: We examined 12,326,137 commits made by 461 popular bots (each with at least 1,000 commits) to identify the frequency and type of files added, deleted, or modified by the commits, and used association rule mining to identify the types of files modified together. Result: The majority of bot commits modify an existing file, a few add new files, and deletion of a file is very rare; commits involving more than one type of operation are rarer still. Files containing data, configuration, and documentation are most frequently updated, while HTML is the most common type in terms of the number of files added, deleted, and modified. Files of the types "Markdown", "Ignore List", "YAML", and "JSON" were most frequently updated together with other file types. Conclusion: We observe that the majority of bot commits involve single-file modifications and that bots primarily work with data, configuration, and documentation files. Whether this is a limitation of current bots and, if overcome, would lead to different kinds of bots remains an open question.
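
To illustrate the mining step, here is a minimal sketch computing support and confidence for pairwise file-type rules directly; the commits below are invented, and a real study would use a dedicated miner over millions of itemsets.

```python
# A minimal sketch of association-rule mining over commit file types,
# computed from support counts; the itemsets are invented for illustration.
from itertools import combinations

commits = [  # file types touched per bot commit (hypothetical data)
    {"Markdown", "YAML"}, {"Markdown", "JSON"}, {"YAML", "JSON"},
    {"Markdown", "YAML"}, {"HTML"}, {"Markdown", "YAML", "JSON"},
]

def support(itemset):
    # Fraction of commits whose file-type set contains the itemset.
    return sum(itemset <= c for c in commits) / len(commits)

# Rules of the form {A} -> {B}: confidence = support(A and B) / support(A).
types = set().union(*commits)
for a, b in combinations(sorted(types), 2):
    s = support({a, b})
    if s >= 0.3:  # minimum support threshold
        print(f"{{{a}}} -> {{{b}}}: conf={s / support({a}):.2f}")
```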

Experiences Building an Answer Bot for Gitter

Software developers use modern chat platforms to communicate about the status of a project and to coordinate development and release efforts, among other things. Developers also use chat platforms to ask technical questions to other developers. While some questions are project-specific and require an experienced developer familiar with the system to answer, many questions are rather general and may have been already answered by other developers on platforms such as the Q&A site StackOverflow.

In this paper, we present GitterAns, a bot that can automatically detect when a developer asks a technical question in a chat and leverages the information present in Q&A forums to provide the developer with possible answers to their question. The results of a preliminary study indicate promising results, with GitterAns achieving an accuracy of 0.78 in identifying technical questions.

WORKSHOP SESSION: 13th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE)

Two Decades of Empirical Research on Developers' Information Needs: A Preliminary Analysis

Over the last two decades, developers' daily intake of information has been constantly on the rise, and so has research interest in investigating the information needs of developers. Knowledge about what information developers seek and which sources they rely on is scarce and has to be updated regularly to match the rapid changes in development practices. In this paper, we reflect on the scientific studies published in this field over the last two decades. We present preliminary results of our analysis of a study sample, focusing in particular on the research methods used, the number of recruited participants, and the organisational context in which the studies emerged. We investigated a total of 54 studies from 41 publications and found that convenience sampling is the predominant sampling strategy, with a prevalence of the industrial organisational context. Moreover, the majority of studies had a small sample size and drew participants from a single organisation, resulting in high sample homogeneity. Among the studies carried out in industry, 51.9% recruited participants from Microsoft.

Comparing Different Developer Behavior Recommendation Styles

Research shows that one of the most effective ways software engineers discover useful developer behaviors, or tools and practices designed to help developers complete programming tasks, is through human-to-human recommendations from coworkers during work activities. However, due to the increasingly distributed nature of the software industry and development teams, opportunities for these peer interactions are in decline. To overcome this decline, we explore several system-to-human recommendation mechanisms, including the recently introduced suggested changes feature on GitHub, which allows users to propose code changes on contributions to repositories. In this work, we study the effectiveness of suggested changes for recommending developer behaviors by performing a user study with professional software developers, comparing static analysis tool recommendations delivered via emails, pull requests, issues, and suggested changes. Our results provide insight into creating systems for recommendations between developers and design implications for improving automated recommendations to software engineers.

Sensemaking Practices in the Everyday Work of AI/ML Software Engineering

This paper considers sensemaking as it relates to everyday software engineering (SE) work practices and draws on a multi-year ethnographic study of SE projects at a large, global technology company building digital services infused with artificial intelligence (AI) and machine learning (ML) capabilities. Our findings highlight the breadth of sensemaking practices in AI/ML projects, noting developers' efforts to make sense of AI/ML environments (e.g., algorithms/methods and libraries), of AI/ML model ecosystems (e.g., pre-trained models and "upstream" models), and of business-AI relations (e.g., how the AI/ML service relates to the domain context and business problem at hand). This paper builds on recent scholarship drawing attention to the integral role of sensemaking in everyday SE practices by empirically investigating how and in what ways AI/ML projects present software teams with emergent sensemaking requirements and opportunities.

Multitasking Across Industry Projects: A Replication Study

Background: Multitasking is usual in software development. It is the ability to stop working on a task, switch to another, and eventually return to the first one, as needed or as scheduled. Multitasking, however, comes at a cognitive cost: frequent context switches can lead to distraction, sub-standard work, and even greater stress. Aims: This paper reports a replication experiment in which we gathered data on a group of developers at an industrial software development company, covering a large collection of projects stored in GitLab repositories. Method: We reused the models and methods from the original study for measuring the rate and breadth of a developer's context-switching behavior, and we studied how context switching affects productivity. We applied semi-structured interviews, replacing the original survey, to some of the developers to understand the reasons for and perceptions of multitasking. Results: We found that industry developers multitask as much as OSS developers. Focusing more (on fewer projects) and working more repetitively from one day to the next are associated with higher productivity, but higher multitasking has no effect. Common reasons for multitasking are dependencies, personal interests, and social relationships. Conclusions: Short context switches (less than three minutes) did not affect the industry developers' results; longer ones, however, bring a feeling of having left the previous task behind. The effect is proportional to how much context is switched: the bigger the context and the longer the interruption, the harder it is to come back.

How Online Forums Complement Task Documentation in Software Crowdsourcing

An issue in software crowdsourcing is the quality of task documentation, together with the fact that many crowd workers register to solve tasks but few submit solutions. This happens because uncommunicated or misunderstood requirements can lead crowd workers to deliver a solution that does not meet the customers' requirements or, worse, to give up on submitting a solution. In this paper, we present an empirical study in which we analyzed the task documentation and online forum messages associated with 25 Software Crowdsourcing (SW CS) challenges. The findings corroborate that weak documentation is a challenge in SW CS. Meanwhile, online forums allow crowd workers to gather additional technical and operational information that is not present in the official task documentation. We provide a stepping stone towards understanding the interplay between requirements and communication, making it possible to improve SW CS development processes, practices, and tools.

Behavior-Driven Development: A case study on its impacts on agile development teams

Software development practices that enhance software quality and help teams collaborate better have received attention from the academic community. Among these techniques is Behavior-Driven Development (BDD), a development method which proposes that software be developed with a primary focus on its expected behavior. In this context, this paper investigates how BDD impacts agile software development teams. To this end, we conducted a case study in a mobile application development environment that develops software using agile methods. In total, 42 interviews were performed. Our results indicate that BDD can have positive impacts, such as increased collaboration among team members, and negative ones, such as difficulties in writing unit tests. We conclude that BDD has more positive than negative outcomes.

Building Implicit Vector Representations of Individual Coding Style

We propose a new approach to building vector representations of individual developers by capturing their individual contribution style, or coding style. Such representations can find use in the next generation of software development team collaboration tools, for example by enabling the tools to track knowledge transfer in teams. The key idea of our approach is to avoid using explicitly defined metrics of coding style and instead build the representations through training a model for authorship recognition and extracting the representations of individual developers from the trained model. By empirically evaluating the output of our approach, we find that implicitly built individual representations reflect some properties of team structure: developers who report learning from each other are represented closer to each other.

Strategies for Crowdworkers to Overcome Barriers in Competition-based Software Crowdsourcing Development

Crowdsourcing in software development uses a large pool of on-demand developers to outsource parts of, or the entire, software project to a crowd. To succeed, this requires a continuous influx of developers, or simply crowdworkers. However, crowdworkers face many barriers when attempting to participate in software crowdsourcing; often, these barriers lead to a low number and poor quality of submitted solutions. In our previous work, we identified several barriers faced by crowdworkers, including finding a task that matches their abilities, setting up the environment to perform the task, and managing their personal time. We also proposed six strategies to overcome or minimize these barriers. In this paper, these six strategies are evaluated by questioning Software Crowdsourcing (SW CS) experts. The results show that software crowdsourcing needs to: (i) provide a system that helps match task requirements with crowdworkers' profiles; (ii) adopt containers or virtual machines to help crowdworkers set up their environment to perform the task; (iii) plan and control crowdworkers' personal time; and (iv) adopt communication channels that allow crowdworkers to clarify questions about the requirements and, as a consequence, finish the tasks.

Linecept: An Early Prototype of a Timeline-Based Design Coordination Tool

In software design, the various stakeholders generate large numbers of heterogeneous artifacts. These artifacts are often developed in, and managed by, different tools. In this paper, we present our initial prototype of Linecept, a tool that helps stakeholders organize, find, and view disparate design artifacts by organizing them on a timeline that presents a single unified view of the artifacts and who created them. We have used Linecept to retrospectively capture design artifacts for its own creation and in a software design class.

Engineering Human Values in Software through Value Programming

Ignoring human values in software development may disadvantage users by breaching their values and introducing biases in software. This can be mitigated by informing developers about the value implications of their choices and taking initiatives to account for human values in software. To this end, we propose the notion of Value Programming with three principles: (P1) annotating source code and related artifacts with respect to values; (P2) inspecting source code to detect conditions that lead to biases and value breaches in software, i.e., Value Smells; and (P3) making recommendations to mitigate biases and value breaches. To facilitate value programming, we propose a framework that allows for automated annotation of software code with respect to human values. The proposed framework lays a solid foundation for inspecting human values in code and making recommendations to overcome biases and value breaches in software.
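
As a purely hypothetical illustration of principle P1, the sketch below attaches value annotations to code via a decorator; neither the annotation vocabulary nor this API comes from the paper.

```python
# A minimal, hypothetical sketch of value annotation (P1): attaching
# machine-readable value metadata that a later inspection step (P2)
# could check for Value Smells. Invented for illustration only.
def value(*concerns):
    def wrap(fn):
        fn.__values__ = concerns  # inspectable metadata on the function
        return fn
    return wrap

@value("privacy", "fairness")
def rank_applicants(applicants):
    # An inspector could flag this function if it read attributes
    # (e.g. age, gender) that conflict with its declared "fairness" value.
    return sorted(applicants, key=lambda a: a["score"], reverse=True)

print(rank_applicants.__values__)  # ('privacy', 'fairness')
```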

More than Code: Contributions in Scrum Software Engineering Teams

Motivated and competent team members are a vital part of Agile Software development and make or break any project's success. Motivation is fostered by continuous progress and recognition of efforts. These concepts are founding pillars of the Scrum methodology, which focuses on self-organizing teams. The types of contributions Scrum development team members make to a project's progress are not only technical. However, a comprehensive model comprising the varied contributions in modern software engineering teams is not yet established. We propose a model that incorporates contributions of all Scrum roles, explicitly including those which are not directly related to project artifacts. It improves the visibility of performed tasks, acts as a starting point for team retrospection, and serves as a foundation for discussion in the research community.

Security but not for security's sake: The impact of social considerations on app developers' choices

We explore a dataset of app developer reasoning to better understand the reasons that may inadvertently promote or demote app developers' prioritization of security. We identify a number of reasons: caring vs. fear of users, the impact of norms, and notions of 'otherness' and 'self' in terms of belonging to groups. Based on our preliminary findings, we propose an interdisciplinary research agenda to explore the impact of social identity (a psychological theory) on developers' security rationales, and how this could be leveraged to guide developers towards making more secure choices.

Requirements Engineering in Implementing IT Support for Scandinavian Healthcare Work Processes Using Outsourced Development in Egypt

We have recently developed a new component for an existing healthcare system for Scandinavian users. The project setup included outsourced development in Egypt. In this experience report, we describe the project and the way we did requirements engineering. We identify and discuss a number of lessons learnt regarding requirements. Some of the lessons relate to the relatively long path from understanding and capturing of the needs of Scandinavian healthcare workers to providing software developers in Egypt the proper basis to do their work efficiently and with high quality. In our case, this path had four main constituents: (1) the Scandinavian healthcare domain; (2) a Scandinavian software company which was our customer; (3) the Danish software company Mjølner; (4) Mjølner's subcontractor Crossworkers in Egypt.

Are Automatic Bug Report Summarizers Missing the Target?

Bug reports can be lengthy due to long descriptions and long conversation threads. Automatic summarization of the text in a bug report can reduce the time spent by software project members on understanding its content. The quality of bug report summaries has historically been evaluated using human-created gold-standard summaries. However, we believe this is not a good practice for two reasons. First, we observed high levels of disagreement in the annotated summaries, and the number of annotators used to create the gold-standard summaries was lower than the established value for stable annotation. Second, we believe that a fixed summary length of 25% of the word count of the corresponding bug report is not suitable for every occasion on which a person consults a bug report. Therefore, we propose an automatic sentence annotation method and an interface to customize the presented summary.

Towards Understanding Technical Responses to Requirements Changes in Agile Teams

As a part of an extensive study focusing on how agile teams respond to requirements changes, we carried out a pilot study to understand the technical responses shown by agile practitioners to requirements changes. To the best of our knowledge, how agile teams respond technically to such changes has not yet been studied. We used a qualitative approach based on Grounded Theory. Analysis of the interview data collected from ten agile practitioners in New Zealand and Australia resulted in identifying three stages at which agile teams respond to requirements changes technically -- while receiving, developing, and delivering changes. We found that even though agile practices do not recommend comprehensive documentation, in practice, the product owner defining a requirements change in detail was stated by the participants as the most common technical response. Developers conducting a technical feasibility study and negotiations between the product owner and the team when receiving a requirements change were the other most common technical responses. These findings show a tendency to deviate from some agile practices in specific situations.

Software Development at the German Aerospace Center: Role and Status in Practice

Software is an important innovation factor and an integral part of modern research. However, researchers are often faced with challenges in developing software because they do not have the necessary education and skills. The German Aerospace Center (DLR) established its software engineering initiative in 2005 to enable researchers to better meet these challenges. Continuous adaption and improvement of the supportive measures of the initiative require a good understanding of the current role and practice of software development at DLR. Therefore, we conducted a DLR-wide survey on research software development at DLR at the end of 2018.

In this paper, we present the results of this survey and identify possible improvements to the software engineering initiative's activities. 773 DLR employees completed our survey and provided information about their academic background, programming experience, and software development practices. The results show that software development is a very relevant topic among researchers at DLR, but also that software development best practices are not consistently applied. Based on these results, we conclude that we should further enhance the practical focus of our support activities and raise awareness of these practices to bring them into the daily work of DLR researchers.

Beyond Technical Skills in Software Testing: Automated versus Manual Testing

Software testing is not a purely technical but rather a socio-technical activity. Although there are a few studies on this topic, to the best of our knowledge there is a lack of research focusing specifically on skills, in particular the soft skills needed for automated and manual testing. In both cases, software testing is a challenging task that requires considerable effort from practitioners. The aim of this study is to identify the most valued skills with regard to these different types of testing. To do so, a survey was administered to software practitioners, and 72 responses were received. The questionnaire covers 35 skills grouped into technical (hard) and non-technical (soft) skills. The results of this exploratory study provide empirical evidence of the importance that software practitioners give to hard and soft skills alike.

Why Did your PR Get Rejected?: Defining Guidelines for Avoiding PR Rejection in Open Source Projects

Pull requests are a commonly used method of collaboration for software developers working on open source projects. In this paper, we analyze the most common reasons, sentiment polarity, and interaction length for pull request rejections, as well as the correlations between these factors in a large open-source project called Scapy. We manually analyzed 231 rejected pull requests and systematically mapped sentiment and categorized rejection reasons. We found that the most frequent reasons for pull request rejection refer to source code management issues, incomplete comprehension of project functionalities, poor understanding of what reviewers expect, and misunderstanding the project guidelines (often due to a lack of complete/updated instructions and communication gaps). This work is an ongoing effort toward establishing practical guidelines for globally distributed contributors in open-source projects to minimize pull request rejection and maximize productivity leading to more fruitful remote collaboration. Future work involves expanding the analysis to more projects and incorporating quantitative methods.

An Exploratory Field Study of Programmer Assistance-Seeking during Software Development

Developers often face a dilemma: seek assistance from a colleague or expend the effort to answer a question themselves. On the one hand, seeking help is fast and reliable; on the other, it can distract colleagues and reduce their productivity. In this paper, we report preliminary findings on assistance-seeking from an observational study at a medium-sized software company. We found that developers exhibit varying levels of spoken communication when seeking help. We believe this is correlated with their differing years of experience as developers, among other factors. We also found that many employees repeatedly avoid asking for help, for various work-related and personal reasons. This has driven us to explore a new, exciting research area uncovering the complexities of developers seeking help. This paper is our first analysis of this kind, and we hope to receive the community's feedback before continuing the work.

Behavioral Aspects of Safety-Critical Software Development

We are becoming increasingly dependent on software systems, including for highly critical tasks in society. To minimize the risk of failures, regulatory institutions define standards that software organizations must meet. However, the quality of safety-critical software is, ultimately, determined by the software engineers' behavior.

Even though previous studies have recognized the significance of such behavioral aspects, research that studies them is limited. The aim of this initial study was, therefore, to identify how, and in what ways, behavioral aspects affect the quality of safety-critical software.

Thematic analysis of interviews with six software engineers identified four themes linking developer behavior to safety. Our analysis suggests that developing safety-critical systems imposes stress on software engineers and that to reduce such pressure it is critical to enhance organizational trust.

It also indicates that the agile way-of-working has the potential to improve safety by facilitating the sharing of domain knowledge. Our findings provide directions for future studies into these important aspects and can be of wider relevance, in particular for the development of secure software, but potentially also for general software engineering.

Immersive IDE: Towards Leveraging Virtual Reality for creating an Immersive Software Development Environment

Positive affect has been shown to be positively correlated with developer productivity. Moreover, the environment in which the developer sits is important, as it has a direct bearing on her emotions. However, in typical software development project setups, it is difficult to mold a developer's surroundings according to her own needs or wishes. Moreover, large project areas tend to have multiple distractions that may further negatively affect the developer. Multiple studies have shown that Virtual Reality can be an effective medium for inducing positive emotions: the ability to immerse oneself in virtually created environments offers countless opportunities to surround a developer with whatever she wants. In this paper, we present our approach, which allows a developer to choose her own surrounding environment to work in (say, a beach, a park, or even space!) while embedding a real-time feed of her workstation and tools into it for a seamless experience. We believe that this approach will enable better mood and engagement and lower distractions and stress, which may lead to higher productivity and a balanced sense of developer well-being.

Towards Sketch-based User Interaction with Integrated Software Development Environments

Powerful software tools, such as software development environments, often have complex graphical user interfaces (GUIs) that are not intuitive to handle, especially when performing complex, multi-step operations. We hypothesize that sketching could be a more intuitive way of expressing user intentions than navigating nested menus or memorizing keyboard shortcuts to accomplish complex operations. Enabling this vision requires software capable of both allowing the user to sketch anywhere on a GUI, and interpreting those sketches as specific commands to be performed within the integrated development environment (IDE). In this paper, we report on preliminary results of an elicitation study performed to gather insights into how developers would use a sketch-based interface.

Educating Project Stakeholders: A Preliminary Report

In college coursework, we take care to educate future professional software engineers on how the software development process works. Computer Science and Software Engineering students across the globe study software process models, gather requirements, design, implement and test their software, work on software maintenance, learn to submit bug reports, build project roadmaps, construct UML diagrams, and deploy software.

Yet, ever since the emergence of consumer-facing software, software development often is a collaboration between professional software engineers and multiple stakeholders whose education, professional expertise, and general experience lie outside of computing.

We teach future software engineers how to develop software. Why don't we do the same with other future stakeholders?

This paper is a description of a pilot Software Engineering Without Programming course developed and taught at our university for the first time in 2020. In this early stage report (the course is ongoing as of the submission deadline, but will have been completed by the time of the workshop) we outline the need for the course, its learning objectives, its organization, and the expected results.

On Developer Relations Team's Reasons for Using Repositories

Organizations such as Amazon, Apple, and Google have been investing in Developer Relations (DevRel) teams to engage a critical mass of third-party developers in producing and evolving contributions to a common technological platform. This fosters the establishment of a Software Ecosystem (SECO). However, it is still unknown how a DevRel team should monitor a SECO in order to establish a robust ecosystem. One tangible possibility is to mine repositories to enhance DevRel strategies. In this paper, we report on an investigation of the reasons that lead 31 DevRel practitioners to use software repositories during their activities. The results point towards a common perspective for DevRel practitioners and researchers in developing strategies and research roadmaps.

Mining for Process Improvements: Analyzing Software Repositories in Agile Retrospectives

Software Repositories contain knowledge on how software engineering teams work, communicate, and collaborate. It can be used to develop a data-informed view of a team's development process, which in turn can be employed for process improvement initiatives. In modern, Agile development methods, process improvement takes place in Retrospective meetings, in which the last development iteration is discussed. However, previously proposed activities that take place in these meetings often do not rely on project data, instead depending solely on the perceptions of team members. We propose new Retrospective activities, based on mining the software repositories of individual teams, to complement existing approaches with more objective, data-informed process views.

Teaching-learning of software conceptual design via function-behaviour-structure framework

Conceptual design is one of the initial phases of software design. In this phase, the functional requirements are extracted from the problem and transformed into descriptions of solution concepts. Peculiar characteristics of software conceptual design (scd), such as dynamicity and intangibility, add to the complexity of this phase. Industry practice of scd is often characterized by modeling with Unified Modeling Language (UML) tools and creating various UML representations. Modeling and representations are compartmentalized in UML, i.e., each representation corresponds to the solution view of function, behaviour, or structure. Novices learn the syntax, semantics, and processes for creating UML representations in undergraduate courses such as Software Engineering. However, results from our novice studies indicate that they are unable to create scd due to difficulties such as fixation and a lack of integration. These difficulties lead to solutions that are neither integrated nor fulfill all functional requirements. Current teaching-learning methods do not explicitly support novices in overcoming these difficulties. In this paper we describe the design of a teaching-learning environment, 'think & link', based on the theoretical design framework of Function-behaviour-structure (FBS). Initial studies with learners using 'think & link' indicate conceptual change in novices' understanding of scd.

A preliminary study about the low engagement of female participation in hackathons

Hackathons are collaborative, time-bounded events, a sort of application development marathon lasting between 24 and 48 hours. They are increasingly popular as a method for fast learning and networking, bringing people together for a short time to work on creative projects. However, the number of women participating in such events is worryingly low, and the literature lacks empirical evidence about why. This preliminary study aims to gather more data in an attempt to better understand why women are not more interested in this type of event.

What Malaysian Software Students Think about Testing?

Software testing is one of the crucial supporting processes of the software life cycle. Unfortunately for the software industry, the role is stigmatized, partly due to misperception and partly due to the treatment of the role within the industry. The present study aims to analyse this situation and explore what inhibits individuals from taking up a software testing career. To investigate this issue, we surveyed 82 senior students taking degrees in information technology, information and communication technology, and computer science at two Malaysian universities. The subjects were asked the PROs and CONs of taking up a career in software testing and what the chances were that they would do so. The study identified 7 main PROs and 9 main CONs for starting a testing career and indicated that the role of software tester is perceived as a social role, with more soft-skill connotations than technical implications. The results also show that Malaysian students have a more positive attitude towards software testing than their counterparts in other countries where similar investigations have been carried out.

WORKSHOP SESSION: 2nd Workshop on Testing for Deep Learning and Deep Learning for Testing (DeepTest)

Evaluating Surprise Adequacy for Question Answering

With the wide and rapid adoption of Deep Neural Networks (DNNs) in various domains, an urgent need to validate their behaviour has arisen, resulting in various test adequacy metrics for DNNs. One of the metrics, Surprise Adequacy (SA), aims to measure how surprising a new input is based on its similarity to the data used for training. While SA has been evaluated to be effective for image classifiers based on Convolutional Neural Networks (CNNs), it has not been studied for the Natural Language Processing (NLP) domain. This paper applies SA to NLP, in particular to the question answering task: the aim is to investigate whether SA correlates well with the correctness of answers. An empirical evaluation using the widely used Stanford Question Answering Dataset (SQuAD) shows that SA can work well as a test adequacy metric for the question answering task.
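
A simplified sketch of the distance-based intuition behind SA follows: inputs whose activation traces are far from everything seen during training are "surprising". The full metric also normalizes by class information, which is omitted here, and all data below is synthetic.

```python
# A simplified sketch of distance-based surprise, assuming a fixed-size
# activation vector can be extracted per input. The full SA metric also
# normalizes by the nearest trace of a different class; omitted here.
import numpy as np

def surprise(train_activations: np.ndarray, x: np.ndarray) -> float:
    # train_activations: (n_samples, n_neurons); x: (n_neurons,)
    dists = np.linalg.norm(train_activations - x, axis=1)
    return float(dists.min())  # distance to the closest training trace

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 64))      # activation traces from training data
in_dist = rng.normal(size=64)            # looks like the training data
out_dist = rng.normal(loc=5.0, size=64)  # far from anything seen in training
print(surprise(train, in_dist), surprise(train, out_dist))
```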

OffSide: Learning to Identify Mistakes in Boundary Conditions

Mistakes in boundary conditions are the cause of many bugs in software. These mistakes happen when, e.g., developers use '<' or '>' where they should have used '<=' or '>='. Mistakes in boundary conditions are often hard to find, and manually detecting them can be very time-consuming for developers. While researchers have long proposed techniques to cope with mistakes in boundaries, the automated detection of such bugs still remains a challenge. We conjecture that, for a tool to precisely identify mistakes in boundary conditions, it should be able to capture the overall context of the source code under analysis. In this work, we propose a deep learning model that learns mistakes in boundary conditions and is later able to identify them in unseen code snippets. We train and test the model on over 1.5 million code snippets, with and without mistakes in different boundary conditions. Our model shows an accuracy ranging from 55% to 87%. The model is also able to detect 24 out of 41 real-world bugs, albeit with a high false positive rate; existing state-of-the-practice linter tools are not able to detect any of these bugs. We hope this paper paves the road towards deep learning models that will be able to support developers in detecting mistakes in boundary conditions.
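
A minimal example of the bug class targeted here: a binary search where '<' was written instead of '<=', so the final converged index is never compared.

```python
# A minimal example of an off-by-one boundary mistake: only the comparison
# operator is wrong, which is why such bugs are hard to spot.
def contains(sorted_values, target):
    lo, hi = 0, len(sorted_values) - 1
    while lo < hi:  # bug: should be `lo <= hi`
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return True
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False

print(contains([1, 3, 5], 5))  # False with the bug; True once fixed
```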

Deep Learning for Software Defect Prediction: A Survey

Software fault prediction is an important and beneficial practice for improving software quality and reliability. The ability to predict which components in a large software system are most likely to contain the largest numbers of faults in the next release helps to better manage projects, including early estimation of possible release delays, and to affordably guide corrective actions that improve the quality of the software. However, developing robust fault prediction models is a challenging task, and many techniques have been proposed in the literature. Traditional software fault prediction studies mainly focus on manually designed features (e.g., complexity metrics), which are fed into machine learning classifiers to identify defective code. However, these features often fail to capture the semantic and structural information of programs, which is needed for building accurate fault prediction models. In this survey, we discuss various approaches to fault prediction and explain how, in recent studies, deep learning algorithms help bridge the gap between program semantics and fault prediction features to make accurate predictions.

Does Neuron Coverage Matter for Deep Reinforcement Learning?: A Preliminary Study

Deep Learning (DL) is a powerful family of algorithms used for a wide variety of problems and systems, including safety-critical systems. As a consequence, analyzing, understanding, and testing DL models is attracting more practitioners and researchers, with the purpose of implementing DL systems that are robust, reliable, efficient, and accurate. Early software testing approaches for DL systems have focused on black-box testing, white-box testing, and test case generation, in particular for deep neural networks (CNNs and RNNs). However, Deep Reinforcement Learning (DRL), a branch of DL that extends reinforcement learning, still lies outside the scope of research on testing techniques for DL systems. In this paper, we present a first step towards the testing of DRL systems. In particular, we investigate whether neuron coverage (a widely used metric for white-box testing of DNNs) can also be used for DRL systems, by analyzing the evolution of coverage patterns and their correlation with RL rewards.
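
For readers unfamiliar with the metric, a minimal sketch of neuron coverage follows: the fraction of neurons whose activation exceeds a threshold on at least one observed input. The two-layer network below is a hypothetical stand-in for a DRL policy.

```python
# A minimal sketch of neuron coverage over a toy stand-in for a DRL
# policy network; weights, states, and threshold are invented.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(2, 8))

def activations(state):
    h = np.maximum(0, W1 @ state)     # hidden-layer ReLU activations
    return np.concatenate([h, W2 @ h])

def neuron_coverage(states, threshold=0.5):
    covered = None
    for s in states:
        act = activations(s) > threshold
        # A neuron counts as covered if any input activates it.
        covered = act if covered is None else covered | act
    return covered.mean()

episode_states = rng.normal(size=(100, 4))  # states seen during an episode
print(f"coverage: {neuron_coverage(episode_states):.2%}")
```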

Manifold-based Test Generation for Image Classifiers

WORKSHOP SESSION: 1st International Workshop on Engineering and Cybersecurity of Critical Systems (EnCyCriS)

Simulation Games Platform for Unintentional Perpetrator Attack Vector Identification

Cyber-security protection of critical systems is one of the major challenges of today. Although attacks typically originate from attackers with malicious intent, a substantial portion of attack vectors is enabled by unintentional perpetrators, i.e., insiders who cause an incident through negligence, carelessness, or lack of training. Preventing these situations is challenging because insiders have better access to the organization's resources and are hence more likely to cause harm. Moreover, the insider-mediated actions of an attack vector often go unrecognized by security admins as well as by the insiders themselves.

In this paper, we focus on identifying the attack vectors of unintentional perpetrators. To this end, we propose to employ specialized games that simulate a working period during which the player faces multiple dangers that might cause harm in their company. From an analysis of the player's actions, we discover the attack vector, which can then be addressed before an actual attack happens. To reflect a variety of insiders and company environments, we introduce a platform for designing variants of these games, together with its architecture, an example of a simple game that can be created using the platform, and the analysis method used.

Identifying Critical Components in Large Scale Cyber Physical Systems

The problem of identifying critical components in large-scale networked Cyber-Physical Systems comes up as an underlying issue when attempting to enhance the efficiency, safety, and security of such systems. Graph theory is one of the well-studied methods often used to model complex systems and to facilitate the analysis of network-based features in order to identify critical components. However, recent studies mainly focus on identifying influential nodes in a system and neglect the importance of links. In this paper, we attend to the identification of both key links and key nodes in a system, and we aggregate the results by leveraging multi-variable synthetic evaluation and the multiple-criteria decision-making method M-TOPSIS to rank the system components based on their importance.
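
As a rough illustration of the aggregation step, the following sketch ranks components with standard TOPSIS; M-TOPSIS modifies how the ideal solutions are used, and the criteria values below are invented.

```python
# A minimal sketch of TOPSIS-style ranking; M-TOPSIS differs in how the
# ideal solutions are used. Components and criteria values are invented
# (all treated as benefit criteria, e.g. centrality-style measures).
import numpy as np

def topsis_rank(matrix, weights):
    m = matrix / np.linalg.norm(matrix, axis=0)   # vector-normalize columns
    v = m * weights
    ideal, anti = v.max(axis=0), v.min(axis=0)    # best/worst per criterion
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    closeness = d_neg / (d_pos + d_neg)           # 1 = most critical
    return np.argsort(-closeness), closeness

components = np.array([[0.9, 0.2, 0.5],   # node A: three criterion scores
                       [0.4, 0.8, 0.7],   # node B
                       [0.1, 0.1, 0.9]])  # link C
order, scores = topsis_rank(components, weights=np.array([0.5, 0.3, 0.2]))
print(order, scores.round(3))
```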

Domain-Based Fuzzing for Supervised Learning of Anomaly Detection in Cyber-Physical Systems

A novel approach is proposed for constructing models of anomaly detectors using supervised learning from the traces of normal and abnormal operations of an Industrial Control System (ICS). Such detectors are of value in detecting process anomalies in complex critical infrastructure such as power generation and water treatment systems. The traces are obtained by systematically "fuzzing", i.e., manipulating the sensor readings and actuator actions in accordance with the boundaries/partitions that define the system's state. The proposed approach is tested in a Secure Water Treatment (SWaT) testbed -- a replica of a real-world water purification plant, located at the Singapore University of Technology and Design. Multiple supervised classifiers are trained using the traces obtained from SWaT. The efficacy of the proposed approach is demonstrated through empirical evaluation of the supervised classifiers under various performance metrics. Lastly, it is shown that the supervised approach results in significantly lower false positive rates as compared to the unsupervised ones.
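
A minimal sketch of the supervised pipeline described above: traces labelled normal versus fuzzed train a classifier. The feature layout and data below are hypothetical stand-ins for SWaT sensor/actuator traces, and the classifier choice is illustrative.

```python
# A minimal sketch of supervised anomaly detection from labelled traces;
# features and data are hypothetical stand-ins for SWaT, and the
# classifier choice is illustrative, not the paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 10))          # in-bound sensor readings
fuzzed = rng.normal(size=(500, 10))
fuzzed[:, 3] += rng.uniform(2, 4, size=500)  # one sensor pushed out of bounds
X = np.vstack([normal, fuzzed])
y = np.array([0] * 500 + [1] * 500)          # 1 = anomalous (fuzzed) trace

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```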

Specific Air Traffic Management Cybersecurity Challenges: Architecture and Supply Chain

Cybersecurity is without doubt becoming a societal challenge. It is even starting to affect sectors that were not considered at risk in the past because of their relative isolation. One of these sectors is aviation in general, and specifically air traffic management (ATM). New developments in technology and the ever-increasing trend to interconnect systems have drastically changed this landscape. Like many safety-relevant sectors, the general attitude in aviation is rather conservative: safety practitioners prefer slow changes, which conflicts with the "rapid response" requirements coming from the (cyber)security area.

Air traffic management systems are large socio-technical systems, a fact that adds an additional dimension to their cybersecurity complexity. In this paper, I examine a subset of the sector-specific challenges we face in the ATM domain. The number of challenges is quite substantial, and topics like security policies, risk assessment methodologies, and vulnerabilities in specific areas (e.g., ADS-B) can be found in the literature. This paper looks at two challenges that have so far received less attention: architecture and supply chain.

Towards an Automated Approach for Detecting Architectural Weaknesses in Critical Systems

Architecture-first approaches are increasingly widely adopted for addressing resiliency requirements in critical systems. In these approaches, the system is built from the ground-up to be resilient, starting with the system's architecture design. Therefore, it is crucial to ensure that the architecture design is robust, without any flaws that could compromise the system's ability to detect, prevent, react to or recover from adverse conditions, such as cyber-attacks. In this paper, we describe our ongoing efforts in aiding software architects in designing cyber-resilient systems by automatically detecting weaknesses in their architectural models.

Security Threat Modeling: Are Data Flow Diagrams Enough?

Traditional threat modeling approaches such as Microsoft's STRIDE rely on Data Flow Diagrams (DFDs) as the main input. As DFDs are constructed from only five distinct model element types, these system models are deliberately kept simple. While this lowers the bar for practical adoption, there are a number of significant drawbacks.

In this position paper, we identify and illustrate four key shortcomings of DFD models when used for security threat modeling, related to the inadequate representation of security concepts, data elements, abstraction levels, and deployment information. Based on these shortcomings, we posit the need for a dedicated, integrated language for threat modeling, and discuss the trade-offs that need to be made between the ease of adoption and the level of support for systematic and repeatable threat modeling.

Towards automated safety analysis for architectures of dynamically forming networks of cyber-physical systems

Dynamically forming networks of cyber-physical systems are becoming increasingly widespread in manufacturing, transportation, automotive, avionics and other domains. The emergence of future internet technology and the ambition for ever closer integration of different systems leads to highly collaborative cyber-physical systems. Such cyber-physical systems form networks to provide additional functions, behavior, and benefits the individual systems cannot provide on their own. As safety is a major concern of systems from these domains, there is a need to provide adequate support for safety analyses of these collaborative cyber-physical systems. This support must explicitly consider the dynamically formed networks of cyber-physical systems. This is a challenging task, as the configurations of these cyber-physical system networks (i.e. the architecture of the super system the individual system joins) can differ enormously depending on the actual systems joining a cyber-physical system network. Furthermore, the configuration of the network heavily impacts the adaptations performed by the individual systems, thereby impacting the architecture not only of the system network but also of all individual systems involved. Existing safety analysis techniques, however, are not designed to support the array of potential network configurations an individual system must cope with at runtime. We therefore propose automated safety analysis support for these systems that considers the configuration of the system network. Initial evaluation results from the application to industrial case examples show that the proposed support can aid in the detection of safety defects.

Security as Culture: A Systematic Literature Review of DevSecOps

DevOps goes beyond automation, continuous integration and delivery processes, since it also encompasses people. In fact, DevOps promotes collaboration between the development team and the operations team. When security comes into DevOps routines, people play an even more relevant role, as those teams must also collaborate with the security team. Moreover, security is especially relevant when developing critical systems, where we need to manage goals, risks and evidence. Implementing security into the DevOps toolchain is only the beginning of the work: behavioral changes are also needed in order to create a security culture. Several authors have highlighted DevSecOps as one of the proposals for solving, or at least minimizing, this challenge. However, to date, the characterization of such a culture remains unclear. In this paper, a Systematic Literature Review was carried out to provide a better understanding of this topic from the human-factors perspective. The review also raises the following question: is DevSecOps going to become mainstream?

What happens in a control room during a cybersecurity attack?: Preliminary observations from a pilot study

Cyberattacks on critical infrastructure are a growing concern for businesses, national authorities and the public in general. The increasing complexity and connectivity of critical infrastructure systems have made them susceptible to cyberattacks. The traditional notion of safety systems being isolated is no longer applicable, as we have seen ample examples of how these systems can be exploited through gaps in, e.g., the supply chain, physical security, or insider access. This places greater importance on how the staff of the owners and operators of this critical infrastructure, e.g. operators, IT/security personnel, system engineers, management, are prepared to handle cyberattacks. This paper presents our ongoing research on investigating the preparedness of organisations to handle cybersecurity incidents and providing holistic solutions to improve their cybersecurity posture. We present one experiment that has been conducted using our cybersecurity centre and man-machine laboratory to study how the operators and security team of a power plant handle a cyberattack. We highlight the main observations made through this experiment.

WORKSHOP SESSION: 8th International Workshop on Genetic Improvement (GI)

WES: Agent-based User Interaction Simulation on Real Infrastructure

We introduce the Web-Enabled Simulation (WES) research agenda, and describe FACEBOOK's WW system. We describe the application of WW to reliability, integrity and privacy at FACEBOOK, where it is used to simulate social media interactions on an infrastructure consisting of hundreds of millions of lines of code. The WES agenda draws on research from many areas of study, including Search Based Software Engineering, Machine Learning, Programming Languages, Multi Agent Systems, Graph Theory, Game AI, and AI Assisted Game Play. We conclude with a set of open problems and research challenges to motivate wider investigation.

Human Factors in the Study of Automatic Software Repair: Future Directions for Research with Industry

Automatic software repair represents a significant development in software engineering, promising considerable potential change to the working procedures and practices of software engineers. Technical advances have been the focus of many recent publications. However, there has not been an equivalent growth of studies of human factors within automatic software repair. This position paper presents the case for increased research in this area and suggests three key focuses and approaches for a future research agenda. All three of these enable industry-based software engineers not just to provide feedback on automatic software repair tools but to participate in shaping these technologies so that they meet developer and industry needs.

Synthetic Benchmarks for Genetic Improvement

Genetic improvement (GI) uses automated search to find improved versions of existing software. While the potential of many GI approaches has been demonstrated over the years, the intrinsic cost of evaluating real-world software makes comparing these approaches in large-scale meta-analyses very expensive. We propose and describe a method to construct synthetic GI benchmarks, to circumvent this bottleneck and enable much faster quality assessment of GI approaches.

Stack-Based Genetic Improvement

Genetic improvement (GI) uses automated search to find improved versions of existing software. While GI originally evolved populations of software directly, most GI work nowadays uses a solution representation based on a list of mutations. This representation, however, has some limitations, notably in how genetic material can be recombined. We introduce a novel stack-based representation and discuss its possible benefits.

Checkers: Multi-modal Darwinian API Optimisation

The advent of microservices has increased the popularity of API-first design principles, whereby developers concretise the API of a system before building the system. An API-first approach assumes that the API will be used correctly. Inevitably, most developers, even experienced ones, end up writing sub-optimal software because of using APIs incorrectly. In this paper, we discuss an automated approach for exploring API equivalence and a framework to synthesise semantically equivalent programs. Unlike existing approaches to API transplantation, we propose an amorphous or formless approach to software translation in which a single API could potentially be replaced by a synthesised sequence of APIs that ensures type progress. Our search is guided by the non-functional goals for the software, a type-theoretic notion of progress, the application's test suite and an automatic multi-modal embedding of the API from its documentation and code analysis.

Towards Knowledge-guided Genetic Improvement

We propose Knowledge-guided Genetic Improvement as a combination of Grammar-guided Genetic Programming with Tree-based Genetic Programming. Instead of utilizing a grammar directly, an operator graph based on that grammar is created, which is responsible for producing abstract syntax trees. Each operator contains knowledge about the grammar symbol it represents and returns only trees that are valid according to user-defined restrictions such as depth, complexity and approximated run-time performance.

The expected benefits are a search space that excludes invalid individuals in an evolutionary run, reducing the overhead of evaluating invalid solutions and improving the overall quality of the explored search space. The operator graph supports improvements based on previously run experiments and extensions towards further non-functional features.

WORKSHOP SESSION: 1st International Workshop on Governance in Software Engineering (GISE)

Extending Software Development Governance to meet IT Governance

Given the importance of IT for organizations worldwide, IT Governance is an increasing concern for C-suite officers. Inside IT, software is a key aspect of the governance scenario. Increasing pressure from regulatory and compliance efforts is changing the software governance arena, so there is a need to focus on the current state of the topic. Despite this acknowledged need, studies on Software Governance are still scarce. In this paper, the authors expand the software governance model introduced by Chulani et al. [1] with new concerns derived from aligning the model with the IT Governance standard ISO/IEC 38500 and from the authors' own experience. Moreover, the new model proposes a categorization of these concerns to govern software development activities in alignment with IT Governance.

Illuminating a Blind Spot in Digitalization - Software Development in Sweden's Private and Public Sector

As Netscape co-founder Marc Andreessen famously remarked in 2011, software is eating the world -- becoming a pervasive invisible critical infrastructure. Data on the distribution of software use and development in society is scarce, but we compile results from two novel surveys to provide a fuller picture of the role software plays in the public and private sectors in Sweden, respectively. Three out of ten Swedish firms, across industry sectors, develop software in-house. The corresponding figure for Sweden's government agencies is four out of ten, i.e., the public sector should not be underestimated. The digitalization of society will continue, thus the demand for software developers will further increase. Many private firms report that the limited supply of software developers in Sweden is directly affecting their expansion plans. Based on our findings, we outline directions that need additional research to allow evidence-informed policy-making. We argue that such work should ideally be conducted by academic researchers and national statistics agencies in collaboration.

Data Sovereignty Governance Framework

Data has emerged as a central commodity in most modern applications. Unregulated and rampant collection of user and usage data by applications has led to concerns about privacy, trust, and ethics. This has prompted several governments and organizations across geographies to frame laws on data (e.g., the European Union's General Data Protection Regulation (GDPR)) that govern and define boundaries for the storage, processing and transfer of data, and thereby safeguard the interests of their citizens. Data Sovereignty and Data Localization are two important aspects, which deal with the adherence to the laws and governance structures that define where and how data is collected and processed. The applicability of different laws depends upon several attributes such as the nature, type, and purpose of data. Non-compliance with laws/regulations can lead to serious repercussions for enterprises, ranging from hefty penalties to loss of brand value. Ensuring that all of their applications are compliant with various laws and regulations is non-trivial. Enterprises have to deal with a plethora of laws (that are constantly evolving) and often struggle even to correctly identify all the laws applicable to their context, let alone ensure compliance with them. Therefore, in this paper, we propose a knowledge graph-based data sovereignty governance framework that assists in classifying data and in identifying the relevant applicable laws.

Society-Level Software Governance: A Challenging Scenario

The technology-driven transformation process continues to spawn novel, growth-oriented digital application domains and platforms. The user base of these society-level software systems comprises a large proportion of the community and involves a large set of stakeholder groups. In the case of an incident, there is public demand from a variety of stakeholders for multilateral intervention in order to correct the behavior of the software system. For software engineering, a technical discipline that was fostered and matured in corporate and organizational contexts, this is a major challenge, because it has to deal with a multitude of multidisciplinary stakeholders and their concerns. In order to stimulate further discussion, we discuss software governance at the societal level and identify future research challenges of this increasingly relevant topic.

Software Engineering in a Governed World: Opportunities and Challenges

Modern software applications are becoming ubiquitous and pervasive, affecting various aspects of our lives and livelihoods. At the same time, the risks to which these systems expose organizations and end users are growing dramatically. Governments and regulatory bodies are moving towards developing laws, regulations, and guidelines for several kinds of software applications (e.g., those that use data or are based on AI/ML) across different domains. These mandates impose several challenges on how software is built and delivered, primary among them ensuring that software and its delivery processes are compliant. There is a need for governance frameworks that enable the recording, monitoring, and analysis of various activities throughout the application development life cycle, making the development processes transparent, traceable, verifiable, auditable, and adherent to regulations and best practices, thereby enabling the trustworthiness of software. In this paper, we discuss the challenges and opportunities of software engineering in the governance era.

WORKSHOP SESSION: 4th International Workshop on Refactoring (IWoR)

Refactoring of Neural Network Models for Hyperparameter Optimization in Serverless Cloud

Machine Learning and Neural Networks in particular have become hot topics in Computer Science. The recent 2019 Turing Award to the forefathers of Deep Learning and AI (Yoshua Bengio, Geoffrey Hinton, and Yann LeCun) attests to the importance of the technology and its effect on science and industry. However, even today, state-of-the-art methods require several manual steps for neural network hyperparameter optimization. Our approach automates model tuning by refactoring the original Python code using open-source libraries for processing. We identify hyperparameters by parsing and analyzing the original source. Given these parameters, we refactor the model, add state-of-the-art optimization library calls, and run the updated code in the serverless cloud. Our approach has proven to eliminate manual steps when tuning arbitrary TensorFlow and Keras models. We have created a tool called OptPar which automatically refactors an arbitrary Deep Neural Network, optimizing its hyperparameters. Such a transformation can save hours of time for Data Scientists, giving them an opportunity to concentrate on designing their Machine Learning algorithms.
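
As an illustration of the hyperparameter-discovery step such a tool needs, the hypothetical Python sketch below walks the AST of a Keras training script and collects keyword arguments commonly used as hyperparameters; OptPar's actual analysis and refactoring are more involved.

```python
import ast

# Hypothetical sketch of hyperparameter discovery: walk the AST of a Keras
# training script and record keyword arguments that commonly denote
# hyperparameters. The keyword list is illustrative, not OptPar's.
HYPERPARAM_KEYWORDS = {"learning_rate", "lr", "batch_size", "epochs",
                       "units", "dropout", "filters"}

def find_hyperparameters(source: str):
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg in HYPERPARAM_KEYWORDS and isinstance(kw.value, ast.Constant):
                    found.append((kw.arg, kw.value.value, kw.value.lineno))
    return found

example = """
model.add(Dense(units=128, activation='relu'))
model.fit(x, y, epochs=10, batch_size=32)
"""
print(find_hyperparameters(example))
# [('units', 128, 2), ('epochs', 10, 3), ('batch_size', 32, 3)]
```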

Recommendation of Move Method Refactoring Using Path-Based Representation of Code

Software refactoring plays an important role in increasing code quality. One of the most popular refactoring types is the Move Method refactoring. It is usually applied when a method depends more on members of other classes than on those of its own original class. Several approaches have been proposed to recommend Move Method refactoring automatically. Most of them are based on heuristics and have certain limitations (e.g., they depend on the selection of metrics and manually defined thresholds). In this paper, we propose an approach to recommend Move Method refactoring based on a path-based representation of code called code2vec that is able to capture the syntactic structure and semantic information of a code fragment. We use this code representation to train a machine learning classifier that suggests moving methods to more appropriate classes. We evaluate the approach on two publicly available datasets: a manually compiled dataset of well-known open-source projects and a synthetic dataset with automatically injected code smell instances. The results show that our approach is capable of recommending accurate refactoring opportunities and outperforms JDeodorant and JMove, which are state-of-the-art tools in this field.
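
A minimal sketch of the classification step follows, with random placeholder vectors standing in for trained code2vec embeddings and random labels standing in for the mined ground truth.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of the recommendation step: given code2vec-style embeddings of a
# method and of a candidate target class, a binary classifier decides
# whether moving the method there should be recommended. The vectors and
# labels below are random placeholders, not real code2vec output.
rng = np.random.default_rng(0)
method_vecs = rng.normal(size=(200, 384))  # one embedding per method
class_vecs = rng.normal(size=(200, 384))   # embedding of a candidate class
X = np.hstack([method_vecs, class_vecs])   # (method, class) pair features
y = rng.integers(0, 2, size=200)           # 1 = "move method to this class"

clf = LogisticRegression(max_iter=1000).fit(X, y)

pair = np.hstack([method_vecs[0], class_vecs[3]]).reshape(1, -1)
print("move recommended:", bool(clf.predict(pair)[0]))
```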

Inheritance versus Delegation: which is more energy efficient?

Energy consumption of software is receiving more attention as concerns regarding climate change increase. One factor that significantly impacts how much energy is expended by a software application is the design of the software itself. Existing studies find few consistent results regarding the impact of common refactorings on energy consumption, nor do they define a concrete set of metrics that measure the energy efficiency of software. In this paper, we present the results of preliminary experiments that explore the Replace Inheritance with Delegation refactoring, and its inverse, to assess the impact these design-level refactorings have on energy consumption in the Java programming language. In the tested programs, inheritance proved to be more energy efficient than delegation, with a reduction in run time of 77% and a reduction in average power consumption of 4%. We subsequently propose a research plan to further explore this problem and observe a number of specific challenges in this area. The primary goals of this research are threefold: (i) to investigate how redundancy in an object-oriented design can contribute to unnecessary energy consumption, (ii) to determine how refactoring of the software can remove this redundancy, and (iii) to develop a general-purpose automated tool to perform this refactoring.

Predictable, Flexible or Correct: Trading off Refactoring Design Choices

Refactoring tools automate tedious and error-prone source code changes. Such tools can improve the speed and accuracy of software development, yet developers frequently eschew automation in favor of manual refactoring. Developers report distrust and lack of predictability as reasons for not using automated tools, but there are no comprehensive explanations of trust and predictability nor guidelines for how to improve these aspects of tools. In this position paper we explore choices and tradeoffs in refactoring tool design.

Increasing the Trust In Refactoring Through Visualization

In software development, maintaining good design is essential. The process of refactoring enables developers to improve this design during development without altering the program's existing behavior. However, this process can be time-consuming, introduce semantic errors, and be difficult for developers inexperienced with refactoring or unfamiliar with a given code base. Automated refactoring tools can help not only by applying these changes, but by identifying opportunities for refactoring. Yet, developers have not been quick to adopt these tools due to a lack of trust between the developer and the tool. We propose an approach in the form of a visualization to aid developers in understanding these suggested operations and increasing familiarity with automated refactoring tools. We also provide a manual validation of this approach and identify options to continue experimentation.

On the Relationship Between Developer Experience and Refactoring: An Exploratory Study and Preliminary Results

Refactoring is one of the means of managing technical debt and maintaining a healthy software structure through enforcing best design practices or coping with design defects. Previous refactoring surveys have shown that these code restructurings are mainly executed by developers who have sufficient knowledge of the system's design and hold leadership roles in their development teams. However, these surveys were mainly limited to specific projects and companies. In this paper, we explore the generalizability of the previous results through analyzing 800 open-source projects. We mine their refactoring activities and identify the corresponding contributors. Then, we associate an expertise score with each contributor in order to test the hypothesis that developers with higher scores tend to perform a higher number of refactoring operations. We found that (1) although refactoring is not restricted to a subset of developers, those with higher experience scores tend to perform more refactorings than others; and (2) our qualitative analysis of three randomly sampled projects shows that the developers responsible for the majority of refactoring activities typically hold advanced positions in their development teams, demonstrating their extensive knowledge of the design of the systems they contribute to.

An Exploratory Study on the Refactoring of Unit Test Files in Android Applications

An essential activity of software maintenance is the refactoring of source code. Refactoring operations enable developers to take the necessary actions to correct bad programming practices (i.e., smells) in the source code of both production and test files. With unit testing being a vital and fundamental part of ensuring the quality of a system, developers must address smelly test code. In this paper, we empirically explore the impact and relationship between refactoring operations and test smells in 250 open-source Android applications (apps). Our experiments showed that the types of refactoring operations performed by developers on test files differ from those performed on non-test files. Further, results around test smells show a co-occurrence between certain smell types and refactorings, and how refactorings are utilized to eliminate smells. Findings from this study will not only further our knowledge of refactoring operations on test files, but will also help developers understand possible ways to maintain their apps.

WORKSHOP SESSION: 1st International Workshop on Knowledge Graph for Software Engineering (KG4SE)

Modelling Knowledge about Software Processes using Provenance Graphs and its Application to Git-based Version Control Systems

Using the W3C PROV data model, we present a general provenance model for software development processes and---as an example---specialized models for git services, for which we generate provenance graphs. Provenance graphs are knowledge graphs, since they have defined semantics, and can be analyzed with graph algorithms or semantic reasoning to get insights into processes.
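
A minimal sketch of how such provenance edges might be derived from a git history, simplifying the paper's model: commits become PROV Activities, authors become Agents, and touched files become Entities.

```python
import subprocess

# Simplified sketch: derive PROV-style edges from `git log`. Note that a
# file listed under a commit was modified, which we map (coarsely) to the
# PROV relation wasGeneratedBy; the paper's model is richer.
log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:COMMIT\t%H\t%an"],
    capture_output=True, text=True, check=True).stdout

edges, commit = [], None
for line in log.splitlines():
    if line.startswith("COMMIT\t"):
        _, commit, author = line.split("\t", 2)
        edges.append((f"activity:{commit}", "wasAssociatedWith", f"agent:{author}"))
    elif line.strip():
        edges.append((f"entity:{line.strip()}", "wasGeneratedBy", f"activity:{commit}"))

for edge in edges[:10]:
    print(edge)
```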

Mining Hypernyms Semantic Relations from Stack Overflow

Communication between a software development team and business partners is often a challenging task due to the different contexts of the terms used in the information exchange. The various contexts in which concepts are defined or used create slightly different semantic fields that can evolve into information and communication silos. Due to the silo effect, the necessary information is often inadequately forwarded to developers, resulting in poorly specified software requirements or misinterpreted user feedback. Communication difficulties can be reduced by introducing a mapping between the semantic fields of the parties involved in the communication based on the commonly used terminologies. Our research aims to obtain a suitable semantic database in the form of a semantic network built from the Stack Overflow corpus, which can be considered to encompass the common tacit knowledge of the software development community. Terminologies used in the business world can be assigned to our semantic network, so software developers do not miss features that are not specific to their world but relevant to their clients. We present an initial experiment on mining a semantic network from Stack Overflow and provide insights into the newly captured relations compared to WordNet.

DockerKG: A Knowledge Graph of Docker Artifacts

Docker helps developers reuse software artifacts by providing a lightweight solution to the problem of operating system virtualization. A Docker image contains very rich and useful software engineering knowledge, including the sources of software packages, the correlations among software packages, the installation methods of software packages and information on operating systems. To effectively capture this knowledge, this paper proposes an approach to constructing a knowledge graph of Docker artifacts, named DockerKG, by analyzing a large number of Dockerfiles in Docker Hub, which contains more than 3.08 million Docker repositories (as of February 2020). Currently, DockerKG contains the domain knowledge extracted from approximately 200 thousand Dockerfiles in Docker Hub, together with information on Docker repositories and their semantic tags. In future work, DockerKG can be used for Docker image recommendations and online Q&A services providing software engineering domain knowledge.
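
One extraction step can be illustrated as follows, under the assumption that facts are mined from FROM and RUN instructions; real Dockerfile parsing (line continuations, additional package managers) is considerably more involved.

```python
import re

# Illustrative sketch of one DockerKG-style extraction step: pull
# (image, relation, value) facts out of FROM and RUN instructions.
dockerfile = """\
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y python3 python3-pip git
RUN pip3 install numpy pandas
"""

base = re.search(r"^FROM\s+(\S+)", dockerfile, re.M).group(1)
triples = [("image", "basedOn", base)]
for pkgs in re.findall(r"apt-get install\s+(?:-y\s+)?([^\n&]+)", dockerfile):
    triples += [("image", "installsAptPackage", p) for p in pkgs.split()]
for pkgs in re.findall(r"pip3? install\s+([^\n&]+)", dockerfile):
    triples += [("image", "installsPipPackage", p) for p in pkgs.split()]
print(triples)
```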

Knowledge Extraction from Natural Language Requirements into a Semantic Relation Graph

Knowledge extraction and representation aims to identify information and to transform it into a machine-readable format. Knowledge representations support Information Retrieval tasks such as searching for single statements, documents, or metadata. Requirements specifications of complex systems such as automotive software systems are usually divided into different subsystem specifications. Nevertheless, there are semantic relations between individual documents of the separated subsystems, which have to be considered in further processes (e.g. dependencies). If requirements engineers or other developers are not aware of these relations, this can lead to inconsistencies or malfunctions of the overall system. Therefore, there is a strong need for tool support to detect semantic relations in a set of large natural language requirements specifications. In this work, we present a knowledge extraction approach based on an explicit knowledge representation of the content of natural language requirements as a semantic relation graph. Our approach is fully automated and includes an NLP pipeline to transform unrestricted natural language requirements into a graph. We split the natural language into different parts and relate them to each other based on their semantic relation. In addition to semantic relations, other relationships can also be included in the graph. We envision using a semantic search algorithm like spreading activation to allow users to search for different semantic relations in the graph.

WORKSHOP SESSION: 5th International Workshop on Metamorphic Testing (MET)

Metamorphic Fuzz Testing of Autonomous Vehicles

Driving simulation is the primary approach for testing the software components of autonomous vehicles. This paper presents an automated testing method, termed metamorphic fuzz testing (MFT), in the context of simulation testing of autonomous driving. MFT differs from existing fuzzing techniques in the following two stages: First, it can generate "unrealistic" scenarios where scenes of the virtual world are refreshed frequently (so obstacles can suddenly appear / disappear)---this is to test the self-driving vehicle's robustness in the face of unexpected situations. In the second stage, MFT uses metamorphic relations as a filtering or debugging tool to distinguish between genuine failures and false alarms yielded in the first stage. We conduct empirical studies using the real-life Baidu Apollo self-driving system, recording a genuine failure rate of 3.7%. We have reported some of the detected failures to the Apollo team and received their confirmation. Our testing method is platform-independent and, therefore, can be applied to other autonomous driving systems and advanced driver-assistance systems (ADAS).

A Testing Tool for Machine Learning Applications

We present the design of MTKeras, a generic metamorphic testing framework for machine learning, and demonstrate its effectiveness through case studies in image classification and sentiment analysis.

Metamorphic Robustness Testing of Google Translate

Current research on the testing of machine translation software mainly focuses on functional correctness for valid, well-formed inputs. By contrast, robustness testing, which involves the ability of the software to handle erroneous or unanticipated inputs, is often overlooked. In this paper, we propose to address this important shortcoming. Using the metamorphic robustness testing approach, we compare the translations of original inputs with those of follow-up inputs having different categories of minor typos. Our empirical results reveal a lack of robustness in Google Translate, thereby opening a new research direction for the quality assurance of neural machine translators.
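
The shape of such a metamorphic robustness check is sketched below; translate is a hypothetical stand-in for a call to a machine-translation API, and the 0.8 similarity threshold is illustrative.

```python
import difflib

# Metamorphic robustness check: a one-character typo in the source sentence
# should not drastically change the translation. `translate` is a
# hypothetical stand-in for a real machine-translation API call.
def translate(sentence: str) -> str:
    raise NotImplementedError("plug in a translation service here")

def robustness_violation(source: str, typo_source: str,
                         threshold: float = 0.8) -> bool:
    t_orig = translate(source)
    t_typo = translate(typo_source)
    similarity = difflib.SequenceMatcher(None, t_orig, t_typo).ratio()
    return similarity < threshold  # large divergence flags a robustness issue

# Follow-up input differs from the original by a single typo:
# robustness_violation("The weather is beautiful today.",
#                      "The weather is beautifull today.")
```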

ARCAMETES: A Learning Approach for Metamorphic Exploration and Testing

In its simplest form, software testing consists of creating test cases from a defined input space, running them in the system-under-test (SUT), and evaluating the outputs with a mechanism for determining success or failure (i.e. an oracle). Metamorphic testing (MT) provides powerful concepts for alleviating the problem of a lack of oracles. To increase the adoption of MT among industry practitioners, approaches and tools that lower the effort to identify potential metamorphic relations (MRs) are very much in demand. As such, we propose a learning-based approach to MR discovery and exploration using concepts of metamorphic testing, association rule learning, and combinatorial testing. The results have implications for numerous applications including software testing and program comprehension, among others. These implications set a strong foundation for a future, extensible metamorphic exploration framework.

M.R. Hunter: Hunting for Metamorphic Relations by Puzzle Solving

Metamorphic testing (MT) is becoming increasingly popular, exhibiting test effectiveness for a wide range of subjects, from compilers to machine learning programs. However, the central part of MT, i.e., the derivation of useful metamorphic relations (MRs), still lags behind MT's rapid adoption. In this paper, we propose M.R. Hunter, an interactive online game for attracting users, especially those unfamiliar with, or even reluctant to learn, the intrinsic complexities behind MT, to participate in the MR derivation process in a puzzle-solving way. The game design carefully considers how to guide users to participate actively, how to present conjectured MRs intuitively, and how to validate MRs effectively. So far, we have built and deployed a preliminary version of the game and received active feedback, suggesting both promising results and useful advice for future improvement.

Metamorphic filtering of black-box adversarial attacks on multi-network face recognition models

Adversarial examples pose a serious threat to the robustness of machine learning models in general and of deep learning models in particular. These carefully designed perturbations of input images can cause targeted misclassifications to a label of the attacker's choice, without being detectable to the naked eye. A particular class of adversarial attacks called black-box attacks can be used to fool a target model despite not having access to the model parameters or to the input data used to train the model. In this paper, we first build a black-box attack against robust multi-model face recognition pipelines and then test it against Google's FaceNet. We then present a novel metamorphic defense pipeline relying on nonlinear image transformations to detect adversarial attacks with a high degree of accuracy. We further use the results to create probabilistic metamorphic relations that define efficient decision boundaries between safe and adversarial examples, achieving an adversarial classification accuracy of up to 96%.
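
A minimal sketch of the filtering idea, with a hypothetical model under test and gamma correction as an example nonlinear transformation:

```python
import numpy as np

# Metamorphic filtering sketch: predictions of a robust model should be
# stable under a label-preserving nonlinear image transformation (here,
# gamma correction), while adversarial perturbations tend to lose their
# effect. `predict_label` is a hypothetical stand-in for the model.
def predict_label(image: np.ndarray) -> int:
    raise NotImplementedError("plug in the face-recognition model here")

def gamma_correct(image: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    return np.power(image.clip(0.0, 1.0), gamma)  # pixels assumed in [0, 1]

def looks_adversarial(image: np.ndarray) -> bool:
    # Disagreement between the original and transformed predictions is
    # treated as evidence that the input is adversarial.
    return predict_label(image) != predict_label(gamma_correct(image))
```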

Improving The Effectiveness of Automatically Generated Test Suites Using Metamorphic Testing

Automated test generation has helped to reduce the cost of software testing. However, developing effective test oracles for these automatically generated test inputs is a challenging task. Therefore, most automated test generation tools use trivial oracles that reduce the fault detection effectiveness of these automatically generated test cases. In this work, we provide results of an empirical study showing that utilizing metamorphic relations can increase the fault detection effectiveness of automatically generated test cases.
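
As a small illustration of the difference, the first test below carries only the trivial "does not crash" oracle typical of generated tests, while the second strengthens the same input with a metamorphic relation of the sine function.

```python
import math
import unittest

# A generated test often has only a trivial oracle, while a metamorphic
# relation such as sin(pi - x) == sin(x) yields a real correctness check
# for the same generated input.
class TestSine(unittest.TestCase):
    def test_trivial_oracle(self):
        math.sin(0.7)  # only checks that no exception is raised

    def test_metamorphic_oracle(self):
        x = 0.7  # the same generated input, now with an MR-based assertion
        self.assertAlmostEqual(math.sin(math.pi - x), math.sin(x), places=10)

if __name__ == "__main__":
    unittest.main()
```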

MRpredT: Using Text Mining for Metamorphic Relation Prediction

Metamorphic relations (MRs) are an essential component of metamorphic testing (MT) that highly affects its fault detection effectiveness. MRs are usually identified with the help of a domain expert, which is a labor-intensive task. In this work, we explore the feasibility of a text classification-based machine learning approach to predict MRs using their program documentation as the sole input. We compare our method to our previously developed graph kernel-based machine learning approach and demonstrate that textual features extracted from program documentation are highly effective for predicting metamorphic relations for matrix calculation programs.

Automatic Improvement of Machine Translation Using Mutamorphic Relation: Invited Talk Paper

This paper introduces Mutamorphic Relation for Machine Learning Testing. Mutamorphic Relation combines data mutation and metamorphic relations as test oracles for machine learning systems. These oracles can help achieve fully automatic testing as well as automatic repair of the machine learning models.

The paper takes TransRepair as an example to show the effectiveness of Mutamorphic Relation in automatically testing and improving machine translators. TransRepair detects inconsistency bugs without access to human oracles. It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsistencies. Manual inspection indicates that the translations repaired by TransRepair improve consistency in 87% of cases (degrading it in 2%), and that the repairs have better translation acceptability in 27% of cases (worse in 8%).

WORKSHOP SESSION: 1st International Workshop on Quantum Software Engineering (QSE)

Software engineering for 'quantum advantage'

Software is a critical factor in the reliability of computer systems. While the development of hardware is assisted by mature science and engineering disciplines, software science is still in its infancy. This situation is likely to worsen in the future with quantum computer systems. Indeed, while quantum computing is quickly coming of age, with potentially groundbreaking impacts on many different fields, such benefits come at a price: quantum programming is hard, and finding new quantum algorithms is far from straightforward. Thus, the need for suitable formal techniques in quantum software development is even greater than in classical computation. A lack of reliable approaches to quantum computer programming will put the expected quantum advantage of the new hardware at risk. This position paper argues for the need for a proper quantum software engineering discipline benefiting from precise foundations and calculi, capable of supporting algorithm development and analysis.

Property-based Testing of Quantum Programs in Q#

Property-based testing is a structured method for automated testing using program specifications. We report on the design and implementation of what is, to our knowledge, the first property-based testing framework for quantum programs. We review various aspects of our design concerning property specification, test-case generation, and test result analysis. We also provide an overview of the implementation and how it works. Finally, we present the results of applying our framework to some examples.
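
Since the paper's Q# framework is not reproduced here, the following Python sketch (using the Hypothesis library) only illustrates the general property-based workflow it builds on: state a property, let the framework generate inputs, and let it shrink any failing case.

```python
from hypothesis import given, strategies as st

# Toy program under test and a property over it; Hypothesis generates the
# test cases and shrinks counterexamples. The quantum analogue replaces the
# function with a quantum program and the property with its specification.
def encode(xs):
    return list(reversed(xs))

@given(st.lists(st.integers()))
def test_double_reverse_is_identity(xs):
    assert encode(encode(xs)) == xs
```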

Insights on Training Neural Networks for QUBO Tasks

Current hardware limitations restrict the potential of solving quadratic unconstrained binary optimization (QUBO) problems via the quantum approximate optimization algorithm (QAOA) or quantum annealing (QA). Thus, we consider training neural networks in this context. We first discuss QUBO problems that originate from translated instances of the traveling salesman problem (TSP): analyzing this representation via autoencoders shows that it contains far more information than necessary to solve the original TSP. Then we show that neural networks can be used to solve TSP instances from both the QUBO input and the autoencoders' hidden-state representation. We finally generalize the approach and successfully train neural networks to solve arbitrary QUBO problems, sketching how neuromorphic hardware might be used as a simulator or an additional co-processor for quantum computing.
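
For readers unfamiliar with the problem form, the sketch below spells out the QUBO objective that the trained networks approximate a minimizer for; brute force serves as ground truth on a tiny, hypothetical instance.

```python
import itertools
import numpy as np

# The QUBO objective is E(x) = x^T Q x over binary vectors x. Brute force
# is only feasible for tiny n and is used here as ground truth; the paper
# trains neural networks to approximate the minimizer for arbitrary Q.
def qubo_energy(x, Q):
    x = np.asarray(x)
    return float(x @ Q @ x)

def brute_force_qubo(Q):
    n = Q.shape[0]
    return min(itertools.product([0, 1], repeat=n),
               key=lambda x: qubo_energy(x, Q))

Q = np.array([[-1.0, 2.0, 0.0],
              [0.0, -1.0, 2.0],
              [0.0, 0.0, -1.0]])  # hypothetical instance
print(brute_force_qubo(Q))  # -> (1, 0, 1), energy -2
```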

Towards a Quantum Software Modeling Language

We set down the principles behind a modeling language for quantum software. We present a minimal set of extensions to the well-known Unified Modeling Language (UML) that allows it to effectively model quantum software. These extensions are separate and independent of UML as a whole. As such they can be used to extend any other software modeling language, or as a basis for a completely new language. We argue that these extensions are both necessary and sufficient to model, abstractly, any piece of quantum software. Finally, we provide a small set of examples that showcase the effectiveness of the extension set.

Quantum Annealing-Based Software Components: An Experimental Case Study with SAT Solving

Quantum computers have the potential of solving problems more efficiently than classical computers. While the first commercial prototypes have become available, the performance of such machines in practical applications is still being explored. Quantum computers will not entirely replace classical machines, but serve as accelerators for specific problems. This necessitates integrating quantum computational primitives into existing applications.

In this paper, we perform a case study on how to augment existing software with quantum computational primitives for the Boolean satisfiability problem (SAT) implemented using a quantum annealer (QA). We discuss relevant quality measures for quantum components, and show that mathematically equivalent, but structurally different ways of transforming SAT to a QA can lead to substantial differences regarding these qualities. We argue that engineers need to be aware that (and which) details, although they may be less relevant in traditional software engineering, require considerable attention in quantum computing.

Making Quantum Computing Open: Lessons from Open Source Projects

Quantum computing (QC) is an emerging computing paradigm with the potential to revolutionize the field of computing. QC is a field that is quickly developing globally and has high barriers to entry. In this paper, we explore both successful contributors to the field as well as the wider QC community with the goal of understanding the backgrounds and training that helped them succeed. We gather data on 148 contributors to open-source quantum computing projects hosted on GitHub and survey 46 members of the QC community. Our findings show that QC practitioners and enthusiasts have diverse backgrounds, with most of them having a PhD and training in physics or computer science. We observe a lack of educational resources on quantum computing. Our goal is to use these results to start a conversation on making quantum computing more open.

The Holy Grail of Quantum Artificial Intelligence: Major Challenges in Accelerating the Machine Learning Pipeline

We discuss the synergetic connection between quantum computing and artificial intelligence. After surveying current approaches to quantum artificial intelligence and relating them to a formal model for machine learning processes, we deduce four major challenges for the future of quantum artificial intelligence: (i) Replace iterative training with faster quantum algorithms, (ii) distill the experience of larger amounts of data into the training process, (iii) allow quantum and classical components to be easily combined and exchanged, and (iv) build tools to thoroughly analyze whether observed benefits really stem from quantum properties of the algorithm.

WORKSHOP SESSION: 8th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)

Dialogue Act Classification for Virtual Agents for Software Engineers during Debugging

A "dialogue act" is a written or spoken action during a conversation. Dialogue acts are usually only a few words long, and are often categorized by researchers into a relatively small set of dialogue act types, such as eliciting information, expressing an opinion, or making a greeting. Research interest into automatic classification of dialogue acts has grown recently due to the proliferation of Virtual Agents (VA) e.g. Siri, Cortana, Alexa. But unfortunately, the gains made into VA development in one domain are generally not applicable to other domains, since the composition of dialogue acts differs in different conversations. In this paper, we target the problem of dialogue act classification for a VA for software engineers repairing bugs. A problem in the SE domain is that very little sample data exists - the only public dataset is a recently-released Wizard of Oz study with 30 conversations. Therefore, we present a transfer-learning technique to learn on a much larger dataset for general business conversations, and apply the knowledge to the SE dataset. In an experiment, we observe between 8% and 20% improvement over two key baselines.

On the Relevance of Cross-project Learning with Nearest Neighbours for Commit Message Generation

Commit messages play an important role in software maintenance and evolution. Nonetheless, developers often do not produce high-quality messages. A number of commit message generation methods have been proposed in recent years to address this problem. Some of these methods are based on neural machine translation (NMT) techniques. Studies show that the nearest-neighbor algorithm (NNGen) outperforms existing NMT-based methods, even though NNGen is simpler and faster than NMT. In this paper, we show that NNGen does not take advantage of cross-project learning in the majority of cases. We also show that there is an even simpler and faster variation of the existing NNGen method which outperforms it in terms of the BLEU-4 score without using cross-project learning.
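
A minimal sketch of the nearest-neighbour idea underlying NNGen (bag-of-words vectors plus cosine similarity) is given below; NNGen additionally re-ranks the top-k candidates by BLEU-4, which is omitted here, and the training data shown is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Nearest-neighbour sketch: represent diffs as bag-of-words vectors and
# reuse the commit message of the most similar training diff. The BLEU-4
# re-ranking of top-k candidates used by NNGen is omitted.
train_diffs = ["add null check in parser", "rename method getUser to fetchUser"]
train_msgs = ["Fix NPE in parser", "Rename getUser to fetchUser"]

vectorizer = CountVectorizer()
train_vecs = vectorizer.fit_transform(train_diffs)

def suggest_message(new_diff: str) -> str:
    sims = cosine_similarity(vectorizer.transform([new_diff]), train_vecs)
    return train_msgs[sims.argmax()]

print(suggest_message("add null check in tokenizer"))  # -> "Fix NPE in parser"
```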

Improving Code Recommendations by Combining Neural and Classical Machine Learning Approaches

Code recommendation systems for software engineering are designed to accelerate the development of large software projects. A classical example is code completion or next-token prediction offered by modern integrated development environments. A particularly challenging case for such systems is dynamic languages like Python, due to the limited type information available at editing time. Recently, researchers proposed machine learning approaches to address this challenge. In particular, the Probabilistic Higher Order Grammar technique (Bielik et al., ICML 2016) uses a grammar-based approach with a classical machine learning schema to exploit local context. A method by Li et al. (IJCAI 2018) uses deep learning methods, specifically a Recurrent Neural Network coupled with a Pointer Network. We compare these two approaches quantitatively on a large corpus of Python files from GitHub. We also propose a combination of both approaches, where a neural network decides which schema to use for each prediction. The proposed method achieves slightly better accuracy than either of the systems alone. This demonstrates the potential of ensemble-like methods for code completion and recommendation tasks in dynamically typed languages.

Oracle Issues in Machine Learning and Where to Find Them

The rise in popularity of machine learning (ML), and deep learning in particular, has led both to optimism about the achievements of artificial intelligence and to concerns about possible weaknesses and vulnerabilities of ML pipelines. Within the software engineering community, this has led to a considerable body of work on ML testing techniques, including white- and black-box testing for ML models. This means the oracle problem needs to be addressed. For supervised ML applications, oracle information is indeed available in the form of dataset 'ground truth', which encodes input data with corresponding desired output labels. However, while ground truth forms a gold standard, there is still no guarantee that it is truly correct. Indeed, syntactic, semantic, and conceptual framing issues in the oracle may negatively affect the ML system's integrity. While syntactic issues can be automatically verified and corrected, the higher-level issues traditionally require human judgment and manual analysis. In this paper, we employ two heuristics based on information entropy and semantic analysis on well-known computer vision models and benchmark data from ImageNet. The heuristics are used to semi-automatically uncover potential higher-level issues in (i) the label taxonomy used to define the ground truth oracle (labels), and (ii) data encoding and representation. In doing this, beyond existing ML testing efforts, we illustrate the need for software engineering strategies that especially target and assess the oracle.

Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification

This work proposes to predict the tags assigned to posts on the Stack Overflow platform. The raw data was obtained from stackexchange.com and includes more than 50K posts and their associated user-given tags. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class, multi-label classification; hence, we propose 1) one-against-all models for the 15 most popularly used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best-performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and a 39% F1-score.
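
A compressed sketch of the described pipeline in scikit-learn follows, with a three-post toy corpus standing in for the 50K Stack Overflow posts.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Sketch of the pipeline: LDA topic features over post text, then
# one-against-all classifiers per tag. The toy corpus is hypothetical;
# the paper uses >50K posts and the 15 most popular tags.
posts = ["how to read a csv file in pandas",
         "segmentation fault in c pointer arithmetic",
         "merge two dataframes in pandas"]
tags = [["python", "pandas"], ["c"], ["python", "pandas"]]

Y = MultiLabelBinarizer().fit_transform(tags)
model = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=5, random_state=0),
    OneVsRestClassifier(LogisticRegression()),
)
model.fit(posts, Y)
print(model.predict(["filter rows of a pandas dataframe"]))
```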

On Building an Automatic Identification of Country-Specific Feature Requests in Mobile App Reviews: Possibilities and Challenges

Mobile app stores are available in over 150 countries, allowing users from all over the world to leave public reviews of downloaded apps. Previous studies have shown that such reviews can serve as sources of requirements and suggested that users from different countries have different needs and expectations regarding the same app. However, the tremendous quantity of reviews from multiple countries, as well as several other factors, complicates identifying country-specific app feature requests. In this work, we present a simple approach to address this through NLP-based analysis and discuss some of the challenges involved in using the NLP-based analysis for this task.

Human-AI Partnerships for Chaos Engineering

Chaos Engineering refers to the practice of introducing faults into a system and observing the extent to which the system remains fault tolerant. However, is randomization the best approach to exposing faults within a system? We aim to answer this question by introducing chaos into different software architecture patterns and demonstrating how a back-end system can be made fault tolerant through artificial intelligence (AI). This paper discusses which aspects of AI can be used to make a system more resilient to perturbations, and compares these findings against existing chaos engineering approaches.

WORKSHOP SESSION: 6th International Workshop on Rapid Continuous Software Engineering (RCoSE)

Automating Continuous Planning in SAFe

The Scaled Agile Framework (SAFe) is a popular realisation of the agile methodology for large organisations. It is widely adopted but challenging to implement. We describe a new tool which automates aspects of the SAFe PI Planning process to enable continuous planning and facilitate collaboration between remote teams.

Platform Teams: An Organizational Structure for Continuous Delivery

Software-producing organizations seek to release new versions of their products faster and more efficiently to remain competitive in the fierce software market. Continuous delivery practices arise as a potential solution, since every commit to the repository could result in a production-candidate version of a product, accelerating time to market and improving customer satisfaction. In this work, we employed Grounded Theory to investigate how organizations pursuing continuous delivery should organize their development and operations teams. We collected data from 27 IT professionals. After careful analysis, we began elaborating a taxonomy with four patterns of organizational structures: (1) siloed departments, (2) classical DevOps, (3) cross-functional teams, and (4) platform teams. We observed that the platform team structure is the most distinctive classification in our taxonomy, and it shows promising results regarding delivery performance. Some relevant aspects we found about platform teams include: infrastructure specialists need coding skills; product teams have to operate their business services; and many of the non-functional concerns are handled by the platform, alleviating product teams.

Challenges and Benefits from Using Software Analytics in Softeam

In this industry abstract, we describe the challenges and benefits of collecting feedback from customers and systems to support development cycles. In Softeam, we have performed such collection and support in four iterations by means of a software analytics platform. We describe the encountered challenges and the effects of suggested recommendations to improve the software quality of our systems on the metrics of interest.

WORKSHOP SESSION: 13th International Workshop on Search-Based Software Testing (SBST)

Fitness Guided Vulnerability Detection with Greybox Fuzzing

Greybox fuzzing is an automated test-input generation technique that aims to uncover program errors by searching for bug-inducing inputs using a fitness-guided search process. Existing fuzzing approaches are primarily coverage-based; that is, they regard a test input that covers a new region of code as fit to be retained. However, a vulnerability at a program location may not be exhibited in every execution that happens to visit that program location; only certain executions that lead to the location may expose the vulnerability. In this paper, we introduce a unified fitness metric called headroom, which can be used within greybox fuzzers, and which is explicitly oriented towards searching for test inputs that come closer to exposing vulnerabilities.

We have implemented our approach by enhancing AFL, a production-quality fuzzing tool. We have instantiated our approach to detect buffer-overrun as well as integer-overflow vulnerabilities. We have evaluated our approach on a suite of benchmark programs and compared it with AFL as well as a recent extension of AFL called AFLGo. Our approach uncovered more vulnerabilities in a given amount of fuzzing time, and uncovered them faster, than these two tools.
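
Conceptually, for the buffer-overrun instantiation, headroom can be pictured as the fraction of a buffer left untouched by the largest observed access. The sketch below captures only this idea; the actual implementation instruments programs inside AFL.

```python
# Conceptual headroom sketch for a buffer-overrun target: headroom 0 means
# an execution reached the buffer boundary (a potential overrun), and the
# fuzzer retains inputs that shrink headroom. This is only the idea; the
# real metric is computed by instrumentation inside AFL.
def buffer_headroom(buffer_size: int, max_index_accessed: int) -> float:
    remaining = buffer_size - (max_index_accessed + 1)
    return max(remaining, 0) / buffer_size

# A seed touching index 5 of a 64-byte buffer leaves large headroom; a
# mutated input touching index 63 has headroom 0 and is maximally fit.
print(buffer_headroom(64, 5))   # 0.90625
print(buffer_headroom(64, 63))  # 0.0
```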

SINVAD: Search-based Image Space Navigation for DNN Image Classifier Test Input Generation

The testing of Deep Neural Networks (DNNs) has become increasingly important as DNNs are widely adopted by safety critical systems. While many test adequacy criteria have been suggested, automated test input generation for many types of DNNs remains a challenge because the raw input space is too large to randomly sample or to navigate and search for plausible inputs. Consequently, current testing techniques for DNNs depend on small local perturbations to existing inputs, based on the metamorphic testing principle. We propose new ways to search not over the entire image space, but rather over a plausible input space that resembles the true training distribution. This space is constructed using Variational Autoencoders (VAEs), and navigated through their latent vector space. We show that this space helps efficiently produce test inputs that can reveal information about the robustness of DNNs when dealing with realistic tests, opening the field to meaningful exploration through the space of highly structured images.
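
A simplified sketch of the search loop follows: candidates are mutated in the VAE's latent space, so every decoded candidate remains a plausible image. Here decode and classify are hypothetical stand-ins, and the paper uses genetic search rather than the random walk shown.

```python
import numpy as np

# Latent-space search sketch: mutate VAE latent vectors instead of raw
# pixels, so every candidate decodes to a plausible image. `decode` (the
# trained VAE decoder) and `classify` (the DNN under test) are stand-ins,
# and the random walk below simplifies the paper's genetic search.
def decode(z: np.ndarray) -> np.ndarray:
    raise NotImplementedError("trained VAE decoder goes here")

def classify(image: np.ndarray) -> int:
    raise NotImplementedError("DNN under test goes here")

def search_boundary_input(z0: np.ndarray, steps: int = 1000, sigma: float = 0.1):
    rng = np.random.default_rng(0)
    original_label = classify(decode(z0))
    z = z0.copy()
    for _ in range(steps):
        candidate = z + rng.normal(0.0, sigma, size=z.shape)  # latent mutation
        if classify(decode(candidate)) != original_label:
            return decode(candidate)  # plausible image that flips the label
        z = candidate
    return None
```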

Double Cycle Hybrid Testing of Hybrid Distributed IoT System

Testing heterogeneous IoT applications such as home automation systems integrating a variety of devices poses serious challenges. Oftentimes, requirements are vaguely defined. Consumer-grade cyber-physical devices and software may not meet the reliability and quality standards needed. In addition, system behavior may partially depend on various environmental conditions. For example, Wi-Fi congestion may cause packet delay, while cold weather may cause an unexpected drop in inside temperature.

We surmise that generating and executing failure-exposing scenarios is especially challenging. Modeling phenomena such as network traffic or weather conditions is complex. One possible solution is to rely on machine learning models approximating reality. These models, integrated into a system model, can be used to define surrogate models and fitness functions that steer the search in the direction of failure-inducing scenarios.

However, these models also need to be validated. Therefore, there should be a double-loop co-evolution between machine-learned surrogate models and fitness functions.

Overall, we argue that in such complex cyber-physical systems, co-evolution and multi-hybrid approaches are needed.

Finding Load Inducing Test Scenarios Using Genetic Algorithms and Tree Based Encoding

Load testing is conducted in order to gain insight into the characteristics of a system under various amounts of load. Since the combinations of possible actions a user can follow from start to finish are practically endless, traditional load testing software is likely to miss load-inducing scenarios. In this work, we implement a rule-aided scenario generation algorithm and find scenarios that generate a high amount of load, using genetic algorithms to drive the search forward.

Flexible Probabilistic Modeling for Search Based Test Data Generation

While Search-Based Software Testing (SBST) has improved significantly in the last decade, we propose that more flexible, probabilistic models can be leveraged to improve it further. Rather than searching for an individual test case or datum, or even sets of them, that fulfil specific needs, the goal can be to learn a generative model tuned to output a useful family of values. Such generative models can naturally be decomposed into a structured generator and a probabilistic model that determines how to make non-deterministic choices during generation. While the former constrains the generation process to produce valid values, the latter allows learning and tuning to specific goals. SBST techniques differ in their level of integration of the two but, regardless of how close it is, we argue that the flexibility and power of the probabilistic model will be a main determinant of success. In this short paper, we present how some existing SBST techniques can be viewed from this perspective and then propose additional techniques for flexible generative modelling that the community should consider. In particular, Probabilistic Programming Languages (PPLs) and Genetic Programming (GP) should be investigated since they allow for very flexible probabilistic modelling. Benefits could range from utilising the multiple program executions that SBST techniques typically require to allowing the encoding of high-level test strategies.
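
A minimal sketch of the proposed decomposition (illustrative names and goals): a structured generator that always produces valid values, parameterized by a probabilistic model whose parameters are tuned towards a testing goal.

    import random

    class Model:
        """Probabilistic model making the generator's nondeterministic
        choices; its parameters can be learned or tuned."""
        def __init__(self, p_stop=0.3, mean=0.0):
            self.p_stop, self.mean = p_stop, mean
        def more(self):  return random.random() > self.p_stop
        def value(self): return random.gauss(self.mean, 10)

    def generate(model):
        """Structured generator: always yields a valid list of floats;
        *which* list is decided by the model."""
        out = []
        while model.more():
            out.append(model.value())
        return out

    # Tune the model so generated lists tend to trigger a target behaviour
    # (here: long lists with large values, a stand-in for a real test goal).
    def score(model, trials=200):
        return sum(len(xs) + sum(x > 20 for x in xs)
                   for xs in (generate(model) for _ in range(trials))) / trials

    best = Model()
    for _ in range(100):
        cand = Model(min(0.95, max(0.05, best.p_stop + random.gauss(0, 0.05))),
                     best.mean + random.gauss(0, 1))
        if score(cand) > score(best):
            best = cand
    print("tuned parameters:", best.p_stop, best.mean)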

Generating API Test Data Using Deep Reinforcement Learning

Testing is critical to ensure the quality of widely-used web APIs. Automatic test data generation can help reduce cost and improve overall effectiveness, and is commonly accomplished using the powerful concept of search-based software testing (SBST). However, with web APIs growing larger and larger, SBST techniques face scalability challenges. This paper introduces a novel SBST-based approach for generating API test data using deep reinforcement learning (DRL) as the search algorithm. By exploring the benefits of DRL in the context of scalable API test data generation, we show its potential as an alternative to traditional search algorithms.
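
To make the idea concrete, here is a heavily simplified sketch in which tabular Q-learning stands in for deep reinforcement learning and a stub function stands in for the web API under test (all names are hypothetical): actions mutate the test data, and the reward favours previously unseen API behaviour.

    import random
    from collections import defaultdict

    # Stub for a web API under test (a real setup would send HTTP requests).
    def api(params):
        if "id" not in params:            return 400
        if params["id"] < 0:              return 500   # the bug we want to find
        return 200

    ACTIONS = ["add_id", "negate_id", "drop_id", "grow_id"]

    def apply(action, params):
        p = dict(params)
        if action == "add_id":    p["id"] = 1
        elif action == "negate_id" and "id" in p: p["id"] = -p["id"]
        elif action == "drop_id": p.pop("id", None)
        elif action == "grow_id" and "id" in p:   p["id"] += 100
        return p

    Q = defaultdict(float)          # tabular Q-learning as a stand-in for DRL
    seen = set()
    for episode in range(300):
        params = {}
        for step in range(5):
            a = (random.choice(ACTIONS) if random.random() < 0.2
                 else max(ACTIONS, key=lambda x: Q[(step, x)]))
            params = apply(a, params)
            status = api(params)
            reward = 1.0 if status not in seen else 0.0   # reward new behaviour
            seen.add(status)
            Q[(step, a)] += 0.1 * (reward - Q[(step, a)])
    print("response statuses covered:", sorted(seen))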

Java Unit Testing Tool Competition: Eighth Round

We report on the results of the eighth edition of the Java unit testing tool competition. This year, two tools, EvoSuite and Randoop, were executed on a benchmark with (i) new classes under test, selected from open-source software projects, and (ii) the set of classes from one project considered in the previous edition. We relied on an updated infrastructure for the execution of the different tools and the subsequent coverage and mutation analysis, based on Docker containers. We considered two different time budgets for test case generation: one and three minutes. This paper describes our methodology and statistical analysis of the results, presents the results achieved by the contestant tools and highlights the challenges we faced during the competition.

EvoSuite at the SBST 2020 Tool Competition

EvoSuite is a search-based tool that automatically generates executable unit tests for Java code (JUnit tests). This paper summarizes the results and experiences of EvoSuite's participation at the eighth unit testing competition at SBST 2020, where EvoSuite achieved the highest overall score (406.14 points) for the seventh time in eight editions of the competition.

WORKSHOP SESSION: 5th International Workshop on Emotion Awareness in Software Engineering (SEmotion)

Chat activity is a better predictor than chat sentiment on software developers productivity

Recent works have proposed that software developers' positive emotions have a positive impact on their productivity. In this paper we investigate two data sources: developers' chat messages (from Slack and Hipchat) and source code commits of a single co-located Agile team over 200 working days. Our regression analysis shows that the number of chat messages is the best predictor and predicts productivity, measured both in number of commits and in lines of code, with R2 of 0.33 and 0.27, respectively. We then add sentiment analysis variables until the AIC of our model no longer improves, obtaining R2 values of 0.37 (commits) and 0.30 (lines of code). Thus, analyzing chat sentiment improves productivity prediction over chat activity alone, but the difference is not massive. This work supports the idea that emotional state and productivity are linked in software development. We find that three positive sentiment metrics, but surprisingly also one negative sentiment metric, are associated with higher productivity.
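
The described model selection procedure, fitting a base model on chat activity and adding sentiment variables only while the AIC improves, can be sketched as follows (synthetic data; the paper's actual variables and data are not reproduced):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200                                   # e.g. working days
    chat_msgs = rng.poisson(50, n)            # chat activity
    pos_sent  = rng.normal(0.6, 0.1, n)       # candidate sentiment predictors
    neg_sent  = rng.normal(0.2, 0.1, n)
    commits   = 0.1 * chat_msgs + 5 * pos_sent + rng.normal(0, 2, n)

    base = sm.OLS(commits, sm.add_constant(chat_msgs)).fit()
    X, kept = sm.add_constant(chat_msgs), []
    for name, col in [("pos_sent", pos_sent), ("neg_sent", neg_sent)]:
        cand = sm.OLS(commits, np.column_stack([X, col])).fit()
        if cand.aic < base.aic:               # keep variables only while AIC improves
            X, base, kept = np.column_stack([X, col]), cand, kept + [name]
    print("kept sentiment variables:", kept, "R^2:", round(base.rsquared, 2))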

Do You Just Discuss or Do You Solve?: Meeting Analysis in a Software Project at Early Stages

Software development is a very cooperative and communicative task. In most software projects, meetings are a very important medium to share information. However, these meetings are often not as effective as expected. One big issue hindering productive and satisfying meetings is inappropriate behavior such as complaining. In particular, talking about problems without at least trying to solve them decreases the motivation and mood of the team.

Interaction analyses in meetings allow the assessment of appropriate and inappropriate behavior influencing the quality of a meeting. Derived from an established interaction analysis coding scheme in psychology, we present act4teams-short, which allows real-time coding of meetings in software projects. We apply act4teams-short in an industrial case study at Volkswagen Commercial Vehicles, a large German company in the automotive domain. We analyze ten team-internal meetings at early project stages. Our results reveal difficulties due to a missing project structure and an unclear overall project goal. Furthermore, the team has an intrinsic interest in identifying problems and solving them, without any extrinsic input being required.

Understanding Implicit User Feedback from Multisensorial and Physiological Data: A case study

Ensuring the quality of the user experience is very important for increasing the acceptance likelihood of software applications, which can be affected by several contextual factors that continuously change over time (e.g., the emotional state of the end-user). Due to these changes in context, software continually needs to adapt to deliver services that satisfy user needs. However, to achieve this adaptation, it is important to gather and understand user feedback. In this paper, we mainly investigate whether physiological data can be considered and used as a form of implicit user feedback. To this end, we conducted a case study involving a tourist traveling abroad, who, over four days, used a wearable device for monitoring his physiological data and a smartphone with a mobile app reminding him to take his medication on time. Through the case study, we were able to identify some factors and activities as emotional triggers, which were used for understanding the user context. Our results highlight the importance of having a context analyzer, which can help the system determine whether detected stress should be considered actionable and consequently treated as implicit user feedback.

Mapping human values and scrum roles: a study on students' preferences

Despite the long tradition of the study of human values, the impact of this field on the software engineering domain is rarely studied. In this regard, this study focuses on applying human values to the agile software development process, more specifically to scrum roles. Thus, the goal of the study is to explore possible associations between human values and scrum role preferences among students. Questionnaires were designed by employing the Short Schwartz's Value Survey and distributed among 57 students. The results of the quantitative analysis, consisting of descriptive statistics, linear regression models and Pearson correlation coefficients, reveal that values such as power and self-direction influence the preference for the product owner role, the value of hedonism influences the preference for the scrum master role, and self-direction is associated with the preference for the team member role.

Research Idea on How Language and Symbols (Semantics and Semiotics) Affect Emotions of Software Engineers

This is essentially a 'call for research' and collaboration between industry and academia to improve the motivation and performance of software engineers through use of language, words and symbols.

How languages and symbols shape the way people think, feel and behave has been a topic of wide research. Words have a powerful association with perception and cognition, and throughout history, language has been used as a medium for influencing minds and for mass propaganda. While this is widely understood in politics, psychology and sociology, very little research has been done to study the implicit and explicit impact of words, phrases and language on the way software engineers think, feel, behave and perform. While software engineering can be seen as a science that lends itself to formal processes and methods, it can also be seen as a craft and art which needs imagination and creativity, which in turn are influenced by emotions. We propose some hypotheses, research questions and ideas to trigger formal studies of the deeper connections between language/symbols and software engineers' performance. We also draw inspiration from the wide body of research already conducted in this area, which has influenced the fields of psychology, sociology and mass communication.

How Do Negative Emotions Influence on the Conceptual Models Verification?: A live study proposal

This live study is proposed with the objective of investigating the influence of negative emotions (i.e., stress) on the efficiency of verifying conceptual models. To conduct the study, we use a model-driven testing tool named CoSTest and our own version of a stress detector within a competition setting. The experiment design, an overview of the empirical procedure, the instrumentation and potential threats are presented in the proposal.

WORKSHOP SESSION: 7th International Workshop on Software Engineering Research and Industrial Practice (SER&IP)

Towards a Topology for Legacy System Migration

Dealing with legacy systems is a decades-old industry challenge. The pressure to efficiently modernise legacy systems, both to meet new business requirements and to mitigate inherent risks, is ever growing. Our experience shows a lack of collaboration between researchers and practitioners that inhibits innovation in the field. To facilitate communication between academia and industry and, as a byproduct, to obtain an up-to-date picture of the state of affairs, we are creating a legacy system migration topology based on generalisations from a multi-case study as well as extensive literature research. We expect the topology to be useful in connecting industry needs and challenges with current and potential future research, and to improve bidirectional accessibility.

Applying probabilistic models to C++ code on an industrial scale

Machine learning approaches are widely applied to different research tasks of software engineering, but C/C++ code presents a challenge for these approaches because of its complex build system. Nevertheless, C and C++ remain two of the most popular programming languages, especially in industrial software, where a large amount of legacy code is still used. The build-system complexity has so far prevented the application of recent advances in probabilistic modeling of source code to the C/C++ domain.

We demonstrate that it is possible to at least partially overcome these difficulties by using a simple token-based representation of C/C++ code as a possible replacement for more precise representations. The enriched token representation is verified at large scale to ensure that its precision is good enough to learn rules from.

We consider two different tasks as applications of this representation: coding style detection and API usage anomaly detection. We apply simple probabilistic models to these tasks and demonstrate that even complex coding style rules and API usage patterns can be detected by means of this representation.

This paper provides a vision of how different ML-based research methods for software engineering could be applied to the C/C++ domain, and shows how they can be applied to the source code of a large software company such as Samsung.
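
To illustrate the kind of simple probabilistic model that such a token-based representation enables, here is a minimal sketch of a Laplace-smoothed bigram model that flags unusual token sequences (illustrative token streams, not the paper's enriched representation):

    from collections import Counter

    # Token streams from a (toy) C++ corpus; a real pipeline would lex
    # the sources into such tokens.
    corpus = [
        ["if", "(", "ID", "==", "NULL", ")", "return", ";"],
        ["if", "(", "ID", "!=", "NULL", ")", "{", "ID", "->", "ID", "(", ")", ";", "}"],
        ["for", "(", "ID", "=", "0", ";", "ID", "<", "ID", ";", "++", "ID", ")"],
    ] * 100

    bigrams, unigrams = Counter(), Counter()
    for toks in corpus:
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))

    def prob(prev, tok):
        """Laplace-smoothed bigram probability P(tok | prev)."""
        return (bigrams[(prev, tok)] + 1) / (unigrams[prev] + len(unigrams))

    def anomaly_score(toks):
        """Low average transition probability = unusual code."""
        ps = [prob(a, b) for a, b in zip(toks, toks[1:])]
        return 1 - sum(ps) / len(ps)

    print(anomaly_score(["if", "(", "ID", "==", "NULL", ")", "return", ";"]))  # common
    print(anomaly_score(["return", "for", "NULL", "++", "if"]))                # anomalous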

Automated Software Quality Monitoring in Research Collaboration Projects

In collaborative research projects, both researchers and practitioners work together solving business-critical challenges. These projects often deal with ETL processes, in which humans extract information from non-machine-readable documents by hand. AI-based machine learning models can help to solve this problem.

Since machine learning approaches are not deterministic, their output quality may decrease over time. This leads to an overall quality loss of the application that embeds the machine learning models. Hence, software quality in development and in production may differ.

Machine learning models are black boxes. That makes practitioners skeptical and raises the inhibition threshold for early productive use of research prototypes. Continuous monitoring of software quality in production offers the ability to respond early to quality loss and encourages the use of machine learning approaches. Furthermore, experts have to ensure that possible new inputs are integrated into the model training as quickly as possible.

In this paper, we introduce an architecture pattern, with a reference implementation, that extends the concept of Metrics Driven Research Collaboration with automated software quality monitoring in production and the possibility to auto-generate new test data from documents processed in production.

Through automated monitoring of software quality and auto-generated test data, this approach ensures that the software quality meets and maintains the requested thresholds in production, even during further continuous deployment and under changing input data.
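
A minimal sketch of the monitoring idea, under the assumption that ground truth for some production documents becomes available later (e.g., through manual correction) and can serve as auto-generated test data; names and thresholds are hypothetical:

    import random

    THRESHOLD = 0.9        # quality level agreed with the practitioners

    def model_predict(doc):
        """Stand-in for the embedded ML extraction model."""
        return doc["label"] if random.random() < 0.87 else "wrong"

    def monitor(production_docs):
        """Replay documents whose ground truth became known as test data,
        and gate the deployed model on the observed quality."""
        correct = sum(model_predict(d) == d["label"] for d in production_docs)
        accuracy = correct / len(production_docs)
        if accuracy < THRESHOLD:
            print(f"ALERT: production accuracy {accuracy:.2f} below {THRESHOLD}; retrain")
        return accuracy

    monitor([{"label": "invoice"}] * 500)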

Generating Real-World Impact from Academic Research: Experience Report from a University Impact Hub

This paper presents an experience report of Digital Creativity Labs (DC Labs), an 'impact hub' created at the University of York in the UK. The impact hub is dedicated to fostering impactful collaborations between practitioners and researchers in the world of games, interactive media and the rich space in which these converge. In this paper we describe how the impact hub works and the activities undertaken to build a culture of academic entrepreneurship that allows academic researchers to understand the goals of external partners and align with them. We also present some illustrative case studies before proposing initial lessons learned from experiences of the Lab. Multi-disciplinary academic teams can generate excellent impact, but this doesn't happen automatically. A culture of entrepreneurship is needed, and opportunities must be created for researchers to tackle problems jointly. Effort must be put into maintaining collaborations with partners.

Centralized Generic Interfaces in Hardware/Software Co-design for AI Accelerators

A hardware/software co-design for AI accelerators such as Neural Processing Units (NPUs) is essential not only to support the required functionality but also to meet the primary goals of improved performance and power efficiency. However, their ever-changing requirements often introduce undesirable development costs. Indeed, it is quite challenging for developers from different backgrounds to work together efficiently to construct a full HW/SW stack for AI accelerators.

This paper addresses these challenges and proposes a centralized collaboration methodology for efficient full-stack development, especially targeting NPU hardware. The proposal is inspired by observations from our experiences, presented later as a case study. As not all of the involved developers have enough knowledge of software engineering, this approach suggests giving a central development group (e.g., the runtime system software group) higher priority to organize and devise common interfaces, including APIs, for each layer in the full stack. This aims to minimize unnecessary discussions between development groups and to hide minor updates introduced with each new design, reducing the overall development costs and improving the quality of products. More importantly, with this approach each development group can focus on its own work as much as possible.

Creation of a Wearable Startup: From a Laboratory Incubator to a Revenue Generating Business

The need to understand signals given by our own body is of great interest to most human beings. This quest for self-knowledge is shared both by academic researchers and by businesses that want to bring value to consumers in society. This paper presents the story of how a software engineering researcher collaborated with hardware engineers and entrepreneurs in an incubator, Simula Garage, hosted by Simula Research Laboratory, to create a wearable startup called Sweetzpot. Sweetzpot developed a respiratory inductance plethysmography sensor called Flow to measure breathing signals from ribcage and/or abdominal movements. The team grew to consist of software engineers, students of machine learning and physics, an industrial/interaction designer, a hardware engineer, a lawyer, and an accountant, in addition to external collaborators. We present the sequence of events that led to the creation and sustainability of the startup and summarize the lessons learnt from it.

WORKSHOP SESSION: 2nd International Workshop on Software Engineering Research & Practices for the Internet of Things (SERP4IoT)

A Testbed for Hardware-assisted Online Profiling of IoT devices

The widespread application of the Internet of Things (IoT) has put forward higher requirements for the reliability of IoT devices. Traditional testing methods, while able to get a rough approximation of the performance of IoT devices, often fail to extract detailed runtime execution traces of applications from these resource-constrained devices. Hardware-assisted tracing can make it easier for IoT developers to obtain rich runtime information from IoT devices with limited overhead, which brings new possibilities for fully evaluating IoT software (i.e., firmware). For hardware-assisted tracing data, existing offline analysis methods severely limit the observable time span due to the large amount of tracing data. In this paper, we propose an FPGA-based online profiling testbed that carries out real-time processing of tracing data and implements continuous observation of IoT devices. Our preliminary experiment shows that the testbed has superior performance in terms of runtime trace capturing and sampling frequency. Finally, the potential applications of the captured traces are discussed.

On the Engineering of IoT-Intensive Digital Twin Software Systems

Digital Twins (DT) are software systems representing different aspects of a physical or conceptual counterpart---the real twin, which is instrumented with several sensors or computing devices that generate, consume and transfer data to its DT with different purposes. In other words, DT systems are, to a large extent, IoT-intensive systems. Indeed, by exploiting and managing IoT data, artificial intelligence, and big data and simulation capabilities, DTs have emerged as a promising approach to manage the virtual manifestation of real-world entities throughout their entire lifecycle. Their proliferation will contribute to realizing the long-craved convergence of virtual and physical spaces to augment things and human capabilities. In this context, despite the proposal of noteworthy contributions, we argue that DTs have not been sufficiently investigated from a software engineering perspective. To address this, in this paper we propose GEMINIS, an architectural reference model that adopts self-adaptation, control, and model-driven engineering techniques to specify the structural and behavioural aspects of DTs and enable the evolution of their internal models. Moreover, we introduce an approach for engineering IoT-intensive Digital Twin Software Systems (DTSS) using GEMINIS' capabilities to deal with uncertain conditions that are inherent to the nature of mirrored physical environments and that might compromise the fidelity of a DT. With GEMINIS and the proposed approach, we aim to advance the engineering of DTSS as well as IoT and cyber-physical systems by providing practitioners with guidelines to model and specify inherent structural and behavioural characteristics of DTs, addressing common design concerns.

Platform-specific Modeling for RIOT based IoT Systems

The variety of smart devices and their communication models increases the development complexity of embedded software for the IoT. Thus, the development of these systems becomes more error-prone, complex, and costly. To tackle this problem, in this study a model-driven approach is proposed for the development of RIOT-OS based IoT systems. To this end, a meta-model is designed for RIOT-OS. Based on this meta-model, a Domain-specific Modeling Language (DSML) is developed to graphically represent the domain models. To add further functionality to the language, domain rules are defined as constraints. Also, system code is partially generated from the instance models. In this way, development is supported by code synthesis and the number of bugs is reduced. Finally, a smart irrigation system and a smart lighting system are implemented to evaluate the proposed DSML. The results show that about 83.5% of the final code is generated automatically on average.

A Preliminary Systematic Mapping on Software Engineering for Robotic Systems: A Software Quality Perspective

Robotic systems have been increasingly employed in everyday tasks. Considering that software plays a crucial role in robotic systems, it is important to investigate how software engineering concepts, from a software quality perspective, can improve them. In this work, we present a systematic mapping to identify and classify the state of the art of software engineering for robotic systems from a software quality perspective. We selected and systematically analyzed a final set of 35 primary studies extracted from an automated search on the Scopus digital library.

This work presents three main contributions. Firstly, we organize a catalogue of research studies about software engineering, more specifically software quality, applied to robotic systems. Next, we systematically analyze the software quality areas used in robotic systems. Finally, we discuss insights into research opportunities and gaps in software engineering for robotic systems for future studies.

As a result, we observed that there are studies in the robotic systems area addressing software engineering approaches and software quality aspects in a combined way. The least investigated software quality aspect is security. Due to this fact, we present an overview of the state of the art on applying blockchain in robotic systems. Blockchain brings opportunities for changing the ways that robots interact with humans. Finally, we identify research opportunities and gaps in software quality for robotic systems, presenting an overview for future studies.

Improving Engagement Assessment in Gameplay Testing Sessions using IoT Sensors

The video game industry is a multimillion-dollar market that can make solo indie developers millionaires in a day. However, success in the game industry is not a coincidence. Video game development is an unusual kind of software development that mixes multidisciplinary teams: software engineers, designers, and artists. Also, for a video game to become popular, it must be fun and polished: exhaustively well tested. Testing in video game development encompasses different types of tests at different moments of the development process. In particular, assessing the players' gameplay in a test session can drive the development drastically. The designers analyze the players' actions and behaviour in the game. They can then decide whether a feature or level requires rework. They often spend many work hours reworking a feature just because it is not engaging. As the designers (usually) assess the gameplay session by hand, they cannot be sure that a specific feature is engaging enough. They would benefit from meaningful data to help them better assess the gameplay and make the decision to keep, rework, or remove a feature. Consequently, we describe the need for an IoT framework to assess players' gameplay using IoT sensors together with game devices, producing a rich output for the game designers.

A User-friendly Approach to Write and Enforce Rules for Detecting Anomalous Network Traffic in IoT Environments

Enforcing security on IoT devices is not an easy task, due to the many vulnerabilities in products that reach consumer shelves. With the rapid growth of the IoT market in the recent past, specific network attacks now target IoT devices, so it is paramount to create defense mechanisms for this niche. Network Intrusion Detection Systems (NIDS, or IDS for short) can be used to employ defenses and detect anomalous traffic on IoT networks. However, due to the nature of these tools and the typical sysadmin users they target, usability is not one of their main concerns: tools are usually available only through the console and demand very specific network knowledge. Since a large share of the IoT market is represented by consumers in smart home contexts, usability must be treated as a crucial feature of IDSs that target IoT environments. We present a user-friendly approach that helps users write rules to detect anomalous behavior in the network traffic of IoT networks. This approach was applied in our platform, which works as an IDS monitoring network traffic and continuously applying rules programmed by its users or administrators.
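
As an illustration of the general idea (not the platform's actual rule language), a user-friendly rule might be a simple field/operator/value triple compiled into a predicate over network flow records:

    # Hypothetical "friendly" rule format, compiled into predicates
    # that are evaluated against observed network flows.
    OPS = {">": lambda a, b: a > b, "==": lambda a, b: a == b, "!=": lambda a, b: a != b}

    def compile_rule(rule_text):
        field, op, value = rule_text.split()
        value = int(value) if value.isdigit() else value
        return lambda flow: OPS[op](flow.get(field), value)

    rules = [
        ("camera talking to unknown host", compile_rule("dst_host != cloud.vendor.example")),
        ("suspicious outbound volume",     compile_rule("bytes_out > 1000000")),
    ]

    def inspect(flow):
        for description, matches in rules:
            if matches(flow):
                print("ALERT:", description, flow)

    inspect({"dst_host": "198.51.100.7", "bytes_out": 2_000_000})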

Architecting Blockchain Systems: A Systematic Literature Review

Companies are gravitating more and more towards the use of blockchains in their systems, but blockchain is not a silver bullet. Challenges are currently holding back blockchain's enormous potential, such as scalability issues and frustrating trade-offs, most notably in public decentralized blockchain systems. In this paper, we conduct a Systematic Literature Review to explore the current challenges of blockchains and present possible solutions to each of these challenges. We conclude that the current challenges can be summarized in three categories: scalability issues, security issues, and the choice of consensus protocol. We also briefly discuss the use of blockchain in current systems, concluding that while blockchain's current immaturity makes it hard to recommend for most projects, blockchains in their current state could be used in the Internet of Things.

Digital Twin for Cybersecurity Incident Prediction: A Multivocal Literature Review

The advancements in the fields of the internet of things, artificial intelligence, machine learning, and data analytics have laid the path to the evolution of digital twin technology. The digital twin is a high-fidelity digital model of a physical system or asset that can be used, e.g., to optimize operations and predict faults of the physical system. To understand the different use cases of digital twins and their potential for cybersecurity incident prediction, we have performed a Systematic Literature Review (SLR). In this paper, we summarize the definition of the digital twin and the state of the art on digital twin development, including reported work on the usability of digital twins for cybersecurity. Existing tools and technologies for developing digital twins are also discussed.

A preliminary study of open-source IoT development frameworks

The Internet of Things (IoT) market is growing fast, with an increasing number of connected devices. This has led many software companies to shift their focus to developing and providing IoT solutions. IoT development has its own challenges, as typical IoT solutions are composed of heterogeneous devices, protocols and software. To cope with these challenges, many frameworks are available to help developers build IoT applications. Some of these frameworks are open source and might be of great interest to small and medium-sized companies wishing to build IoT solutions at a lower cost. In this paper, we present the results of a preliminary study of four open source IoT development frameworks. In particular, we used these frameworks to implement a sample of three IoT applications and analyzed them against a minimal set of IoT requirements. Our study focuses on IoT development for the Raspberry Pi, as it is a very low-cost and popular platform.

WORKSHOP SESSION: 3rd International Workshop on Software Health (SoHeal)

Towards A Dependency-Driven Taxonomy of Software Types

Context: The evidence on software health and ecosystems could be improved if there were a systematic way to identify the types of software for which empirical evidence applies. Results and guidelines on software health are unlikely to be globally applicable: the context and the domain where the evidence has been tested are more likely to influence the results on software maintenance and health.

Objective: The objectives of this paper are (i) to discuss the implications of adopting a specific taxonomy of software types, and (ii) to define, where possible, dependencies or similarities between parts of the taxonomy.

Method: We discuss bottom-up and top-down taxonomies, and we show how different taxonomies fare against each other. We also propose two case studies, based on software projects divided into categories and sub-categories.

Results: We show that one taxonomy does not consistently represent another taxonomy's categories. We also show that it is possible to establish directional dependencies (e.g., 'larger than') between attributes of different categories and sub-categories.

Conclusion: This paper establishes the need for direction-driven dependencies between categories of software types, which have an immediate effect on their maintenance and their relative software health.

How Magic Is Zero?: An Empirical Analysis of Initial Development Releases in Three Software Package Distributions

Distributions of open source software packages dedicated to specific programming languages facilitate software development by allowing software projects to depend on the functionality provided by such reusable packages. The health of a software project can be affected by the maturity of the packages on which it depends. The version numbers of the used package releases provide an indication of their maturity. Packages with a 0.y.z version number are commonly assumed to be under initial development, implying that they are likely to be less stable, and depending on them may be less healthy.

In this paper, we empirically study, for three open source package distributions (Cargo, npm and Packagist), to what extent 0.y.z package releases and ≥1.0.0 package releases behave differently. More specifically, we quantify the prevalence of 0.y.z releases, we explore how long packages remain in the initial development stage, we compare the update frequency of 0.y.z and ≥1.0.0 package releases, we study how often 0.y.z releases are required by other packages, and we assess whether semantic versioning is respected for dependencies towards them. Among other findings, we observe that package distributions are more permissive than what semantic versioning dictates for 0.y.z releases, and that many of the 0.y.z releases can be regarded as mature packages that are no longer under initial development. As a consequence, the version number does not provide a good indication of the health of a package release.
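
The semantic versioning background can be made concrete with a small sketch (simplified: pre-release and build metadata are ignored) showing why 0.y.z releases are special; for them, semver gives no backward-compatibility guarantee at all:

    def parse(v):
        return tuple(int(x) for x in v.split("."))

    def initial_development(v):
        """Semver: 0.y.z releases are 'initial development'; anything may change."""
        return parse(v)[0] == 0

    def compatible_per_semver(old, new):
        """For >=1.0.0, same major version implies backward compatibility;
        for 0.y.z, semver makes no compatibility promise whatsoever."""
        o, n = parse(old), parse(new)
        if o[0] == 0:
            return False               # strictly, nothing is guaranteed
        return o[0] == n[0]

    for old, new in [("0.2.3", "0.2.4"), ("0.2.3", "0.3.0"), ("1.4.0", "1.9.9")]:
        print(old, "->", new,
              "| initial dev:", initial_development(old),
              "| semver-compatible:", compatible_per_semver(old, new))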

Splicing Community Patterns and Smells: A Preliminary Study

Software engineering projects are now more than ever a community effort. In the recent past, researchers have shown that their success may depend not only on source code quality, but also on other aspects like the balance of distance, culture, global engineering practices, and more. In such a scenario, understanding the characteristics of the community around a project and foreseeing possible problems may be the key to developing successful systems. In this paper, we focus on this research problem and propose an exploratory study on the relation between community patterns, i.e., recurrent mixes of organizational or social structure types, and community smells, i.e., sub-optimal patterns across the organizational structure of a software development community that may be precursors of some sort of social debt. We exploit association rule mining to discover frequent relations between them. Our findings show that different organizational patterns are connected to different forms of socio-technical problems, possibly suggesting that practitioners should put in place specific preventive actions aimed at avoiding the emergence of community smells, depending on the organization of the project.
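
As an illustration of association rule mining over such data (hypothetical projects; the pattern and smell names are used only as labels), one can count co-occurrences and report pattern-to-smell rules with their support and confidence:

    from itertools import combinations
    from collections import Counter

    # Hypothetical project data: each set mixes community patterns (CP)
    # and community smells (CS) observed in one project.
    projects = [
        {"CP:formal-network", "CS:organizational-silo"},
        {"CP:formal-network", "CS:organizational-silo", "CS:lone-wolf"},
        {"CP:informal-network", "CS:black-cloud"},
        {"CP:informal-network", "CS:black-cloud", "CS:lone-wolf"},
        {"CP:formal-network", "CS:organizational-silo"},
    ]

    item_count, pair_count = Counter(), Counter()
    for p in projects:
        item_count.update(p)
        pair_count.update(frozenset(c) for c in combinations(sorted(p), 2))

    # Report rules pattern -> smell with support and confidence.
    for pair, n in pair_count.items():
        a, b = sorted(pair)
        if a.startswith("CP:") and b.startswith("CS:"):
            support, confidence = n / len(projects), n / item_count[a]
            if confidence >= 0.5:
                print(f"{a} => {b}  support={support:.2f} confidence={confidence:.2f}")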

A first look at an emerging model of community organizations for the long-term maintenance of ecosystems' packages

One of the biggest strengths of many modern programming languages is their rich open source package ecosystem. Indeed, modern language-specific package managers have made it much easier to share reusable code and depend on components written by someone else (often by total strangers). However, while they make programmers more productive, such practices create new health risks at the level of the ecosystem: when a heavily-used package stops being maintained, all the projects that depend on it are threatened. In this paper, I ask three questions. RQ1: How prevalent is this threat? In particular, how many depended-upon packages are maintained by a single person (who can drop out at any time)? I show that this is the case for a significant proportion of such packages. RQ2: How can project authors that depend on a package react to its maintainer becoming unavailable? I list a few options, and I focus in particular on the notion of fork. RQ3: How can the programmers of an ecosystem react collectively to such events, or prepare for them? I give a first look at an emerging model of community organizations for the long-term maintenance of packages that has appeared in several ecosystems.

A Framework for Software Health Management using Bayesian Statistics: Position Paper

As the size and complexity of safety-critical software systems increase, Software Health Management (SWHM) must ensure that the software always remains in safe and healthy regions of the state space. Boundaries between healthy and unhealthy regions are important for the detection of violations and for health management. In this position paper, we present a framework that employs techniques from Bayesian statistical modeling and active learning to efficiently characterize health boundaries in high-dimensional spaces. We discuss how this framework supports SWHM at design time and during the operation of learning/adapting software systems.
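
A minimal sketch of the framework's ingredients, using a Gaussian process classifier and an uncertainty-based acquisition rule on a toy two-dimensional state space (the actual framework targets high-dimensional spaces and a real system oracle):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier

    rng = np.random.default_rng(0)

    def healthy(state):
        """Ground-truth oracle (e.g. running the system or a simulator);
        here, a toy 2-D health region."""
        return float(state[0] ** 2 + state[1] ** 2 < 1.0)

    # Seed with one known-healthy and one known-unhealthy point, plus
    # a few random observations of the state space.
    X = np.vstack([[[0.0, 0.0], [1.8, 1.8]], rng.uniform(-2, 2, size=(8, 2))])
    y = np.array([healthy(x) for x in X])

    for _ in range(30):                            # active-learning loop
        gp = GaussianProcessClassifier().fit(X, y)
        pool = rng.uniform(-2, 2, size=(500, 2))
        p = gp.predict_proba(pool)[:, 1]
        query = pool[np.argmin(np.abs(p - 0.5))]   # most uncertain = near boundary
        X = np.vstack([X, query])
        y = np.append(y, healthy(query))

    print("samples collected:", int((y == 1).sum()), "healthy /",
          int((y == 0).sum()), "unhealthy")

Querying where the predicted class probability is closest to 0.5 concentrates the (expensive) oracle evaluations along the health boundary, which is exactly the region the paper argues SWHM needs to characterize.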

Cross-distribution Feedback in Software Ecosystems

Despite the proliferation of software ecosystems (SECOs), growing a sustainable and healthy SECO remains a significant challenge. One approach to mitigate this challenge is the utilization of a mechanism that collects feedback from distributors (distros) and end-users of the SECO releases. This presentation aims at investigating the effectiveness of the feedback mechanism implemented by OpenStack to address the needs of end-users and distros. I mined the OpenStack repositories and mapped 20 distros' bug-related activities. Results suggest that OpenStack releases are actively maintained for 18 months before reaching end-of-life (EOL), which makes coordination with distros difficult because distros usually provide services to their end-users for a period of 36 to 60 months before reaching EOL. Also, bugs are fixed faster by the distros (7 to 76 days) than by the OpenStack community (an average of 4 months). However, only 22% of the bugs addressed by OpenStack distros are pushed back upstream.

The Influence of Technical Variety in Software Ecosystems

There is a lack of empirical evidence on software ecosystem health metrics, and a need for operationalizable metrics that describe software ecosystem characteristics. This study unveils a new approach for measuring technical variety concisely. Studies show that high variety opens up new opportunities, enables better niche creation, and ultimately improves software ecosystem health. Four different ecosystems are evaluated and compared. Variety is measured in relation to robustness and productivity metrics of the ecosystem to uncover the influence of technical variety on software ecosystems. Technical variety shows a positive correlation with robustness; however, this finding cannot be accepted with certainty due to the weak relation. Furthermore, significant relations indicate differences between ecosystem types.

Automatic library categorization

Software ecosystems contain several types of artefacts such as libraries, documentation and source code files. Recent studies show that the Maven software ecosystem alone already contains over 2.8 million artefacts and over 70,000 libraries. Given the size of the ecosystem, selecting a library represents a challenge to its users.

The MVNRepository website offers a category-based search functionality as a solution. However, not all of the libraries have been categorised, which leads to incomplete search results. This work proposes an approach to the automatic categorisation of libraries through machine learning classifiers trained on class and method names. Our preliminary results show that the approach is accurate, suggesting that large-scale applications may be feasible.
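
A minimal sketch of the core idea, assuming hypothetical identifier data and using off-the-shelf TF-IDF features with a logistic regression classifier (the paper does not prescribe these exact choices):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data: class/method names of already
    # categorised libraries, flattened into one "document" per library.
    docs = [
        "HttpClient get post execute request response header",
        "Connection Statement ResultSet executeQuery commit rollback",
        "Logger debug info warn error appender",
        "HttpRequest HttpResponse send uri statusCode",
        "DataSource PreparedStatement getConnection transaction",
        "LogManager getLogger setLevel fileHandler",
    ]
    labels = ["http", "database", "logging", "http", "database", "logging"]

    vec = TfidfVectorizer().fit(docs)
    clf = LogisticRegression(max_iter=1000).fit(vec.transform(docs), labels)

    # Categorise an unseen library from its identifier names.
    print(clf.predict(vec.transform(["RestClient get put delete response"])))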

Characterizing outdateness with technical lag: an exploratory study

Background: Nowadays, many applications are built by reusing a large number of components retrieved from software collections such as npm (JavaScript) or PyPI (Python). Those components are built in their corresponding upstream repositories, where they are being developed. This reuse architecture imposes constraints on how outdated an application is when it is deployed in production environments.

Goal: To understand how outdateness of applications, and the components on which they depend, can be computed, so that different situations can be measured and assessed with the help of metrics. Based on this understanding, we also want to produce a model to characterize ecosystems (collections of reusable components).

Method: Use the technical lag framework to analyze the flows from upstream repositories, to collections of components, to application building and later deployment. Using this framework, analyze lag in version availability at each of these stages, and the constraints that limit how outdated deployed applications can be.

Results: We define a model which allows us to better understand the factors that influence the outdateness of an application produced with reusable components from repositories. The model allows us to find factors for defining metrics to measure outdateness, and to explore the factors that influence outdateness for components in applications. We propose some of those factors as the basis to characterize ecosystems, or collections of components, with respect to their impact on the outdateness of applications built with them.

Conclusions: Technical lag is an appropriate framework for studying lags in version propagation from upstream development to deployment.
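
As an illustration of the technical lag idea, the following sketch (hypothetical release history; simplified lag definitions) computes a version lag and a time lag for a deployed component:

    from datetime import date

    # Release history of a hypothetical upstream package (version, date).
    releases = [
        ("1.0.0", date(2019, 1, 10)),
        ("1.1.0", date(2019, 6, 2)),
        ("1.2.0", date(2019, 11, 20)),
        ("2.0.0", date(2020, 3, 1)),
    ]

    def technical_lag(deployed_version, as_of):
        """Lag of a deployed component: how many newer releases exist,
        and how long the latest release has been available."""
        idx = [v for v, _ in releases].index(deployed_version)
        missed = releases[idx + 1:]
        version_lag = len(missed)
        time_lag = (as_of - missed[-1][1]).days if missed else 0
        return version_lag, time_lag

    v_lag, t_lag = technical_lag("1.1.0", date(2020, 5, 1))
    print(f"version lag: {v_lag} releases; time lag: {t_lag} days behind latest")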

Mutant Density: A Measure of Fault-Sensitive Complexity

Software code complexity is a well-studied property for determining software component health. However, existing code complexity metrics do not directly take into account the fault-proneness of the code. We propose a metric called mutant density, where we use mutation as a method to introduce artificial faults into code and count the number of possible mutations per line. We show how this metric can be used to perform helpful analyses of real-life software projects.
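
A minimal sketch of a mutant-density computation (using the Python AST instead of a full mutation tool; the set of mutable constructs is illustrative): count, per line, the places where a mutation operator could apply, and normalize by the number of lines.

    import ast

    # Every arithmetic/comparison/boolean operator and constant is a
    # place where an artificial fault (a mutant) could be introduced.
    MUTABLE = (ast.BinOp, ast.Compare, ast.BoolOp, ast.UnaryOp, ast.Constant)

    def mutant_density(source):
        counts = {}
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, MUTABLE) and hasattr(node, "lineno"):
                counts[node.lineno] = counts.get(node.lineno, 0) + 1
        lines = source.rstrip().count("\n") + 1
        return counts, sum(counts.values()) / lines   # per-line density

    code = ("def price(q, unit):\n"
            "    if q > 100:\n"
            "        unit = unit * 0.9\n"
            "    return q * unit\n")
    per_line, density = mutant_density(code)
    print("mutation opportunities per line:", per_line, "density:", density)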

On the Variations and Evolutions of API Usage Patterns: Case Study on Android Applications

Software developers can reduce implementation cost by calling functions already provided through library Application Programming Interfaces (APIs). APIs are often used in combination, but how to combine them is frequently not well documented. Existing research has focused on how to extract API usage patterns or how to detect API misuse from existing software. Such research can be affected by the dataset being analyzed, so improving mining results and understanding how differences in API usage patterns affect software health are important tasks. We analyzed variations of API usage patterns among software projects and across their version histories, using Android SDK APIs and Android applications. Based on our analysis results, we make some suggestions for further API analysis. For example, there are many project-specific API usage patterns and long-lived uncommon API usage patterns, which might affect mining results or checks of software health status.

WORKSHOP SESSION: 3rd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB)

ADF-GA: Data Flow Criterion Based Test Case Generation for Ethereum Smart Contracts

Testing is an important technique to improve the quality of Ethereum smart contract programs. However, current work on testing smart contracts focuses only on static problems of smart contract programs. A data-flow oriented test case generation approach for dynamic testing of smart contract programs is still missing. To address this problem, this paper proposes a novel test case generation approach, called ADF-GA (All-uses Data Flow criterion based test case generation using Genetic Algorithm), for Solidity-based Ethereum smart contract programs. ADF-GA aims to efficiently generate a valid set of test cases via three stages. First, the program's control flow graph is constructed from the source code. Second, the generated control flow graph is analyzed to obtain the variable information in the Solidity programs, locate the require statements, and obtain the definition-use pairs to be tested. Finally, a genetic algorithm is used to generate test cases, in which an improved fitness function is proposed to calculate the definition-use pair coverage of each test case through program instrumentation. Experimental studies are performed on several representative Solidity programs. The results show that ADF-GA can effectively generate test cases, achieve better coverage, and reduce the number of iterations of the genetic algorithm.
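
A heavily simplified sketch of the all-uses fitness idea (a toy Python function stands in for an instrumented Solidity contract, and hill climbing stands in for the full genetic algorithm): executions record which definition-use pairs they exercise, and fitness is the fraction of pairs covered by the suite.

    import random

    # Definition-use pairs to cover in a (toy) contract-like function,
    # identified beforehand from its control flow graph.
    DU_PAIRS = {("x_def", "x_use_then"), ("x_def", "x_use_else")}

    def run(amount):
        """Instrumented program under test: records which def-use pairs
        the execution exercises (require() modelled as an early exit)."""
        covered = set()
        if amount <= 0:            # require(amount > 0)
            return covered
        x = amount * 2             # definition of x
        if x > 100:
            covered.add(("x_def", "x_use_then"))
        else:
            covered.add(("x_def", "x_use_else"))
        return covered

    def fitness(test_suite):
        covered = set().union(*(run(t) for t in test_suite))
        return len(covered & DU_PAIRS) / len(DU_PAIRS)   # all-uses coverage

    suite = [random.randint(-10, 10) for _ in range(3)]
    for _ in range(200):                                 # evolve the suite
        cand = [t + random.randint(-20, 20) for t in suite]
        if fitness(cand) >= fitness(suite):
            suite = cand
    print("suite:", suite, "all-uses coverage:", fitness(suite))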

A Blockchain Oriented Software Application in the Revised Payments Service Directive context

The new European Payments Service Directive (Directive (EU) 2015/2366) introduces a novelty for users of online accounts: the possibility of accessing their own bank statements or making payment transactions directly through software created by Third Party Providers. The new players authorized by the directive represent the real novelty with respect to the previous directive (Directive 2007/64/CE), and introduce for the first time a strong risk of disintermediation between the bank and its customers. The newly authorized parties include the Account Servicing Payment Service Provider, the Payment Initiation Service Provider and the Account Information Service Provider. This new mechanism for accessing information on personal bank statements or for making payments will stimulate a remodeling of the offers to customers. In this work, a first attempt to implement an account information service and an account storing service through a blockchain-oriented software application is presented.

An overview of blockchain-based systems and smart contracts for digital coupons

Among the accessory applications of the blockchain, the idea of using it as an immutable register for tracking and certifying documents has recently been gaining interest in research and industry. The problems of traceability, non-counterfeiting and unique usage of digital coupons fall within this area; many couponing platforms are hence exploring the possibility of addressing the above limitations with blockchain technologies. In view of the foregoing, in this work we analyse and compare several blockchain-based couponing systems. To do so, we first propose a general schema for digital coupons and define the desirable properties of a couponing system. Then, we select a sample of these systems and examine them, describing their design choices and summarizing their relevant properties. Finally, we inspect their code and study how the notion of a couponing system is interpreted in their smart contracts. We also highlight their distinctive features and relevant implementation solutions. We conclude by discussing what emerged from our analysis and proposing some possible future investigations.

Investigation of Mutual-Influence among Blockchain Development Communities and Cryptocurrency Price Changes

This paper aims to identify and model relationships between cryptocurrency market price changes and topic discussion occurrences on social media. The considered cryptocurrencies are the two highest in value at the moment, Bitcoin and Ethereum. Topics were obtained by classifying comments from the Reddit social media platform, and the relationships were modeled by implementing a Hawkes model. The results highlight that it is possible to identify some interactions among the considered features, and it appears that some topics are indicative of certain types of price movements. Specifically, discussions concerning issues about government, trading, and the Ethereum cryptocurrency as an exchange currency appear to affect Bitcoin and Ethereum prices negatively. Discussions of investment appear to be indicative of price rises, while discussions related to new decentralized realities and technological applications are indicative of price falls.
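
The Hawkes model underlying such an analysis is based on a conditional intensity in which past events raise the probability of future events; a minimal univariate sketch with the standard exponential kernel (illustrative parameters and event times; the paper's analysis is a multivariate variant across topics and price moves):

    import math

    def hawkes_intensity(t, events, mu=0.1, alpha=0.8, beta=1.5):
        """Conditional intensity of a Hawkes process:
        lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)).
        Past events (e.g. occurrences of a Reddit topic) excite
        future events (e.g. price moves)."""
        return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

    topic_events = [1.0, 1.2, 1.3, 4.0]       # times a topic was discussed
    for t in [1.5, 2.5, 4.1]:
        print(f"intensity of price-change events at t={t}: "
              f"{hawkes_intensity(t, topic_events):.3f}")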