Software development projects, in particular open source ones, heavily rely on the use of tools to support, coordinate and promote development activities. Despite their paramount value, these tools fragment the project data, challenging practitioners and researchers who wish to derive insightful analytics about software projects. In this demo we present Perceval, a loyal helper able to perform automatic and incremental data gathering from almost any tool related to open source development, including source code management, issue tracking systems, mailing lists, forums, and social media. Perceval is an industry-strength free software tool that has been widely used in Bitergia, a company devoted to offering commercial software analytics of software projects. It hides the technical complexities of data acquisition and eases the definition of analytics. A video showcasing the main features of Perceval can be found at https://youtu.be/eH1sYF0Hdc8.
Developer behavior in the IDE, including commands and events and complementing the active source code, provides useful context to in-IDE recommendation systems. This paper presents S
Video: http://bit.ly/sitfdemo
In software projects, the technical debt metaphor describes situations where developers and managers accept compromises in long-term software quality to achieve short-term goals. There are many types of technical debt, and self-admitted technical debt (SATD) was recently proposed to capture debt that is introduced intentionally (e.g., through a temporary fix) and admitted by developers themselves. Previous work has shown that SATD can be successfully detected using source code comments. However, most current state-of-the-art approaches identify SATD comments through pattern matching, which achieves high precision but very low recall. This means they may miss many SATD comments and are not practical enough. In this paper, we propose SATD Detector, a tool that is able to (i) automatically detect SATD comments using text mining and (ii) highlight, list and manage detected comments in an integrated development environment (IDE). This tool consists of a Java library and an Eclipse plug-in. The Java library is the back-end, which provides command-line interfaces and Java APIs to re-train the text mining model using users' data and to automatically detect SATD comments using either the built-in model or a user-specified model. The Eclipse plug-in, which is the front-end, first leverages our pre-trained composite classifier to detect SATD comments, and then highlights and marks these detected comments in the source code editor of Eclipse. In addition, the Eclipse plug-in provides a view in the IDE which collects all detected comments for management.
Demo URL: https://youtu.be/sn4gU2qhGm0
Java library download: https://git.io/vNdnY
Eclipse plug-in download: https://goo.gl/ZzjBzp
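For context on the detection task in the SATD Detector abstract above, here is a minimal, purely illustrative Python sketch of the pattern-matching baseline that the authors contrast with their text-mining approach; the keyword patterns are assumptions of this sketch, not the patterns used by prior work or by the tool.

    import re

    # Illustrative keyword patterns commonly associated with self-admitted
    # technical debt (SATD) comments; real pattern sets are larger and curated.
    SATD_PATTERNS = [r"\bTODO\b", r"\bFIXME\b", r"\bHACK\b",
                     r"\btemporary (fix|workaround)\b", r"\bugly\b"]

    def is_satd_comment(comment: str) -> bool:
        """Return True if the comment matches any SATD keyword pattern."""
        return any(re.search(p, comment, re.IGNORECASE) for p in SATD_PATTERNS)

    for comment in ["// TODO: remove this temporary fix once the API is stable",
                    "// compute the checksum of the payload"]:
        print(is_satd_comment(comment), comment)

As the abstract notes, such keyword matching tends to be precise but misses SATD comments phrased without the listed keywords, which is the recall gap the text-mining model targets.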
Testing and debugging are time-consuming, tedious and costly. As many automated test generation tools are being applied in practice nowadays, there is a growing need for automated failure diagnosis. We introduce Aletheia, a failure diagnosis toolchain, which aims to help developers and testers reduce failure analysis time. The key ideas include: data generation to provide the relevant data for further analysis, failure clustering to group failing tests based on the hypothesized faults, and fault localization to pinpoint suspicious elements of the code. We evaluated Aletheia in a large-scale industrial case study as well as on two open-source projects. Aletheia is released as an open-source tool on GitHub, and a demo video can be found at: https://youtu.be/BP9D68D02ZI
We present ElasTest, an open-source generic and extensible platform supporting end-to-end testing of large complex cloud systems, including web, mobile, network and WebRTC applications. ElasTest is developed following a fully transparent and open agile process around which a community of developers, contributors and users has gathered. We demonstrate ElasTest in action by testing the FullTeaching application: the video is available from http://elastest.io/videos/icse2018-demo.
Random and search-based test generators yield realistic test cases based on program APIs, but often miss structural test objectives that depend on non-trivial data structure instances; conversely, symbolic execution can precisely characterize those dependencies but does not compute method sequences to instantiate them. We present SUSHI, a high-coverage test case generator for programs with complex structured inputs. SUSHI leverages symbolic execution to generate path conditions that precisely describe the relationship between program paths and input data structures, and converts the path conditions into the fitness functions of search-based test generation problems. A solution for the search problem is a legal method sequence that instantiates the structured inputs to exercise the program paths identified by the path condition. Our experiments indicate that SUSHI can distinctively complement current automatic test generation tools.
Mutation testing is widely used in research (even if not in practice). Mutation testing tools usually target only one programming language and rely on parsing a program to generate mutants, or operate not at the source level but on compiled bytecode. Unfortunately, developing a robust mutation testing tool for a new language in this paradigm is a difficult and time-consuming undertaking. Moreover, bytecode/intermediate language mutants are difficult for programmers to read and understand. This paper presents a simple tool, called universalmutator, based on regular-expression-defined transformations of source code. The primary drawback of such an approach is that our tool can generate invalid mutants that do not compile, and sometimes fails to generate mutants that a parser-based tool would have produced. Additionally, it is incompatible with some approaches to improving the efficiency of mutation testing. However, the regexp-based approach provides multiple compensating advantages. First, our tool is easy to adapt to new languages; e.g., we present here the first mutation tool for Apple's Swift programming language. Second, the method makes handling multi-language programs and systems simple, because the same tool can support every language. Finally, our approach makes it easy for users to add custom, project-specific mutations.
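To make the regexp-based idea concrete, the following Python sketch applies regular-expression-defined transformations line by line, producing one mutant per match; the rules shown are invented for illustration and do not reflect universalmutator's actual rule files or syntax.

    import re

    # Illustrative (pattern, replacement) mutation rules.
    RULES = [
        (r"==", "!="),     # negate an equality comparison
        (r"\+", "-"),      # swap an arithmetic operator
        (r"\b0\b", "1"),   # perturb an integer constant
    ]

    def generate_mutants(source: str):
        """Yield (line_number, mutated_source) pairs, one mutation per mutant."""
        lines = source.splitlines()
        for i, line in enumerate(lines):
            for pattern, replacement in RULES:
                for match in re.finditer(pattern, line):
                    mutated = line[:match.start()] + replacement + line[match.end():]
                    yield i + 1, "\n".join(lines[:i] + [mutated] + lines[i + 1:])

    original = "if total == 0:\n    total = total + 1"
    for lineno, mutant in generate_mutants(original):
        print(f"--- mutant (line {lineno}) ---\n{mutant}")

Because the rules never consult a parser, some generated mutants may not compile, which matches the trade-off the abstract describes.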
Creating models of software systems and analyzing the models helps develop more reliable systems. A well-known software modeling tool-set is embodied by the declarative language Alloy and its automatic SAT-based analyzer. Recent work introduced a novel approach to testing Alloy models to validate their correctness in the spirit of traditional software testing: AUnit defined the foundations of testing (unit tests, test execution, and model coverage) for Alloy, and MuAlloy defined mutation testing (mutation operators, mutant generation, and equivalent mutant checking) for Alloy. This tool paper describes our Java implementation of MuAlloy, which is a command-line tool that we released as an open-source project on GitHub. Our experimental results show that MuAlloy is efficient and practical. The demo video for MuAlloy can be found at https://youtu.be/3lvnQKiLcLE.
Mutation testing has shown great promise in assessing the effectiveness of test suites while exhibiting additional applications to test-case generation, selection, and prioritization. Traditional mutation testing typically utilizes a set of simple, language-specific source code transformations, called operators, to introduce faults. However, empirical studies have shown that for mutation testing to be most effective, these simple operators must be augmented with operators specific to the domain of the software under test. One challenging software domain for the application of mutation testing is that of mobile apps. While mobile devices and accompanying apps have become a mainstay of modern computing, the frameworks and patterns utilized in their development make testing and verification particularly difficult. As a step toward helping to measure and ensure the effectiveness of mobile testing practices, we introduce MD
Video URL: https://youtu.be/yzE5_-zN5GA
Software designers often lack an understanding of the effects of design decisions on the quality properties of their software. This results in costly and time-consuming trial-and-error testing as well as delayed and complicated rollouts of the software. In this tool demonstration paper we present an integrated tool environment - the Palladio-Bench - for modeling and analyzing software architectures. The analysis results provided by Palladio support design decisions by identifying the best-suited design from a set of given alternatives.
The demonstration video for the Palladio-Bench can be found at the URL https://youtu.be/vG7WQPcp-uI.
Performance problems observed in production environments that have their origin in program code are immensely hard to localize and prevent. Data that can help solve such problems is usually found in external dashboards and is thus not integrated into the software development process. We propose an approach that augments source code with runtime performance traces to tightly integrate them into developer workflows. Our goal is to create operational awareness of performance problems in developers' code and to contextualize this information to the tasks they are currently working on. We implemented this approach as an Eclipse IDE plugin for Java applications that is available as an open source project on GitHub. A video of PerformanceHat in action is online: https://youtu.be/fTBBiylRhag
The way software properties are defined, described, and measured differs across domains. When addressing these properties, several challenges commonly emerge, among them synonymity, polysemy, paronymy, and incomplete or inconsistent specification. In this paper we introduce PROMOpedia, an online encyclopedia, to tackle these challenges. PROMOpedia uses a web-content management system coupled with crowd-sourcing of scientific content related to properties and their evaluation methods. The core concepts of PROMOpedia are built upon a property models ontology previously proposed by the authors, and it is intended to target the needs of both researchers and practitioners. Website: http://www.mrtc.mdh.se/promopedia/
When changes in requirements occur, their associated tests must be adapted accordingly in order to maintain the quality of the evolving system. In practice, inconsistencies in requirements and acceptance tests---together with poor communication of changes---lead to software quality problems, unintended costs and project delays. We are developing GuideGen, a tool that helps requirements engineers, testers and other involved parties keep requirements and acceptance tests aligned. When requirements change, GuideGen analyzes the changes, automatically generates guidance on how to adapt the affected acceptance tests, and sends this information to subscribed parties. GuideGen also flags all non-aligned acceptance tests, thus keeping stakeholders aware of mismatches between requirements and acceptance tests. We evaluated GuideGen with data from three companies. For 262 non-trivial changes of requirements, the suggestions generated by GuideGen were correct in more than 80 percent of the cases for agile requirements and about 67 percent for traditional ones.
Demo video: https://vimeo.com/254865530
EVA is a tool for visualizing and exploring architectures of evolving, long-lived software systems. EVA enables its users to assess the impact of architectural design decisions and their systems' overall architectural stability. (Demo Video: https://youtu.be/Q3bnIQz13Eo)
Traditional commit-based sequential organization of software version histories is insufficient for many development tasks which require high-level, semantic understanding of program functionality, such as porting features or cutting new releases. Semantic history slicing is a technique which uses well-organized unit tests as identifiers for corresponding software functionalities and extracts a set of commits that correspond to a specific high-level functionality. In this paper, we present CS
Mutation-based fuzzing is a widely used software testing technique for bug and vulnerability detection, and its performance is greatly affected by the quality of the initial seeds and the effectiveness of the mutation strategy. In this paper, we present SAFL, an efficient fuzz testing tool augmented with qualified seed generation and efficient coverage-directed mutation. First, symbolic execution is used in a lightweight approach to generate qualified initial seeds. Valuable exploration directions are learned from the seeds, so the later fuzzing process can reach deep paths in the program state space earlier and more easily. Moreover, we implement a fair and fast coverage-directed mutation algorithm. It helps the fuzzing process to exercise rare and deep paths with higher probability. We implement SAFL based on KLEE and AFL and conduct thorough, repeated evaluations on real-world program benchmarks against state-of-the-art versions of AFL. After 24 hours, compared to AFL and AFLFast, it discovers 214% and 133% more unique crashes, covers 109% and 63% more paths, and achieves 279% and 180% more covered branches.
Video link: https://youtu.be/LkiFLNMBhVE
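A toy Python sketch of the coverage-directed scheduling idea described in the SAFL abstract, under assumptions of this sketch rather than the tool's actual algorithm: run_target and mutate are hypothetical helpers, seeds whose paths have already been fuzzed often are picked less frequently, and inputs that reach new branches are retained as seeds.

    import random
    from collections import Counter

    def fuzz(initial_seeds, run_target, mutate, iterations=10000):
        """Toy coverage-directed fuzzing loop (illustration only).

        run_target(data) -> set of covered branch ids   (hypothetical helper)
        mutate(data)     -> a mutated input             (hypothetical helper)
        """
        seeds = list(initial_seeds)
        global_coverage = set()
        fuzz_count = Counter()                     # how often each seed was fuzzed

        for _ in range(iterations):
            # Favor rarely fuzzed seeds so rare/deep paths get more attention.
            weights = [1.0 / (1 + fuzz_count[s]) for s in seeds]
            seed = random.choices(seeds, weights=weights, k=1)[0]
            fuzz_count[seed] += 1

            candidate = mutate(seed)
            covered = run_target(candidate)
            if covered - global_coverage:          # new branches discovered
                global_coverage |= covered
                seeds.append(candidate)
        return seeds, global_coverage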
Smart contracts have enabled a new way to perform cryptocurrency transactions over blockchains. While this emerging technique promises conflict-free and transparent transactions, smart contracts themselves are vulnerable. As a special form of computer program, a smart contract can hardly be free of bugs. Even worse, an exploitable security bug can lead to catastrophic consequences, e.g., loss of cryptocurrency/money. In this demo paper, we focus on the most common type of security bug in smart contracts, i.e., the reentrancy bug, which caused the famous DAO attack with a loss of 60 million US dollars. We present ReGuard, a fuzzing-based analyzer to automatically detect reentrancy bugs in Ethereum smart contracts. Specifically, ReGuard performs fuzz testing on smart contracts by iteratively generating random but diverse transactions. Based on the runtime traces, ReGuard further dynamically identifies reentrancy vulnerabilities. In a preliminary evaluation, we have analyzed 5 existing Ethereum contracts. ReGuard automatically flagged 7 previously unreported reentrancy bugs. A demo video of ReGuard is at https://youtu.be/XxJ3_-cmUiY.
Assertions are helpful in program analysis, such as software testing and verification. The most challenging part of automatically recommending assertions is designing the assertion patterns and inserting assertions at proper locations. In this paper, we develop Weak-Assert, a weakness-oriented assertion recommendation toolkit for program analysis of C code. A weakness-oriented assertion is an assertion which can help to find potential program weaknesses. Weak-Assert uses well-designed patterns to match the abstract syntax trees of source code automatically. It collects significant information from the trees and inserts assertions at proper locations in the programs. These assertions can be checked using program analysis techniques. The experiments are set up on the Juliet test suite and several real-world projects on GitHub. Experimental results show that Weak-Assert helps to find 125 program weaknesses in 26 real-world projects. These weaknesses are confirmed manually to be triggered by some test cases.
The demo video is available at: https://youtu.be/_RWC4GJvRWc
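Weak-Assert works on C code via AST patterns; as a language-neutral illustration of the same idea, the Python sketch below walks Python's own AST and suggests a weakness-oriented assertion wherever a division by a variable is found. The single pattern is an assumption for illustration, not one of Weak-Assert's patterns.

    import ast

    def suggest_division_asserts(source: str):
        """Suggest 'assert divisor != 0' for each division whose divisor is a variable."""
        suggestions = []
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.BinOp)
                    and isinstance(node.op, (ast.Div, ast.FloorDiv))
                    and isinstance(node.right, ast.Name)):
                suggestions.append(
                    (node.lineno, f"assert {node.right.id} != 0  # potential division by zero"))
        return suggestions

    code = "def mean(total, count):\n    return total / count"
    for lineno, assertion in suggest_division_asserts(code):
        print(f"line {lineno}: {assertion}")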
Systematic exploration of hypotheses is a major part of any empirical research. In software engineering, we often produce unique tools for experiments and evaluate them independently on different data sets. In this paper, we present KernelHaven as an experimentation workbench supporting a significant number of experiments in the domain of static product line analysis and verification. It addresses the need for extracting information from a variety of artifacts in this domain by means of an open plug-in infrastructure. Available plug-ins encapsulate existing tools, which can now be combined efficiently to yield new analyses. As an experimentation workbench, it provides configuration-based definitions of experiments, their documentation, and technical services, like parallelization and caching. Hence, researchers can abstract from technical details and focus on the algorithmic core of their research problem.
KernelHaven supports different types of analyses, like correctness checks, metrics, etc., in its specific domain. The concepts presented in this paper can also be transferred to support researchers of other software engineering domains. The infrastructure is available under Apache 2.0: https://github.com/KernelHaven. The plug-ins are available under their individual licenses.
Video: https://youtu.be/IbNc-H1NoZU
Object-oriented (OO) languages, by design, make heavy use of method invocations (MI). Unsurprisingly, a large fraction of OO-program bug patches also involve method invocations. However, current program repair techniques incorporate MIs in very limited ways, ostensibly to avoid searching the huge repair space that method invocations afford. To address this challenge, in previous work, we proposed a generate-and-validate repair technique which can effectively synthesize patches from a repair space rich in method invocation expressions, by using a machine-learned model to rank the space of concrete repairs. In this paper we describe the tool E
A software product line is a portfolio of software variants in an application domain. It relies on a platform integrating common and variable features of the variants using variability mechanisms---typically classified into annotative and compositional mechanisms. Annotative mechanisms (e.g., using the C preprocessor) are easy to apply, but annotations clutter source code and feature code is often scattered across the platform, which hinders program comprehension and increases maintenance effort. Compositional mechanisms (e.g., using feature modules) support program comprehension and maintainability by modularizing feature code, but are difficult to adopt. Most importantly, engineers need to choose one mechanism and then stick to it for the whole life cycle of the platform. The PEoPL (Projectional Editing of Product Lines) approach combines the advantages of both kinds of mechanisms. In this paper, we demonstrate the PEoPL IDE, which supports the approach by providing various kinds of editable views, each of which represents the same software product line using annotative or compositional variability mechanisms, or subsets of concrete variants. Software engineers can seamlessly switch these views, or use multiple views side-by-side, based on the current engineering task. A demo video of PEoPL is available at Youtube: https://youtu.be/wByUxSPLoSY
Model transformations (MTs) are key in model-driven engineering as they automate model manipulation. Their early verification is essential because a bug in a MT may affect many projects using it. Still, there is a lack of analysis tools applicable to non-toy transformations developed with practical MT languages.
To alleviate this problem, this paper presents A
The tool website is http://anatlyzer.github.io, and a video showcasing its features is at https://youtu.be/bFpbZht7bqY
Code developers in industry frequently use static analysis tools to detect and fix software defects in their code. But what about defects in the static analyses themselves? While debugging application code is a difficult, time-consuming task, debugging a static analysis is even harder. We have surveyed 115 static analysis writers to determine what makes static analysis difficult to debug, and to identify which debugging features would be desirable for static analysis. Based on this information, we have created V
We present SQLInspect, a tool intended to assist developers who deal with SQL code embedded in Java applications. It is integrated into Eclipse as a plug-in that is able to extract SQL queries from Java code through static string analysis. It parses the extracted queries and performs various analyses on them. As a result, one can readily explore the source code which accesses a given part of the database, or which is responsible for the construction of a given SQL query. SQL-related metrics and common coding mistakes are also used to spot inefficiently or defectively performing SQL statements and to identify poorly designed classes, like those that construct many queries via complex control-flow paths. SQLInspect is a novel tool that relies on recent query extraction approaches. It currently supports Java applications working with JDBC and SQL code written for MySQL or Apache Impala. Check out the live demo of SQLInspect at http://perso.unamur.be/~cnagy/sqlinspect.
Ideally, debuggers for Model-Driven Development (MDD) tools would allow users to 'stay at the model level' and would not require them to refer to the generated source code or figure out how the code generator works. Existing approaches to model-level debugging do not satisfy this requirement and are unnecessarily complex and platform-specific due to their dependency on program debuggers. We introduce a novel approach to model-level debugging that formulates debugging services at the model level and implements them using model transformation. This approach is implemented in MDebugger, a platform-independent model-level debugger using Papyrus-RT, an MDD tool for the modeling language UML-RT.
https://youtu.be/L0JDn8eczwQ
Self-adaptation is nowadays recognized as an effective approach to deal with the uncertainty inherent to cyber-physical systems, which are composed of dynamic and deeply intertwined physical and software components interacting with each other. Engineering a self-adaptive cyber-physical system is challenging, as concerns about both the physical and the control system should be jointly considered. To this end, we present CyPhEF, a Model-Driven Engineering framework supporting the development and validation of self-adaptive cyber-physical systems.
Demo video: https://youtu.be/nmg-w2kfKEA.
Models in Model-Driven Engineering are heavily edited in all stages of software development and can become temporarily inconsistent. In general, there are many alternatives to fix an inconsistency; the actual choice is left to the discretion of the developer. Model repair tools should support developers by proposing a short list of repair alternatives. Such recommendations will only be accepted in practice if the generated proposals are plausible and understandable. Current approaches, which mostly focus on fully automatic, non-interactive model repairs, fail to meet these requirements. This paper proposes a new approach to generate repair proposals for inconsistencies that were introduced by incomplete editing processes which can be located in the version history of a model. Such an incomplete editing process is extended to a full execution of a consistency-preserving edit operation. We demonstrate our repair tool R
The paper presents COMB, a tool to improve accuracy and efficiency of software engineering tasks that hinge on computing all relevant program behaviors. Computing all behaviors and selecting the relevant ones is computationally intractable. COMB uses Projected Control Graph (PCG) abstraction to derive the relevant behaviors directly and efficiently. The PCG is important as the number of behaviors relevant to a task is often significantly smaller than the totality of behaviors.
COMB provides extensive capabilities for program comprehension, analysis, and verification. We present a basic case study and a Linux verification study to demonstrate various capabilities of COMB and the challenges it addresses. COMB is designed to support multiple programming languages. We demonstrate it for C and Java. Video URL: https://youtu.be/YoOJ7avBIdk
The Gamma Statechart Composition Framework is an integrated tool to support the design, verification and validation as well as code generation for component-based reactive systems. The behavior of each component is captured by a statechart, while assembling the system from components is driven by a domain-specific composition language. Gamma automatically synthesizes executable Java code extending the output of existing statechart-based code generators with composition related parts, and it supports formal verification by mapping composite statecharts to a back-end model checker. Execution traces obtained as witnesses during verification are back-annotated as test cases to replay an error trace or to validate external code generators.
Tool demonstration video: https://youtu.be/ng7lKd1wlDo
Large-scale software verification projects increasingly rely on proof assistants, such as Coq, to construct formal proofs of program correctness. However, such proofs must be checked after every change to a project to ensure expected program behavior. This process of regression proving can require substantial machine time, which is detrimental to productivity and trust in evolving projects. We present
A main goal of the fourth industrial revolution is the changeability of production processes, i.e., the ability to react efficiently to unplanned production changes. Existing automation system architectures limit this changeability. PLC programs used for automation include low-level actuator behavior, strategies, and management functions without information hiding. This yields unmaintainable, and therefore hard-to-change, systems. In this paper, we document our Virtual Automation Bus, which enables changeable production.
The Product Owner (PO) is critical for translating business needs into a software implementation by gathering and prioritizing requirements and assessing whether features have met the definition of "done." There is a paucity of detail about how POs achieve this daunting task in practice, with potential negative consequences for project success.
In this research we employed a mixed-method approach comprising two case studies in which we interviewed and observed 55 practitioners across 9 large multi-national companies and an SME. Using a cross-case analysis we identified twelve distinct Product Owner activities.
From our empirical findings we created a Product Owner role taxonomy and found eight generic activities common to all teams, projects and companies regardless of project size.
The migration of legacy systems to a service-oriented architecture (SOA) makes it possible to deal with the demand for interoperability and the need to provide a robust, highly available service interface. However, such migration presents a considerable risk, as it often involves the use of different techniques on systems with elevated technical debt and high maintenance costs. For this purpose, a process is instantiated to provide an appropriate set of techniques that minimize risks and at the same time ensure quality improvement of the systems throughout the migration process. In this sense, this work reports on a case study of the application of a process for the reengineering of legacy systems to support the implementation of an SOA project. The study has been applied to the evolution of legacy systems of the Secretariat of State for Taxation of Rio Grande do Norte (SET/RN), Brazil, providing significant results regarding the achievement of important quality goals.
Software quality aspects, and in particular software reliability, are hard to measure. Current tools and techniques are not sufficient to give us insight into the software reliability risk of our customers' products. In this poster, we present a multi-faceted software reliability approach. In addition, an industrial case study with four assessment studies is described. The results of the case study have been well accepted by our customer and by the supplier of this customer. Moreover, our approach can be applied in the automotive domain as it is, and can even be improved by exchanging some of the process assessment components for parts from automotive reference models, such as A-SPICE.
Even experienced developers who rigorously test their code and use state-of-the-art tools and practices inject bugs into the code every now and then. There is a huge amount of literature on the characterization of such bugs, including the effectiveness of the reports and the fixes, the time required to fix them, etc. Existing works have already identified several factors considered to directly influence bug injection. However, the claims made so far are not supported by data coming from industrial, bug-injecting development sessions. This paper aims at filling this gap by analyzing industrial bug-injecting development sessions from several points of view. It investigates 49 bug-injecting development sessions, evaluating and discussing three alleged, developer-centered main causes of bug injection: expertise, knowledge of the code, and distraction. Additionally, the paper provides insights into the complete lifetime of bugs from injection to fix and discusses implications for bug prediction.
In the automotive domain, standards like ISO 26262 require a structured test process. Test cases are usually derived from requirements and documented in test case specifications. They provide a necessary basis for test implementation and execution. Therefore, test case specifications are a fundamental part of the automotive test process. The aim of this work is to gain insights into the creation and processing of test case specifications from a practitioner's point of view. In order to identify challenges concerning automotive test case specifications, we conducted an explorative case study based on 17 semi-structured interviews at a German OEM and three automotive suppliers. The interviews were transcribed and analyzed qualitatively to identify the challenges. We summarized the challenges in a taxonomy consisting of nine main categories: (1) availability problems and (2) content-related problems with input artifacts, (3) knowledge-related problems, (4) test-case-related problems, (5) problems related to the content of the test case specification, (6) process-related problems, (7) communication-related problems, (8) quality-related problems, and (9) tool-related problems. In general, we noticed that the interviewees were aware of challenges regarding test case specifications in the automotive domain. Nevertheless, some of the current solutions are not efficient and require a lot of manual work.
Work item tracking systems such as Visual Studio Team Services, JIRA, and the GitHub issue tracker are widely used by software engineers. They help manage different kinds of deliverables (e.g., features, user stories, bugs), plan sprints, distribute tasks across the team, and prioritize work. While these tools provide reporting capabilities, there has been little research into the role these reports play in the overall software development process.
In this study, we conduct an empirical investigation of the usage of Analytics Service, a reporting service provided by Visual Studio Team Services (VSTS) to build dashboards and reports from work item tracking data. In particular, we want to understand why and how users interact with Analytics Service and what outcomes and business decisions stakeholders derive from reports built using Analytics Service. We perform semi-structured interviews and a survey with users of Analytics Service to understand usage and challenges. Our qualitative and quantitative analysis can help organizations and engineers building similar tools or services.
While barcamps have been adopted as a learning format for IT professionals for some years, only a few examples of their adaptation as a setting in higher software engineering education have been published so far. Therefore, in this paper a teaching experiment in which undergraduate students attended a developer barcamp is described and evaluated. While its results are promising in general, the impact of the intrinsic motivation and previous skills of the participants appears to be crucial for the success of the format among students, in particular for non-computer science majors.
The importance of Statistical Process Control (SPC) for the software industry has grown in recent years, mainly due to the use of quality models. In this context, this work proposes a student-centered teaching methodology for SPC. The methodology is composed of reading experience reports, PBL, discussion of practical cases, use of games, practical projects, and reflections on what was learned.
The careful selection of a project for Software Engineering courses is important from the point of view of the student, the teacher, and the project user or client, if available. But how do you determine whether to develop a complex fictional case or a simple real-world project with a considerable amount of learning about the software lifecycle? How do you determine whether or not to develop a social project aimed at the community?
In order to analyze project characteristics, a framework was elaborated, based on a matrix of attributes found in different academic projects carried out over 10 years and weighted according to certain criteria.
It is expected that the result of this analysis will help professors choose the most appropriate software project to develop in class, depending on the characteristics of each project.
A rich body of research has shown that both the teaching and the learning of high-quality programming are challenging and deficient in most colleges' education systems. Recently, the continuous inspection paradigm has been widely used by developers on social coding sites (e.g., GitHub) as an important method to ensure the internal quality of massive code contributions. In this study we designed a specific continuous inspection process for students' collaborative projects and conducted a controlled experiment with 48 students from the same course across two school years to evaluate how the process affects their programming quality. Our results show that continuous inspection can significantly reduce the density of code quality issues introduced in the code.
This study aims to characterize the state of the art of software startup education by analyzing and identifying best practices, opportunities and gaps in this field. To do so, we conducted a systematic mapping study in order to analyze and evaluate studies on software startup education. As a result, we found 31 primary studies in this process. These studies were classified into four categories: real projects, multidiscipline, environment and teaching. We conclude that research on software startup education is still scarce. Furthermore, there are several gaps and opportunities to be explored in future work. One of them is the difficulty of providing a real-world experience in an educational setting. The successful cases reported combine three major components: real-world projects, the right environment, and a multidisciplinary context.
Software Engineering education requires offering students practical experience via collaboration with industry and working in teams. At the same time, students require different skills and knowledge at different levels of their studies, i.e., undergraduate versus postgraduate. In this context, Transactive Memory, referring to the shared store of knowledge, affects group dynamics and thereby influences the teaching outcome. In this paper, we present the process that we have employed at the University of Cyprus for teaching Software Engineering courses to bachelor and master students. We describe the process of team building, the different roles, and how group dynamics can affect Transactive Memory.
Today's courses in engineering and other fields frequently involve projects done by teams of students. An important aspect of these team assignments is the formation of the teams. In some courses, teams select different topics to work on. Ideally, team formation would be integrated with topic selection, so teams could be formed from students interested in the same topics. Intuitive criteria for a team formation algorithm are that each student should be assigned to (1) a topic in which they have interest and (2) a team of students with similar interests in their topic. We propose an approach to meeting these criteria by mining student preferences for topics with a clustering approach and then matching the resulting groups to topics that suit their shared interests. Our implementation is based on hierarchical k-means clustering and a weighting formula that favors increasing overall student satisfaction and adding members until the maximum allowable team size is reached.
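A minimal sketch of the clustering-then-matching step described above; it uses plain k-means from scikit-learn rather than the authors' hierarchical variant and weighting formula, and the preference matrix is made-up data.

    import numpy as np
    from sklearn.cluster import KMeans

    # Rows: students; columns: preference scores (0-5) for three project topics.
    preferences = np.array([
        [5, 1, 0], [4, 2, 1], [0, 5, 2], [1, 4, 3], [0, 1, 5], [1, 0, 4],
    ])
    n_topics = preferences.shape[1]

    # Group students with similar interests, one cluster per topic slot.
    clusters = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(preferences)

    # Match each cluster to the topic its members prefer most on average.
    # (A full implementation would also balance team sizes and resolve conflicts
    # when two clusters prefer the same topic.)
    for c in range(n_topics):
        members = np.where(clusters == c)[0]
        topic = int(preferences[members].mean(axis=0).argmax())
        print(f"team {c}: students {members.tolist()} -> topic {topic}")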
Engagement has been shown to contribute to students' success. We used an NSSE-like survey and interviews to examine the engagement of students registered in software engineering and information systems engineering at Ben Gurion University of the Negev (BGU). The survey showed that BGU students had generally lower engagement than US students, except in collaborative learning. BGU students lean towards perceiving their studies as a means for professional success rather than for traditional academic success. We attribute the differences between these students and their US counterparts to differences in culture and to the age of digital media, which allows for multiple ways of learning beyond the university.
This article introduces a multidisciplinary skill assessment for learning embedded software development. In industry, software engineers and mechanical engineers have to communicate with each other, and hence these engineers need multidisciplinary skills. For learning such skills, students need opportunities to work with students from other fields. To this end, we have been organizing a robot contest with embedded software development education. One of the goals of the contest is to inculcate multidisciplinary skills; however, we have not yet clarified its contribution. Thus, we construct a multidisciplinary skill assessment map based on the experiences gained through these contests. The map consists of (1) integration skill, (2) performing skill, and (3) cross-understanding skill.
The PBL (Problem-Based Learning) methodology provides many benefits to those who use it in teaching. In this light, it is important to plan teaching well when using this methodology, so that it is effective for the purposes established by the educator and so that aspects vital to educational planning in the PBL approach are not neglected or forgotten. However, there is a lack of specific tools to help educators in the task of planning their teaching, specifically geared to the PBL approach. As an alternative to this problem, this paper proposes a tool consisting of a Canvas PBL and a set of cards intended to guide the planning of teaching in the PBL approach.
Mobile application development (MAD) has become, or is being considered as, a part of the academic curricula of Computer Science courses. However, training students in mobile application development inherits the challenges of teaching software engineering where the target computer is a device with a large number of features accessible by software. Furthermore, related experience in teaching students reveals difficulties in developing software engineering competencies. In this paper we present results from a case study conducted in four universities in Brazil, in which we investigated the adoption of the Challenge-Based Learning (CBL) framework and agile practices for training students in software engineering applied to mobile application development environments.
Training computer scientists to address wicked problems means focusing both on the individual capability to think in a computation-oriented way (i.e., Computational Thinking) and on the social dimension of coding (i.e., Agile Values). In this study we propose the conceptual model of Cooperative Thinking, a new education construct for team-based computational problem solving. Cooperative Thinking is not merely the sum of Computational Thinking and Agile Values; rather, it is a new overarching competence suitable for dealing with complex software engineering problems. We suggest tackling the Cooperative Thinking construct as an education goal, to train new generations of software developers to Pareto-optimize both their individual and team performance.
Self-empowered learning is a great challenge for professionals in software engineering. Due to everyday work, there is often no time or need for complete courses or textbooks on a broad topic. This challenge has led to multiple approaches like learning objects, micro learning and learning nuggets. A critical part of these approaches is providing metadata for each resource in order to place the resource in context, enabling the learner to easily retrieve resources and the lecturer to reuse the material. However, metadata definitions like LOM or IMS-LD are defined at a generic level, suitable for education in general, without considering specific domain characteristics. Additionally, much of the available learning material focuses on the acquisition of knowledge, rather than the empowerment to perform new tasks in the learner's current situation. In this paper we present a preliminary model to describe the software-engineering-specific context of learning material that can be used to extend existing metadata definitions in learning management systems.
Team exercises for software development project-based learning (SDPBL) adopting an agile development model have become popular for training and education worldwide. In the agile development model, an essential part is the build process. In this study, we investigated students' build errors in agile SDPBL projects by monitoring and collecting logs of the build process from 2013 to 2016. From 2013 to 2015, we categorized the build errors and then discussed the resolutions for each type of build error. In 2016, the instructors modified the SDPBL project based on the build error types and their corresponding causes and resolutions. As a result, in 2016 the number of build errors and the time required to solve them decreased compared to previous years.
An important problem in runtime verification is monitorability. If a property is not monitorable, then it is meaningless to check it at runtime, as no satisfaction or violation will be reported within finitely many steps. In this paper, we revisit the classic definition of monitorability and show that it is too restrictive for practical runtime verification. We propose a weaker but more practical definition of monitorability, called weak monitorability, and show how to decide weak monitorability for runtime verification.
Bug reporting is a major part of software maintenance, and due to its inherently asynchronous nature, duplicate bug reporting has become fairly common. Detecting duplicate bug reports is an important task in order to avoid assigning the same bug to different developers. Earlier approaches have improved duplicate bug report detection by using word embeddings, topic models and other machine learning approaches. In this poster, we attempt to combine Latent Dirichlet Allocation (LDA) and word embeddings to leverage the strengths of both approaches for this task. As a first step towards this idea, we present an initial analysis and an approach which is able to outperform both word embeddings and LDA for this task. We validate our hypothesis on a real-world dataset from the Firefox project and show that there is potential in combining LDA and word embeddings for duplicate bug report detection.
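A minimal sketch of one way to blend the two signals for a pair of bug reports; the weighting scheme is an assumption of this sketch rather than the poster's method, and embeddings stands in for any pretrained word-vector lookup.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def report_similarity(reports, i, j, embeddings, alpha=0.5, n_topics=10):
        """Blend LDA topic similarity with averaged word-embedding similarity."""
        counts = CountVectorizer(stop_words="english").fit_transform(reports)
        topics = LatentDirichletAllocation(n_components=n_topics,
                                           random_state=0).fit_transform(counts)

        def avg_embedding(text):
            dim = len(next(iter(embeddings.values())))
            vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
            return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

        lda_sim = cosine(topics[i], topics[j])
        emb_sim = cosine(avg_embedding(reports[i]), avg_embedding(reports[j]))
        return alpha * lda_sim + (1 - alpha) * emb_sim

A pair of reports whose blended similarity exceeds a tuned threshold would be flagged as a candidate duplicate.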
Software libraries can be used in different ways, but not all of their API usages are well covered by documentation and programming guides. To recommend API usages for a query, researchers have recently gone beyond searching for existing code and aimed to generate new API code by exploring statistical approaches, including phrase-based statistical machine translation (SMT) [5], probabilistic CFG [4], AST-based translation [2], and deep neural networks [3]. A key limitation of existing approaches is the strict left-to-right order of translation, leading to low accuracy in the resulting APIs.
The maintainability and understandability of a software system are affected by the way the system's concerns are implemented, especially if they are crosscutting concerns. In this paper we present a study of how monitoring crosscutting concerns are implemented in ten object-oriented software systems. The study's results will be used towards a new approach for the automatic identification of monitoring concern implementations.
Software documentation is a significant component of modern software systems. Each year it becomes more and more complicated, just as the software itself. One of the aspects that negatively impact documentation quality is the presence of textual duplicates. Textual duplicates encountered in software documentation are inherently imprecise, i.e. in a single document the same information may be presented many times with different levels of detail and in various contexts. Documentation maintenance is an acute problem, and there is a strong demand for automation tools in this domain.
In this study we present the Duplicate Finder Toolkit, a tool which assists an expert with duplicate maintenance-related tasks. Our tool can facilitate the maintenance process in a number of ways: 1) detection of both exact and near duplicates; 2) duplicate visualization via heat maps; 3) duplicate analysis, i.e., comparison of several duplicate instances, evaluation of their differences, and exploration of duplicate context; and 4) duplicate manipulation and extraction.
The performance of recommender systems is commonly characterized by metrics such as precision and recall. However, these metrics can only provide a coarse characterization of the system, as they offer limited intuition and insights on potential system anomalies, and may fail to provide a developer with an understanding of the strengths and weaknesses of a recommendation algorithm. In this work, we start to describe a model of recommender systems that defines a space of properties. We begin exploring this space by defining templates that relate to the properties of coverage and diversity, and we demonstrate how instantiated characteristics offer complementary insights to precision and recall.
A pair of inverse operations is defined as two operations that, when performed on a number or variable, always result in the original number or variable. Novice programmers may introduce such inverse operations; automated parallelizing tools also employ such operations to undo the effects of some speculatively executed operation. Therefore, detection of inverse operations is helpful in both compiler optimization (redundant code elimination) and the verification of parallelizing frameworks. In this work, we extend the definition of inverse operations to include a set of operations instead of only two, and present a method for detecting inverse operations symbolically which would otherwise require complete unrolling of loops. Some interesting intricacies of detecting inverse operations are also discussed.
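To illustrate the symbolic detection idea (a sketch using SymPy, not the authors' method), one can compose the candidate operations on a symbolic variable and check whether the result simplifies back to that variable, with no loop unrolling required.

    import sympy as sp

    def is_inverse_sequence(operations):
        """Return True if applying the operations in order maps a symbolic x back to x."""
        x = sp.symbols("x")
        value = x
        for op in operations:
            value = op(value)
        return sp.simplify(value - x) == 0

    # (v * 3 + 7) followed by ((v - 7) / 3) forms an inverse set.
    print(is_inverse_sequence([lambda v: v * 3 + 7, lambda v: (v - 7) / 3]))  # True
    # (v + 2) followed by (v - 3) does not.
    print(is_inverse_sequence([lambda v: v + 2, lambda v: v - 3]))            # False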
Debugging programs for distributed computing models like MapReduce is a difficult task, which is why prior studies focus only on finding and fixing bugs in the early stages of program development. Delta debugging tries to find a minimal failing input for sequential programs by dividing inputs into subsets and testing these subsets one by one. However, no prior work tries to find minimal failing inputs in distributed programs like MapReduce. In this paper, we present MapRedDD, a framework to efficiently find minimal failing inputs in MapReduce programs. MapRedDD employs a failing-input selection technique focused on identifying the failing input subset in a single run of the MapReduce program over multiple input subsets, instead of testing each subset separately. This reduces the number of executions of the MapReduce program and overcomes the overhead of job submission, job scheduling and final outcome retrieval. Our work can efficiently find the minimal failing input using a number of executions equal to the number of inputs N to the MapReduce program, as opposed to a number of executions equal to the number of input subsets, 2N - 1, in the worst case for a binary-search-based algorithm for finding the minimal failing input.
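A much-simplified sketch of the single-run idea: tag each record with the id of the input subset it belongs to, process all tagged records in one (here simulated) run, and read off which subsets contain a failure, rather than launching one MapReduce job per subset. process_record is a hypothetical stand-in for the job's map-side logic.

    def find_failing_subsets(records, n_subsets, process_record):
        """Identify failing input subsets in a single pass over tagged records."""
        chunk = max(1, len(records) // n_subsets)
        tagged = [(min(i // chunk, n_subsets - 1), r) for i, r in enumerate(records)]

        failing = set()
        for subset_id, record in tagged:           # one pass instead of one job per subset
            try:
                process_record(record)
            except Exception:
                failing.add(subset_id)
        return failing

    records = [1, 2, -1, 4, 5, 6, -2, 8]
    print(find_failing_subsets(records, n_subsets=4,
                               process_record=lambda r: 1 / max(r, 0)))

The failing subsets would then be refined recursively, as in delta debugging, but each refinement level still needs only one run.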
Fork-based development allows developers to start development from an existing software repository by copying its code files. However, when the number of forks grows, contributions are not always visible to others unless an explicit merge-back attempt is made. To solve this problem, we implemented Forks Insight (www.forks-insight.com) to help developers get an overview of forks on GitHub. The current release focuses on simple analytics for a high-level overview and is lightweight, scalable and practical. It has a user-friendly interactive web interface with features like searching and tagging.
Issue tracking systems (ITS) are widely used to describe and manage activities in software systems. Within the ITS, software developers create issues and establish typed links among these artifacts. A variety of different link types exists, e.g., that one issue clones another. The rationale for choosing a specific link type is not always obvious. In this paper, we study link type selection and focus on the relationship between the textual properties of connected issues and the chosen link type. We performed a study on seven open-source systems and quantified the usage of typed links. Further, we report preliminary results indicating that, depending on the link type, a link mostly captures the textual similarity of issues and thus may provide only limited additional information.
We leverage Latent Dirichlet Allocation to analyze R source code from 10,051 R packages in order to better understand the topic space of scientific computing. Our method is able to identify several generic programming concepts and, more importantly, identify concepts that are highly specific to scientific and high performance computing applications.
Recently, public interest in blockchain technology has surged and various applications based on the technology have emerged. However, there has been little study of architectural evaluations of popular blockchain platforms that could help developers choose an architecture matching their needs. In this paper, we reconstruct and evaluate the architectures of Hyperledger and Ethereum, which are representative open-source platforms for blockchain. The evaluation results indicate that Hyperledger is strong in modifiability and performance, whereas Ethereum is better in security.
Over the past 20 years, the agile development methodology has been commonly adopted by developers. One reason is that it responds flexibly to changes. In contrast, a typical requirement traceability matrix is inflexible in incorporating changes. In this paper, we propose the Agile Requirement Traceability Matrix (ARTM). ARTM manages traceability mappings between two artifacts and automatically generates the entire requirement traceability matrix in a spreadsheet. A case study we conducted shows that ARTM flexibly incorporates changes in the spirit of agile development.
The unsafe features of C often lead to memory errors that can result in vulnerabilities. Dynamic analysis tools are widely used to detect such errors at runtime and enforce memory safety. It is widely believed that memory safety consists exactly of spatial and temporal safety; thus, all existing analysis tools aim at detecting spatial or temporal errors. In this paper, we introduce another class of memory safety, namely segment safety, which has been neglected in previous work. Indeed, state-of-the-art analysis tools cannot detect segment errors. Thus, we propose and implement a new approach to detect segment errors at runtime.
This paper proposes a novel virtual shared memory framework, Soft Memory Box (SMB), which directly shares the memory of remote nodes among distributed processes to improve communication performance/speed via deep learning parameter sharing.
Bug report filing is a major part of software maintenance. Due to the extensive number of bugs filed every day in large software projects and the asynchronous nature of the bug report filing ecosystem, duplicate bug reports are filed. Capturing and tagging duplicate bug reports is crucial in order to avoid assigning the same bug to different developers. Efforts have been made in the past to detect duplicate bug reports by using topic modelling [2], discriminative methods [5], meta-attributes [6], etc. Recently, Yang et al. [8] proposed an approach that combines word embeddings, TF-IDF and meta-attributes to compute the similarity between two bug reports.
Per-Input Control Flow Integrity (PICFI) represents a recent advance in dynamic CFI techniques. PICFI starts with the empty CFG of a program and lazily adds edges to the CFG during execution according to concrete inputs. However, this CFG grows monotonically, i.e., invalid edges are never removed when corresponding control flow transfers (via indirect calls) become illegal (i.e., will never be executed again). This paper presents LPCFI, Live Path Control Flow Integrity, to more precisely enforce forward edge CFI using a dynamically computed CFG by both adding and removing edges for all indirect control flow transfers from function pointer calls, thereby raising the bar against control flow hijacking attacks.
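LPCFI itself instruments binaries; the following schematic Python sketch only illustrates the book-keeping the abstract describes: the set of valid forward edges for an indirect call site grows when a function pointer is assigned and shrinks when that pointer is overwritten.

    from collections import defaultdict

    class LivePathCFG:
        """Track the currently valid targets of function-pointer call sites."""

        def __init__(self):
            self.pointer_target = {}               # function pointer -> current target
            self.valid_edges = defaultdict(set)    # call site -> allowed targets

        def assign(self, pointer, target, call_sites):
            """On a pointer write: add the new edge, remove the edge to the old target."""
            old = self.pointer_target.get(pointer)
            self.pointer_target[pointer] = target
            for site in call_sites:
                if old is not None:
                    self.valid_edges[site].discard(old)   # old transfer is now illegal
                self.valid_edges[site].add(target)

        def check_indirect_call(self, call_site, target):
            """Reject a control-flow transfer that is outside the live CFG."""
            if target not in self.valid_edges[call_site]:
                raise RuntimeError(f"CFI violation at {call_site} -> {target}")

    cfg = LivePathCFG()
    cfg.assign("fp", "handler_a", call_sites=["site1"])
    cfg.check_indirect_call("site1", "handler_a")          # allowed
    cfg.assign("fp", "handler_b", call_sites=["site1"])
    try:
        cfg.check_indirect_call("site1", "handler_a")      # stale edge, now rejected
    except RuntimeError as e:
        print(e)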
We are pursuing a method for continuous, data-driven software architecture. We describe the problems with current methods for measuring the impact of refactoring long-lived systems at the architectural level and for architecture compliance checking. We summarize our studies of code churn, productivity and an automatic tool for compliance checking. We conclude that architecture violations seem to play an important role, but current methods are infeasible for industry practice. Finally, we propose to use repository data mining to improve current methods for architecture compliance checking.
A fundamental aspect in the requirements engineering process is to know the quality of a specification, including how the quality evolves over time. This paper introduces an industrial approach for analysis of requirements quality evolution. The approach has been implemented in the System Quality Analyzer tool, exploits quality metrics for requirements correctness, consistency, and completeness, and is based on the storage of quality information in snapshots that are combined and displayed in charts. This can help practitioners to assess the progress and status of a requirements engineering process and to make decisions.
Formal behavioral specifications help ensure the correctness of programs. Writing such specifications by hand, however, is time-consuming and requires substantial expertise. Previous studies have shown how to use a notion of consensus to automatically infer pre-conditions for APIs by using a large set of projects. In this work, we propose a similar idea of consensus to automatically infer post-conditions for popular APIs. We propose two new algorithms for mining potential post-conditions from API client code. The first algorithm looks for guarded post-conditions, where client code tests the value returned from the API and throws an exception. The second algorithm looks for values flowing from the API to another API with already known pre-conditions, and recommends those pre-conditions as post-conditions of the first API.
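A toy sketch of the first algorithm's guard pattern, mined here with a regular expression over Java-like client snippets purely for illustration (the paper's analysis works on parsed client code, not regexes): when enough clients test an API's return value against null and throw, "result != null" is proposed as a post-condition.

    import re
    from collections import Counter

    # Guarded pattern in client code: x = api(...); if (x == null) throw ...
    GUARD = re.compile(
        r"(\w+)\s*=\s*(\w+(?:\.\w+)*)\([^)]*\)\s*;\s*if\s*\(\s*\1\s*==\s*null\s*\)\s*throw")

    def mine_postconditions(client_snippets, min_support=2):
        """Propose 'result != null' for APIs whose guard appears in enough clients."""
        support = Counter()
        for snippet in client_snippets:
            for _var, api in GUARD.findall(snippet):
                support[api] += 1
        return {api: "result != null" for api, n in support.items() if n >= min_support}

    clients = [
        "File f = Files.find(root); if (f == null) throw new IllegalStateException();",
        "File g = Files.find(dir); if (g == null) throw new IOException();",
    ]
    print(mine_postconditions(clients))   # {'Files.find': 'result != null'}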
Empirical studies in software testing require realistic benchmarks which are able to mimic industry-like environments. For evaluating automated failure diagnosis techniques, one needs real reproducible bugs with at least one associated failing test. Extracting such bugs is challenging and time-consuming. This paper presents Pairika, a failure diagnosis benchmark for C++ programs. Pairika contains 40 bugs extracted from 7 modules of OpenCV project with more than 490 KLoC and 11129 tests. Each bug is accompanied by at least one failing test. We publish Pairika to facilitate and stimulate further research on automated failure diagnosis techniques. Pairika is available at: https://github.com/tum-i22/Pairika
The Java 8 Stream API sets forth a promising new programming model that incorporates functional-like, MapReduce-style features into a mainstream programming language. However, using streams correctly and efficiently may involve subtle considerations. In this poster, we present our ongoing work and preliminary results towards an automated refactoring approach that assists developers in writing optimal stream code. The approach, based on ordering and typestate analysis, determines when it is safe and advantageous to convert streams to parallel and optimize parallel streams.
Many web applications and software engineering tools such as test generators are not accessible for users who do not use traditional input devices such as mouse and keyboard. To address this shortcoming of current applications, this work leverages recent speech recognition advances to create a browser plugin that interprets voice inputs as web browser commands and as steps in a corresponding test case. In an initial experiment, the resulting Voice Controlled Accessibility and Testing tool (VCAT) prototype for Chrome and Selenium yielded a lower overall runtime than a traditional test creation approach.
Adaptive Bug Search (ABS) is a service developed by Oracle that uses machine learning to find potential duplicate bugs for a given input bug. ABS leverages the product and component relationships of existing duplicate bug pairs to limit the set of candidate bugs in which it searches for potential duplicates. In this paper, we discuss various approaches for selecting and refining the set of candidate bugs.
Smart home systems rely on cloud servers and multiple IoT (Internet of Things) devices such as smart thermometers, video monitors, and smart appliances to realize convenient remote home control and monitoring. As smart home systems have become increasingly popular, their security protection has become an important problem. Among the components of a smart home system, the remote controlling mobile app is often the most vulnerable part, as it is directly exposed to the public network. In this paper, we propose a novel tool, HomeGuard, to detect potential vulnerabilities in remote controlling apps of smart home systems. Specifically, HomeGuard first identifies information flows related to sensitive control and data messages, and then checks where such information flows and whether it is properly encrypted.
Compiler optimizations influence the effectiveness and efficiency of symbolic execution. In this extended abstract, we report our recent results on recommending compiler optimizations for symbolic execution w.r.t. MC/DC coverage. We carried out extensive experiments to study the influence of compiler optimizations on MC/DC coverage. Then, an SVM-based optimization recommendation method was designed and implemented. The preliminary experimental results are promising.
Multi-dimensional goals can be formalized in so-called quality models. Often, each dimension is assessed with a set of metrics that are not comparable; they come with different units, scale types, and distributions of values. Aggregating the metrics to a single quality score in an ad-hoc manner cannot be expected to provide a reliable basis for decision making. Therefore, aggregation needs to be mathematically well-defined and interpretable. We present such a way of defining quality models based on joint probabilities. We exemplify our approach using a quality model with 30 standard metrics assessing technical documentation quality and study ca. 20,000 real-world files. We apply several statistical tests of independence, and the results show that the metrics are, in general, not independent. Finally, we exemplify our suggested definition of quality models in this domain.
Automated program repair is a very active research field, with promising results so far. Several program repair techniques follow a Generate-and-Validate work-scheme: programs are iteratively sampled from within a predefined repair search space, and then checked for correctness to see if they constitute a repair.
In this poster, we propose an enhanced work-scheme, called Generate-Validate-AnalyzeErr, in which whenever a program is found to be incorrect, the error trace that is the evidence of the bug is further analyzed to obtain a search hint. This hint improves the sampling process of programs in the future. The effectiveness of this work-scheme is illustrated in a novel technique for program repair, where search hints are generated in a process we call error generalization. The goal of error generalization is to remove from the search space all programs that exhibit the same erroneous behavior.
The aim of this poster is to present our vision of the future of program repair, and trigger research in directions that have not been explored so far. We believe that many existing techniques can benefit from our new work-scheme, by focusing attention on what can be learned from failed repair attempts. We hope this poster inspires others and gives rise to further work on this subject.
The productivity of a (team of) developer(s) can be expressed as a ratio between effort and delivered functionality. Several different estimation models have been proposed. These are based on statistical analysis of real development projects; their accuracy depends on the number and the precision of data points. We propose a data-driven method to automate the generation of precise data points. Functionality is proportional to code size, and Lines of Code (LoC) is a fundamental metric of code size. However, code size and LoC are not well defined, as they could include or exclude lines that do not affect the delivered functionality. We present a new approach to measure the density of code in software repositories. We demonstrate how the accuracy of development time spent in relation to delivered code can be improved when basing it on net- instead of gross-size measurements. We validated our tool by studying ca. 1,650 open-source software projects.
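As a rough illustration of the net- versus gross-size distinction (not the tool's actual implementation), the sketch below counts gross lines versus lines that are neither blank nor single-line comments; all identifiers are hypothetical, and a real measurement would also handle block comments, strings, and language grammar.

```java
import java.util.List;

// Hypothetical sketch: gross size counts every physical line, net size drops
// blank lines and single-line comments; density is the net/gross ratio.
public class CodeDensity {
    static int grossLoc(List<String> lines) {
        return lines.size();
    }

    static int netLoc(List<String> lines) {
        int net = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("//")) {
                net++;
            }
        }
        return net;
    }

    /** Density as the share of lines that actually carry functionality. */
    static double density(List<String> lines) {
        return grossLoc(lines) == 0 ? 0.0 : (double) netLoc(lines) / grossLoc(lines);
    }
}
```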
Behavior Change Software Systems (BCSSs) have shown promising outcomes in terms of promoting healthy behaviors. However, a negative User Experience (UX) can be induced by a BCSS if designers do not have a clear understanding of the requirements that actually help change user behavior toward a sustainability goal.
In order to get insights into how to discover such sustainability requirements, we propose a discovery approach, whose emphasis is placed on negative UX assessed through attitudes and behaviors expressed by users due to the lack of fulfillment of actual user needs. The approach is tested on existing software systems designed for preventing or reducing Repetitive Strain Injury as a particular category of BCSS. Twelve requirements that contribute to social sustainability were discovered.
Newcomers' and volunteers' contributions play an important role in the success of open source software (OSS). This role has been confirmed through a rigorous set of studies in the software engineering discipline. As open source projects are developed through both social and technical effort, it is very important for newcomers to strengthen their socio-technical skills. This paper focuses on newcomers' success in the open source community by analyzing newcomers' reputation based on their initial activities in a social coding environment such as GitHub. By applying mining software repositories (MSR) techniques to GitHub data, we identified the main attributes of the projects to which successful newcomers contributed. These attributes can help other newcomers select the right project for their initial activities.
Small Unmanned Aircraft Systems (sUAS) are an emerging application area for many industries including surveillance, agriculture monitoring, and vector-borne disease control. With drastically lower costs and increasing performance and autonomy, future application evolution will more than likely include the use of sUAS swarms. Several largely successful experiments in recent years, using off the shelf sUAS, have been conducted to address the long standing challenge of controlling and monitoring vector-borne diseases. In this paper we build on lessons learned from these prior efforts, and discuss ways in which swarms of sUAS could be deployed to place and monitor Autocidal Gravid Ovitraps for reducing the mosquito population.
Human interaction and behavior are at the core of most software engineering (SE) activities. Furthermore, software is created to fulfill human stakeholders' needs and wishes, and the resulting software is ultimately utilized (directly or indirectly) by human users. Today's software is highly intertwined with our lives, and it possesses an increasing ability to act and influence us. Besides the obvious example of self-driving cars and their potential harmfulness, more mundane software such as social networks may introduce bias, break privacy preferences, lead to digital addiction, etc. Additionally, the SE process itself is highly affected by ethical issues, such as diversity and business ethics. This paper introduces ethics-aware SE, a version of SE in which the ethical values of the stakeholders (including developers and users) are captured, analyzed, and reflected in software specifications and in the SE processes.
The OSEK/VDX standard [3] has now been widely adopted by many automotive manufacturers and research groups to develop vehicle-mounted systems. An OSEK/VDX vehicle-mounted system generally runs on several processors (e.g., the system shown in Figure 1 runs on two processors) and consists of three components: the OS, a multi-tasking application, and a communication protocol. The OS located on a processor manages an application and conducts the tasks within the application to execute on that processor; in particular, the OSEK/VDX OS adopts a deterministic scheduler (static-priority scheduling policy) to dispatch tasks. The applications are in charge of realizing functions and often interact with each other via a communication protocol such as the controller area network (CAN). There are two complex execution characteristics in OSEK/VDX vehicle-mounted systems: (i) tasks within an application execute concurrently on a processor under the scheduling of the OSEK/VDX OS; (ii) applications run simultaneously on different processors and sometimes communicate with each other. Due to the concurrency of tasks and the simultaneity between applications, exhaustively verifying a developed OSEK/VDX distributed application system, in which applications cooperatively complete a function based on the communication protocol, has become a challenge for developers as development complexity increases.
The growth in the number of non-developer open source software (OSS) application users and the escalating use of these applications have both created a need for and interest in developing usable OSS. OSS communities are unclear about which techniques to use in each activity of the development process. The aim of our research is to adopt the visual brainstorming usability technique in the HistoryCal OSS project and determine the feasibility of adapting the technique for application. To do this, we participated as volunteers in the HistoryCal project. We used the case study research method to investigate technique application and community participation. We identified adverse conditions that were an obstacle to technique application and modified the technique to make it applicable. We can conclude from our experience that these changes were helpful for applying the technique, although it was not easy to recruit OSS users to participate in usability technique application.
This paper presents a general framework to detect behavioral design patterns by combining source code and execution data. The framework has been instantiated for the observer, state and strategy patterns to demonstrate its applicability. By experimental evaluation, we show that our combined approach can guarantee a higher precision and recall than purely static approaches. In addition, our approach can discover all missing roles and return complete pattern instances that cannot be supported by existing approaches.
Though minimal mutants have been widely studied in recent years, the operators that produce them remain largely unknown. This poster develops a coverage-based approach to identify them by defining Subsuming Mutation Operators (SMOs), the operators that cover all subsuming mutants with a minimal number of mutants. We then performed an empirical study on 61,000 mutants of 14 programs to determine the SMOs among Proteum operators and investigate their properties. The results show that only 17 of 82 operators are SMOs, and they are not equally significant. For example, OLRN produces only 0.5% of subsuming mutants while VDTR accounts for almost 20%. A set of operators that are efficient at producing subsuming mutants is identified in our study and presented in this paper.
Missing checks for untrusted inputs used in security-sensitive operations are one of the major causes of various serious vulnerabilities. Thus, efficiently detecting missing checks in realistic software is essential for identifying insufficient protection against attacks. We propose a systematic static approach to detect missing checks in C/C++ programs. An automated, cross-platform tool named Vanguard was implemented on top of Clang/LLVM 3.6.0, and experimental results show its effectiveness and efficiency.
Code smells reflect sub-optimal patterns of code that often lead to critical software flaws or failure. In the same way, community smells reflect sub-optimal organisational and socio-technical patterns in the organisational structure of the software community.
To understand the relation between the community smells and code smells we start by surveying 162 developers of nine open-source systems. Then we look deeper into this connection by conducting an empirical study of 117 releases from these systems.
Our results indicate that community-related factors are intuitively perceived by most developers as causes of the persistence of code smells. Inspired by this observation we design a community-aware prediction model for code smells and show that it outperforms a model that does not consider community factors.
Communication is essential in software engineering. Especially in distributed open-source teams, communication needs to be supported by channels including mailing lists, forums, issue trackers, and chat systems. Yet, we do not have a clear understanding of which communication channels stakeholders in open-source projects use. In this study, we fill the knowledge gap by investigating a statistically representative sample of 400 GitHub projects. We discover the used communication channels by applying regular expressions to project data. We show that (1) half of the GitHub projects use observable communication channels; (2) GitHub Issues, e-mail addresses, and the modern chat system Gitter are the most common channels; (3) mailing lists rank only fifth and have a lower market share than all modern chat systems combined.
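The regular-expression detection can be illustrated with a short sketch; the patterns below are illustrative assumptions, not the study's actual expressions, and match a Gitter badge, a mailing-list mention, and a contact e-mail address in project text such as a README.

```java
import java.util.regex.Pattern;

// Illustrative regular expressions for spotting communication channels in
// project files; a real study would use a larger, validated pattern set.
public class ChannelDetector {
    static final Pattern GITTER  = Pattern.compile("gitter\\.im/[\\w.-]+/[\\w.-]+");
    static final Pattern MAILING = Pattern.compile("(?i)mailing\\s+list|lists\\.[\\w.-]+\\.\\w+");
    static final Pattern EMAIL   = Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[A-Za-z]{2,}");

    /** True if any observable channel pattern occurs in the given project text. */
    static boolean usesObservableChannel(String projectText) {
        return GITTER.matcher(projectText).find()
            || MAILING.matcher(projectText).find()
            || EMAIL.matcher(projectText).find();
    }
}
```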
Assigning an issue to the correct component(s) is challenging, especially for large-scale projects which can have up to hundreds of components. We propose a prediction model which learns from historical issue reports and recommends the most relevant components for new issues. Our model uses a deep learning technique, Long Short-Term Memory, to automatically learn semantic features representing an issue report, and combines them with traditional textual similarity features. An extensive evaluation on 142,025 issues from 11 large projects shows our approach outperforms alternative techniques with an average 60% improvement in predictive performance.
Combinatorial Interaction Testing (CIT) aims at constructing an effective test suite, such as a Covering Array (CA), that can detect faults that are caused by the interaction of parameters. In this paper, we report on some empirical studies conducted to examine the fault detection capabilities of five popular CA constructors: ACTS, Jenny, PICT, CASA, and TCA. The experimental results indicate that Jenny has the best performance, because it achieves better fault detection than the other four constructors in many cases. Our results also indicate that CAs generated using ACTS, PICT, or CASA should be prioritized before testing.
Information retrieval-based bug localization techniques are evaluated using datasets with an oracle. However, these datasets can contain non-buggy files, which affects the reliability of such techniques. To investigate the impact of non-buggy files, we show that a test file can be regarded as a buggy file. We then determined whether such files cause inaccuracies that would eventually affect the trustworthiness of previous techniques. We further analyzed the impact of test files on IR-based bug localization through three research questions. Our results show that test files significantly impact the performance of the techniques. Furthermore, MAP increased by a maximum of 21%, and MRR decreased by a maximum of 13%.
Model checking is an automatic approach to enhancing the correctness of systems. However, when it is applied to discover flaws in software systems, most of the respective verification tools lack scalability due to the state-space explosion problem. Abstraction techniques are useful in reducing the state space of systems: they map a concrete set of states to a smaller set of states that is an approximation of the system with respect to the property of interest. Predicate abstraction [3] is one of the most often used methods for obtaining a finite abstract model from a concrete program, which often has an infinite state space. With predicate abstraction, a finite set of predicates, which determines the precision of the abstraction, is selected to keep track of certain facts about the program variables. The model obtained via predicate abstraction is an over-approximation of the original program. Thus, spurious paths may exist when an insufficient set of predicates is considered.
Fault localization is one of the most important debugging tasks, and many techniques have been developed to improve its efficiency. Among them, spectrum-based fault localization (SFL) is the most popular, having been the subject of 35% of fault localization-related studies. SFL techniques leverage coverage spectra and localize a fault based on the coverage difference between passed and failed test cases. However, it is difficult to localize faults effectively when coverage differences are not clear. Therefore, we propose a novel variable-centric fault localization technique to improve the performance of existing techniques. The proposed technique extracts suspicious variables and uses them to generate a suspicious ranked list. In an evaluation with 120 C faults, the proposed technique outperforms SFL techniques with the same similarity coefficient: its average Exam score is reduced by 55% compared to SFL techniques.
Android has been one of the most popular platforms for smart phones, reaching an 81.2% share of the mobile-phone market. With smart phones being ubiquitous, hackers are likely to attack them to steal users' private data, and Android applications (also called Android apps) have proved to be an effective target. The Google Play store has provided billions of Android apps, but unfortunately this advance has a dark side, because many Android apps cannot ensure security. Hence, more and more attention has been paid to Android malware. Taint flow analysis has proved to be an effective approach for revealing potentially malicious data flows. It aims at determining whether sensitive data flows from a source to a sink. The analysis can be executed either dynamically or statically. Dynamic taint analysis [5] relies on testing to reach an appropriate code coverage criterion. It is able to precisely pinpoint leaks, but may be incomplete in exploring all possible execution paths. In contrast, static analysis takes all possible paths into consideration, but most of the static analyses available for Android apps [1, 3] are intra-component analyses, which are unable to detect leaks across components.
Microservices are quickly becoming a prominent architectural choice in the service-oriented software industry. This approach proposes to develop each application as a collection of small services, each running in its own process and inter-communicating via lightweight mechanisms. Currently, there is still no clear perspective on emerging recurrent solutions (architectural patterns) or design decisions (architectural tactics) for microservices in either industry or academia. This article describes a systematic review of the academic and industrial literature on architectural patterns and tactics proposed for microservices. The study reports 44 architectural patterns of microservices in academia and 80 in industry; architectural tactics related to microservices that depend on other disciplines; and the finding that most architectural patterns and tactics are associated with five quality attributes: scalability, flexibility, testability, performance, and elasticity. In addition, we noticed that most microservices in academia are reported in evidence related to DevOps and IoT, while industry shows little interest in associating disciplines. Finally, a new microservices pattern taxonomy is proposed.
In recent years, tremendous efforts from both the industrial and the academic research communities have been put into bringing forth quantum computing technologies. With the potential proliferation of universal quantum computers on the horizon, quantum computing, however, is still held back by numerous grave barriers, which lead to its low accessibility and practicality. For example, the vastly different underlying computing models, combined with steep background knowledge requirements, make it extremely difficult, if possible at all, for most software engineering researchers and practitioners to even begin to design or implement quantum algorithms or software in practice. To overcome this problem, we propose in this paper a design that largely circumvents these accessibility and practicality barriers by providing an end-to-end quantum computing framework for solving NP-complete problems via reduction. We have fully implemented a toolkit under our design framework. With this toolkit, software engineering researchers and practitioners can enjoy the speedup and scalability benefits of universal quantum computers without necessarily having prior knowledge of quantum computing.
Code commenting is a common programming practice of practical importance to help developers review and comprehend source code. However, there is a lack of thorough specifications to help developers make their commenting decisions in current practice. To reduce the effort of making commenting decisions, we propose a novel method, CommentSuggester, to guide developers regarding appropriate commenting locations in the source code. We extract context information of source code and employ machine learning techniques to identify possible commenting locations in the source code. The encouraging experimental results demonstrated the feasibility and effectiveness of our commenting suggestion method.
Design decisions software architects make directly impact system quality. Real-world systems involve a large number of such decisions, and each decision is typically influenced by others and involves trade-offs in system properties. This paper poses the problem of making complex, interacting design decisions relatively early in a project's lifecycle and outlines a search-based and simulation-based approach for helping architects make these decisions and understand their effects.
The unsafe features of C often lead to memory errors that can result in vulnerabilities. Many runtime verification tools are widely used to detect memory errors. However, existing tools lack DO-178C compliance, show limited performance, and demonstrate poor accessibility, e.g., lacking platform-independence. In this paper, we propose to implement dynamic analysis tools using source-to-source transformation, which operates on the original source code to insert code fragments written in ANSI C, and generates source files similar to the original files in structure. We show that source transformation can effectively avoid the mentioned drawbacks of existing tools, but it also faces many new challenges in implementation.
Recently, the programming-by-example (PBE) technique has achieved great success in processing and transforming data entities, yet existing approaches generally fall short on tasks concerning entity relations. This paper presents ENTER, a domain-agnostic language for relation-aware entity transformation synthesis. It leverages the combination of two basic relations, the equivalence relation and the total order relation, to succinctly express complex entity relations. ENTER can be instantiated with domain-specific elements to solve a wide range of entity transformation tasks.
We present Java StarFinder (JSF), a tool for automated test case generation and error detection for Java programs having inputs in the form of complex heap-manipulating data structures. The core of JSF is a symbolic execution engine that uses separation logic with existential quantifiers and inductively-defined predicates to precisely represent the (unbounded) symbolic heap. The feasibility of a heap configuration is checked by a satisfiability solver for separation logic. At the end of each feasible path, a concrete model of the symbolic heap (returned by the solver) is used to generate a test case, e.g., a linked list or an AVL tree, that exercises that path.
We show the effectiveness of JSF by applying it to non-trivial heap-manipulating programs and evaluating it against JBSE, a state-of-the-art symbolic execution engine for heap-based programs. Experimental results show that our tool significantly reduces the number of invalid test inputs and improves the test coverage.
Automatically recommending API-related tutorial fragments or Q&A pairs from Stack Overflow (SO) is very helpful for developers, especially when they need to use unfamiliar APIs to complete programming tasks. However, in practice developers are more likely to express the API-related questions using natural language when they do not know the exact name of an unfamiliar API. In this paper, we propose an approach, called SOTU, to automatically find answers for API-related natural language questions (NLQs) from tutorials and SO. We first identify relevant API-related tutorial fragments and extract API-related Q&A pairs from SO. We then construct an API-Answer corpus by combining these two sources of information. For an API-related NLQ given by the developer, we parse it into several potential APIs and then retrieve potential answers from the API-Answer corpus. Finally, we return a list of potential results ranked by their relevancy. Experiments on API-Answer corpus demonstrate the effectiveness of SOTU.
Recently, many developers have begun to notice that uncertainty is a crucial problem in software development. Unfortunately, no one knows how often uncertainty appears or what kinds of uncertainty exist in actual projects, because there have been no empirical studies on uncertainty. To address this problem, we conduct a large-scale empirical study analyzing commit messages and revision histories of 1,444 OSS projects selected from GitHub repositories.
The complexity of designing and verifying large-scale systems requires abstract models. Consistently and systematically deriving a more concrete model from an abstract model with regard to verification of its behavior against certain properties is an open problem. We propose a new workflow for systematic top-down design of models for a Cyber-physical System (CPS). It builds on a theory of systematic abstraction and refinement techniques in the context of verification through model checking. In addition, this workflow includes validation in the sense that a refined model is checked for its fit with reality. Our proposed workflow is new with respect to its systematic determination of model changes on different levels of abstraction based on the V&V results and the formal property over-approximation of an abstract model (as compared to the corresponding concrete model).
In this work, we reconstruct a set of Android app lineages, each of which represents a sequence of app versions historically released for the same app. Then, based on these lineages, we empirically investigate the evolution of app vulnerabilities, as revealed by well-known vulnerability scanners, and summarise various interesting findings that constitute tangible knowledge for the community.
A UML diagram analytic tool called UMLx is proposed, which automatically extracts information from UML diagrams to facilitate decision making in risk management, planning, resource allocation, and system design, based on a set of proposed metrics.
Agile testers distinguish between unit tests and component tests as a way to automate the bulk of the developer tests. Research on fault localisation largely ignores this distinction, evaluating the effectiveness of these techniques irrespective of whether the fault is exposed by unit tests---where the search space to locate the fault is constrained to the unit under test---or by component tests---where the search space expands to all objects involved in the test. Based on a comparison of sixteen spectrum based fault localisation techniques, we show that there is indeed a big difference in performance when facing unit tests and component tests. Consequently, researchers should distinguish between easy and difficult to locate faults when evaluating new fault localisation techniques.
Exception handling is an advanced programming technique to prevent run-time errors or crashes in modern software systems. However, inexperienced programmers might fail to write proper exception handling code in their programs. In this paper, we introduce ExAssist, a code recommendation tool for exception handling. ExAssist can predict what types of exception could occur in a given piece of code and recommend proper exception handling code for such an exception. A preliminary evaluation of ExAssist suggests that it provides highly accurate recommendations.
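To illustrate the kind of output such a recommender targets (an illustrative example under assumed names, not ExAssist's actual suggestion), the snippet below shows file-reading code wrapped in handling for the IOException that could be predicted for it.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative example: given code that opens and reads a file, a recommender
// can predict that an IOException may occur and suggest wrapping the calls in
// appropriate handling code such as the try/catch below.
public class ExceptionHandlingExample {
    static int readFirstByte(String path) {
        try (InputStream in = new FileInputStream(path)) {   // may throw IOException
            return in.read();
        } catch (IOException e) {
            // Handling recommended for the predicted exception type.
            System.err.println("Failed to read " + path + ": " + e.getMessage());
            return -1;
        }
    }
}
```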
Software engineering corpora often contain domain-specific concepts and linguistic patterns. Popular text analysis tools are not specially designed to analyze such concepts and patterns. In this paper, we introduce ALPACA, a novel, customizable text analysis framework. The main purpose of ALPACA is to analyze topics and their trends in a text corpus. It allows users to define a topic with a few initial domain-specific keywords and expand it into a much larger set. Every single keyword can be expanded into long clauses to describe topics more precisely. ALPACA extracts those clauses by matching text with linguistic patterns, which are long sequences mixing both specific words and part-of-speech tags that frequently appear in the corpus. ALPACA can detect these patterns directly from pre-processed text. We present an example demonstrating the use of ALPACA on a text corpus of security reports.
Agile development is in widespread use, even in safety-critical domains. However, there is a lack of an appropriate safety analysis and verification method in agile development. In this poster, we propose the use of Behavior Driven Development for safety verification, combined with System-Theoretic Process Analysis for safety analysis, in agile development. A preliminary controlled experiment shows good communication effectiveness.
Code smells can be subjectively interpreted, the results provided by detectors are usually different, agreement among these results is scarce, and a benchmark for comparing them is not yet available. The main approaches used to detect code smells are based on the computation of a set of metrics. However, code smell detectors often use different metrics and/or different thresholds, according to their detection rules. As a result of this inconsistency, the number of detected smells can increase or decrease accordingly, and this makes it hard to understand whether, for a specific piece of software, a certain characteristic identifies a code smell or not. In this work, we introduce WekaNose, a tool that allows users to perform experiments to study code smell detection through machine learning techniques. The experiments' purpose is to select rules, and/or obtain trained algorithms, that can classify an instance (method or class) as affected or not by a code smell. These rules have the main advantage of being extracted through an example-based approach, rather than a heuristic-based one.
Creating secure and privacy-protecting systems entails the simultaneous coordination of development activities along three different yet mutually influencing dimensions: translating (security and privacy) goals to design choices, analyzing the design for threats, and performing a risk analysis of these threats in light of the goals.
These activities are often executed in isolation, and such a disconnect impedes the prioritization of elicited threats, the assessment of which threats are sufficiently mitigated, and decision-making in terms of which risks can be accepted.
In the proposed TMaRA approach, we facilitate the simultaneous consideration of these dimensions by integrating support for threat modeling, risk analysis, and design decisions. Key risk assessment inputs are systematically modeled and threat modeling efforts are fed back into the risk management process. This enables prioritizing threats based on their estimated risk, thereby providing decision support in the mitigation, acceptance, or transferral of risk for the system under design.
Today, model-driven approaches are a cornerstone of modern software development. The Eclipse Modeling Framework (EMF) is highly adopted in practice and generates Java code from platform-independent models with embedded Object Constraint Language (OCL) expressions. However, applications that target multiple platforms, such as Android, iOS, Windows, and web browsers, usually need to be implemented in different programming languages. Feature-complete Ecore and OCL runtime APIs are not available for all these platforms, so their functionality has to be re-implemented. In this paper, we present CrossEcore: a multi-platform enabled modeling framework that generates C#, Swift, TypeScript, and JavaScript code from Ecore models with embedded OCL. An OCL compiler translates OCL expressions into expressions of the target language. The Ecore and OCL API can be used consistently across platforms, which facilitates application portability. CrossEcore is also extensible and can be easily adapted to new programming languages.
Most software CQAs (e.g., Stack Overflow) mainly rely on users to assign tags to posted questions. This leads to many redundant, inconsistent, and inaccurate tags that are detrimental to the communities. Therefore, tag quality becomes a critical challenge to deal with. In this work, we propose STR, a deep learning based approach that automatically recommends tags by learning the semantics of both tags and questions in such software CQAs. First, word embedding is employed to convert text information into high-dimensional vectors that better represent questions and tags. Second, a multi-task-like Convolutional Neural Network, the core module of STR, is designed to capture short and long semantics. Third, the learned semantic vectors are fed into a gradient descent based algorithm for classification. Finally, we evaluate STR on three datasets collected from popular software CQAs, and experimental results show that STR outperforms state-of-the-art approaches in terms of Precision@k, Recall@k, and F1-score@k.
Smart contracts that run on blockchains can ensure the transactions are automatically, reliably performed as agreed upon between the participants without a trusted third party. In this work, we propose a smart-contract based algorithm for constructing service-based systems through the composition of existing services.
Developers change software models continuously but often fail to keep them consistent. Inconsistencies caused by such changes need to be repaired eventually. While we found that usually only a few model elements need to be repaired for any given inconsistency, there are many possible repair values for any given model element. To make matters worse, model elements need to be repaired in combination. The result is a large and exponentially growing repair space. In this paper we present an approach that groups alike repair values when they have the same effect, in order to provide example-like feedback for developers. A preliminary evaluation shows that our approach can explore the repair space more scalably.
The number of scientific publications is continuously increasing, with most publications describing research that is also interesting for industrial software engineers. Program comprehension in particular is an essential and time consuming task in industry, but new approaches are rarely adopted. We conducted a survey with 89 participants from research and industry to investigate this problem. Our results indicate that researchers have to integrate other ways to communicate their work and make evaluations more practical.
Observation-Based Slicing (ORBS) is a recently introduced program slicing technique based on direct observation of program semantics. Previous ORBS implementations slice a program by iteratively deleting adjacent lines of code. This paper introduces two new deletion operators based on lexical similarity. Furthermore, it presents a generalization of ORBS that can exploit multiple deletion operators: Multi-operator Observation-Based Slicing (MOBS). An empirical evaluation of MOBS using three real-world Java projects finds that the use of lexical information improves the efficiency of ORBS: MOBS can delete up to 87% of lines while taking only about 33% of the execution time of the original ORBS.
The migration from existing software variants to a software product line is an arduous task that necessitates synthesising a variability model based on already developed software. Nowadays, the increasing complexity of software product lines compels practitioners to design more complex variability models that represent information other than binary features, e.g., multi-valued attributes. Assisting the extraction of complex variability models from variant descriptions is a key task in helping the migration towards complex software product lines. In this paper, we address the problem of extracting complex variability information from software descriptions, as a part of the process of complex variability model synthesis. We propose an approach based on Pattern Structures to extract variability information in the form of logical relationships involving both binary features and multi-valued attributes.
Test Suite Reduction (TSR) approaches speed up regression testing by removing redundant test cases. TSR approaches can be classified as adequate or inadequate. Adequate approaches reduce test suites so that they completely preserve the test requirements (e.g., statement coverage) of the original test suites. Inadequate approaches produce reduced test suites that only partially preserve the test requirements. We propose a tool prototype for inadequate TSR and named it CUTER (ClUstering-based TEst suite Reduction). CUTER implements a clustering-based approach and a number of instances of its underlying process. We implemented CUTER as an Eclipse plug-in and applied it on 19 versions of four Java programs.
We conducted a controlled experiment with 55 final-year undergraduate students in Computer Science. We asked them to comprehend functional requirements while exposing them, or not, to noise. We did not observe any effect of noise on requirements comprehension.
UML creates useful visualizations but they become monolithic, complex, and expensive to maintain. In agile development, documentation is secondary, which discourages the use of UML even further. We introduce an in-code, just-in-time, maintainable approach to UML, supported by a tool called PREXEL. PREXEL minimizes interruptions in coding by allowing concise in-line specifications which automatically synthesize in-code graphical ASCII class models, class and method skeletons, and class relationships.
Manually writing pre- and postconditions to document the behavior of a large library is a time-consuming task; what is needed is a way to automatically infer them. Conventional wisdom is that, if one has preconditions, then one can use the strongest postcondition predicate transformer (SP) to infer postconditions. However, we have performed a study using 2,300 methods in 7 popular Java libraries, and found that SP yields postconditions that are exponentially large, which makes them difficult to use, either by humans or by tools.
We solve this problem using a novel algorithm and tool for inferring method postconditions, using the SP, and transmuting the inferred postconditions to make them more concise.
We applied our technique to infer postconditions for over 2,300 methods in seven popular Java libraries. Our technique was able to infer specifications for 75.7% of these methods. Each of these inferred postconditions was verified using an Extended Static Checker. We also found that 84.6% of resulting specifications were less than 1/4 page (20 lines) in length. Our algorithm was able to reduce the length of SMT proofs needed for verifying implementations by 76.7% and reduced prover execution time by 26.7%.
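For reference, the textbook strongest-postcondition rules below (standard definitions, not the paper's exact formalization) show where the reported blow-up comes from: each assignment introduces a fresh existential quantifier and each conditional doubles the number of disjuncts, so a body with k branches can yield on the order of 2^k disjuncts before any simplification.

```latex
\begin{align*}
  \mathit{sp}(P,\; x := e) &= \exists x_0.\; x = e[x_0/x] \wedge P[x_0/x] \\
  \mathit{sp}(P,\; S_1; S_2) &= \mathit{sp}(\mathit{sp}(P, S_1),\; S_2) \\
  \mathit{sp}(P,\; \textbf{if } b \textbf{ then } S_1 \textbf{ else } S_2)
    &= \mathit{sp}(P \wedge b,\; S_1) \,\vee\, \mathit{sp}(P \wedge \neg b,\; S_2)
\end{align*}
```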
One of the challenging issues with existing static analysis tools is the high false alarm rate. To address the false alarm issue, we design bug detection rules by learning from a large number of real bugs in open-source projects on GitHub. Specifically, we build a framework that learns and refines bug detection rules for fewer false positives. Based on the framework, we implemented ten patterns, six of which are new to existing tools. To evaluate the framework, we implemented a static analysis tool, F
We present a novel tool, AUREA, that automatically classifies mobile app reviews, filters and facilitates their analysis using fine grained mobile specific topics. We aim to help developers analyse the direct and valuable feedback that users provide through their reviews, in order to better plan maintenance and evolution activities for their apps. Reviews are often difficult to analyse because of their unstructured textual nature and their frequency, moreover only a third of them are informative. We believe that by using our tool, developers can reduce the amount of time required to analyse and understand the issues users encounter and plan appropriate change tasks.
Successfully onboarding open source projects in GitHub is difficult for developers, because it is time-consuming for them to search for a suitable project using a few query words among numerous repositories, and developers suffer from various social and technical barriers in the projects they join. Frequently failed onboarding delays developers' development schedules and the evolutionary progress of open source projects. To mitigate developers' costly onboarding efforts, we propose a ranking model, NNLRank (Neural Network for List-wise Ranking), to recommend projects to which developers are likely to contribute many commits. Based on 9 measured project features, NNLRank learns a ranking function (represented by a neural network, optimized by a list-wise ranking loss function) to score a list of candidate projects, where the top-n scored candidates are recommended to a target developer. We evaluate NNLRank on 2,044 successful onboarding decisions from GitHub developers, comparing it with a related model, LP (Link Prediction), and 3 other typical ranking models. Results show that NNLRank provides developers with effective recommendations, substantially outperforming the baselines.
Control flow obfuscation is a direct approach to protecting the confidentiality of program logic. However, existing works in this direction either fail to offer high confidentiality guarantees or incur high performance overheads. In this paper, we propose CFHider, a high-security, high-performance control flow obfuscation technique. By leveraging program transformation and the Intel Software Guard Extensions (SGX) technology, CFHider hides control flow information in an opaque yet trusted execution environment, i.e., the SGX enclave. Our evaluation shows that CFHider substantially raises the bar for reverse-engineering attacks targeting control flow confidentiality and incurs a moderate performance overhead.
Despite the prevalence and importance of microservices in industry, there exists limited research on microservices, partly due to lacking a benchmark system that reflects the characteristics of industrial microservice systems. To fill this gap, we conduct a review of literature and open source systems to identify the gap between existing benchmark systems and industrial microservice systems. Based on the results of the gap analysis, we then develop and release a medium-size benchmark system of microservice architecture.
Context: Most research into software defect prediction ignores the differing amount of effort entailed in searching for defects between software components. The result is sub-optimal solutions in terms of allocating testing resources. Recently effort-aware (EA) defect prediction has sought to redress this deficiency. However, there is a gap between previous classification research and EA prediction.
Objective: We seek to transfer strong defect classification capability to efficient effort-aware software defect prediction.
Method: We study the relationship between classification performance and the cost-effectiveness curve experimentally (using six open-source software data sets).
Results: We observe extremely skewed distributions of change size, which contributes to the lack of relationship between classification performance and the ability to find efficient test orderings for defect detection. Trimming allows all effort-aware approaches to bridge high classification capability to efficient effort-aware performance.
Conclusion: Effort distributions dominate effort-aware models. Trimming is a practical method to handle this problem.
Industry-strength embedded systems have to meet rigorous application-specific requirements for operating environments. Such requirements are becoming increasingly challenging due to the growing system complexity. Existing works typically focus on reliability driven design optimizations to improve the system robustness. Our work addresses the problem from another perspective by adaptively adjusting its service capability according to a model reflecting the interaction between the embedded system and the environments.
This paper proposes a service capability model to capture the criticality of various services. A model-based adaption mechanism is designed to automatically identify the maximum allowed service capability under the current physical environment. A case study was performed on Industrial Ethernet switches to validate its effectiveness on adaptation to high and low temperatures. Experimental results demonstrate the potential of our approach to improve system reliability under extreme physical environments.
Mobile apps are adopting web techniques for improved development agility. In this paper, we propose TimelyRep to help mobile developers debug and test their web-enabled Android apps. TimelyRep provides efficient deterministic record-and-replay as a software library, running on unmodified Android. Also, as touchscreen becomes the major interaction method for mobile devices, web-enabled apps can receive many events in short periods. TimelyRep embodies a mechanism to control replay delays and achieve smooth replay. TimelyRep also supports cross-device replay where the event trace captured on one device can be replayed on another. We evaluate TimelyRep with real-world web applications. The results show that TimelyRep is useful for reproducing program bugs and has higher timing precision than previous tools.
Clustering is used to partition genomic data into disjoint subsets to streamline further processing. Since inputs can contain billions of nucleotides, performance is paramount. Consequently, clustering software is typically developed as a tightly coupled monolithic system which hinders software reusability, extensibility and introduction of new algorithms as well as data structures. Having experienced similar issues in our own clustering software, we have developed a flexible and extensible parallel framework called
In this paper we address the topic of satisfaction by analyzing the results of a national survey of software development in Switzerland. We found that satisfaction is reported more by those using Agile development than by those using plan-driven processes. We explored how satisfaction relates to other elements in the development process, including the use of various practices, and the influences on business, team, and software issues. We found that certain practices and influences have high correlations with satisfaction, and that collaborative processes are closely related to satisfaction, especially when combined with technical practices. Our intention in this analysis is principally descriptive, but we think the results are important for understanding the challenges for everyone involved in Agile development, and can help in the transformation to Agile.
Information Retrieval (IR) plays a key role in diverse Software Engineering (SE) tasks. The similarity metric is the core component of any IR technique, and its performance differs across document types. Different SE tasks operate on different types of documents, such as bug reports, software descriptions, and source code, that often contain non-standard domain-specific vocabulary. Thus, it is important to understand which similarity metrics are suitable for different SE documents.
We analyze the performance of different similarity metrics on various SE documents including a diverse combination of textual (e.g., description, readme), code (e.g., source code, API, import package), and a mixture of text and code (e.g., bug reports) artifacts. We observe that, in general, the context-aware IR models achieve better performance on textual artifacts. In contrast, simple keyword-based bag-of-words models perform better in code artifacts.
The role of a well-designed method should not change frequently or significantly over its lifetime. As such, changes to the role of a method can be an indicator of design improvement or degradation. To measure this, we use method stereotypes. Method stereotypes provide a high-level description of a method's behavior and role; giving insight into how a method interacts with its environment and carries out tasks. When a method's stereotype changes, so has its role. This work presents a taxonomy of how method stereotypes change and why the categories of changes are significant.
Automated program repair (APR) seeks to improve the speed and decrease the cost of repairing software bugs. Existing APR approaches use unit tests or constraint solving to find and validate program patches. We propose Canonical Search And Repair (CSAR), a program repair technique based on semantic search which uses a canonical form of path conditions to characterize buggy and patch code and allows for easy storage and retrieval of software patches, without the need for expensive constraint solving. CSAR uses string metrics over the canonical forms to cheaply measure the semantic distance between patches and buggy code, and uses a classifier to identify situations in which test suite executions are unnecessary, providing a finer-grained means of differentiating between potential patches.
We evaluate CSAR on the IntroClass benchmark, and show that CSAR finds more correct patches (96% increase) than previous semantic search approaches, and more correct patches (34% increase) than other previous state-of-the-art in program repair.
Coupling metrics are an established way to measure internal software quality with respect to modularity. Dynamic metrics have been used to improve the accuracy of static metrics for object-oriented software. We introduce a dynamic metric NOI that takes into account the number of interactions (method calls) during the run of a system. We used the data collected from an experiment to compute our NOI metric and compared the results to a static coupling analysis. We observed an unexpected level of correlation and significant differences between class- and package-level analyses.
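A minimal sketch of how an NOI-style dynamic coupling count could be collected; the API is hypothetical, and a real tool would obtain the call events through instrumentation or profiling rather than explicit calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical collector: count run-time interactions (method calls) between
// pairs of classes observed during a system run.
public class NoiCollector {
    private final Map<String, Integer> interactions = new HashMap<>();

    /** Record one method call from callerClass to calleeClass. */
    public void recordCall(String callerClass, String calleeClass) {
        if (callerClass.equals(calleeClass)) {
            return;  // only inter-class interactions contribute to coupling
        }
        String edge = callerClass + "->" + calleeClass;
        interactions.merge(edge, 1, Integer::sum);
    }

    /** NOI between two classes: interactions recorded in either direction. */
    public int noi(String a, String b) {
        return interactions.getOrDefault(a + "->" + b, 0)
             + interactions.getOrDefault(b + "->" + a, 0);
    }
}
```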
Representative sampling is considered crucial for predominantly quantitative, positivist research. Researchers typically argue that a sample is representative when items are selected randomly from a population. However, random sampling is rare in empirical software engineering research because there are no credible sampling frames (population lists) for the units of analysis software engineering researchers study (e.g., software projects, code libraries, developers). This means that most software engineering research does not support statistical generalization, but rejecting any particular study for lack of random sampling is capricious.
The time required to generate valid structurally complex inputs grows exponentially with the input size. This makes it hard to predict, for a given structure, the largest input size that is completely explorable within a time budget. Iterative deepening generates inputs of size n before those of size n + 1 and eliminates the guesswork of finding such a size. We build on the Korat algorithm for structural test generation and present iKorat, an incremental algorithm for efficient iterative deepening. It avoids the redundant work of naive Korat-based iterative deepening by reusing information from smaller sizes, which is kept in a highly compact format.
Recent findings from a user study suggest that IR-based bug localization techniques do not perform well if the bug report lacks rich structured information such as relevant program entity names. On the contrary, excessive structured information such as stack traces in the bug report might not always be helpful for automated bug localization. In this paper, we conduct a large empirical study using 5,500 bug reports from eight subject systems, replicating three existing studies from the literature. Our findings (1) empirically demonstrate how the quality dynamics of bug reports affect the performance of IR-based bug localization, and (2) suggest potential ways (e.g., query reformulations) to overcome such limitations.
Approaches to Android malware detection built on supervised learning are commonly subject to frequent retraining, or the trained classifier may fail to detect newly emerged or emerging kinds of malware. This work targets a sustainable Android malware detector that, once trained on a dataset, can continue to effectively detect new malware without retraining. To that end, we investigate how the behaviors of benign and malicious apps evolve over time, and identify the most consistently discriminating behavioral traits of benign apps from malware. Our preliminary results reveal a promising prospect of this approach. On a benchmark set across seven years, our approach achieved highly competitive detection accuracy that sustained up to five years, outperforming the state of the art which sustained up to two years.
Despite the great number of clone detection approaches proposed in the literature, few have the scalability and speed to analyze large inter-project source datasets, where clone detection has many potential applications. Furthermore, because of the many uses of clone detection, an approach is needed that can adapt to the needs of the user to detect any kind of clone. We propose a clone detection approach designed for user-guided clone detection by exploiting the power of source transformation in a plugin based source processing pipeline. Clones are detected using a simple Jaccard-based clone similarity metric, and users customize the representation of their source code as sets of terms to target particular types or kinds of clones. Fast and scalable clone detection is achieved with indexing, sub-block filtering and input partitioning.
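The Jaccard-based similarity at the core of the approach can be made concrete with a short sketch; how fragments are turned into term sets is the user-customizable part and is assumed here, and the threshold is a placeholder parameter.

```java
import java.util.HashSet;
import java.util.Set;

// Each code fragment is represented as a set of terms; two fragments are
// clone candidates when their Jaccard similarity exceeds a chosen threshold.
public class JaccardCloneSimilarity {
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    static boolean isCloneCandidate(Set<String> a, Set<String> b, double threshold) {
        return jaccard(a, b) >= threshold;
    }
}
```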
Modern systems often have complex configuration spaces. Research has shown that people often just use default settings. This practice leaves significant performance potential unrealized. In this work, we propose an approach that uses metaheuristic search algorithms to explore the configuration space of Hadoop for high-performing configurations. We present results of a set of experiments to show that our approach can find configurations that perform significantly better than defaults. We tested two metaheuristic search algorithms---coordinate descent and genetic algorithms---for three common MapReduce programs---Wordcount, Sort, and Terasort---for a total of six experiments. Our results suggest that metaheuristic search can find configurations cost-effectively that perform significantly better than baseline default configurations.
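A simplified sketch of the coordinate-descent variant over a discrete configuration space; the candidate parameter values and the runtime function are placeholders standing in for an actual Hadoop configuration and a benchmark run of the MapReduce job, so this is an illustration of the search idea rather than the experiment's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Coordinate descent over a discrete configuration space: sweep one parameter
// at a time, keep the best value found, and repeat until no sweep improves
// the measured runtime.
public class ConfigurationSearch {
    static Map<String, Integer> coordinateDescent(
            Map<String, int[]> candidates,            // parameter -> candidate values
            Map<String, Integer> start,               // e.g., the default configuration
            ToDoubleFunction<Map<String, Integer>> runtime) {  // benchmark run (stub)
        Map<String, Integer> best = new LinkedHashMap<>(start);
        double bestCost = runtime.applyAsDouble(best);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (Map.Entry<String, int[]> param : candidates.entrySet()) {
                for (int value : param.getValue()) {
                    Map<String, Integer> trial = new LinkedHashMap<>(best);
                    trial.put(param.getKey(), value);
                    double cost = runtime.applyAsDouble(trial);
                    if (cost < bestCost) {            // keep the best value for this coordinate
                        bestCost = cost;
                        best = trial;
                        improved = true;
                    }
                }
            }
        }
        return best;
    }
}
```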
Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that individually authored complete files can be attributed, these efforts have focused on ideal data sets such as the Google Code Jam data. We explore the problem of attribution "in the wild," examining source code obtained from open source version control systems, and investigate if and how such contributions can be attributed to their authors, either individually or on a per-account basis. In this work we show that open source contributors' accounts, containing short, incomplete, and typically uncompilable fragments, can be effectively attributed.
Socio-Technical Congruence (STC) posits that social interactions among developers should be congruent with the technical dependencies among their tasks. Prior research discovered that the lack of this "should-happen" communication leads to integration errors and decreased productivity. However, the opposite scenario, excessive communication not matched by any technical dependencies, has been largely neglected. This paper terms this scenario Transgressive Incongruence (TraIn). To automatically pinpoint source files involved in TraIn, this paper defines a new form of coupling between files, called communication coupling, which measures the communication traffic among developers working on two files. Evaluation on 6 Apache open source projects reveals that: 1) the communication coupling between files with structural dependencies is 3 to 10 times higher than that between files independent from each other; and 2) source files involved in TraIn are usually very bug-prone. This implies that TraIn may have a negative impact on the quality of software systems and thus merits due attention.
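One plausible reading of the communication-coupling measure is sketched below, assuming that for each file we know the set of developers who worked on it and that messages[(d1, d2)] counts the messages exchanged between two developers; the exact weighting used in the paper may differ.

    from itertools import product

    def communication_coupling(devs_a, devs_b, messages):
        # Total communication traffic between developers of file A and file B.
        total = 0
        for d1, d2 in product(devs_a, devs_b):
            if d1 != d2:
                total += messages.get((d1, d2), 0) + messages.get((d2, d1), 0)
        return total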
As a promising technology, blockchain overcomes many shortcomings of traditional approaches. However, its low efficiency prevents it from being widely used in practice. By analyzing the transaction history in a blockchain, we found that account usage frequency is highly heterogeneous. Based on this observation, we propose a new account structure to improve efficiency. Preliminary experiments show that the proposed structure has clear efficiency advantages over other account structures.
Heterogeneous information networks (HINs) are logical networks that involve multiple types of objects and multiple types of links denoting different relations. Previous API recommendation studies mainly focus on homogeneous networks or a few kinds of relations rather than exploiting the rich heterogeneous information. In this paper, we propose a mashup-group-preference-based API recommendation method for mashup creation. Based on historical invocation experience, the different semantic meanings behind meta paths, a hybrid similarity measurement, and the rich interactions among mashups, we build the API recommendation model and employ it to make personalized API recommendations for different mashup developers. Extensive experimental results validate the effectiveness of our proposed approach in terms of several evaluation metrics.
Developing modern mobile applications often requires the use of many platform-specific libraries, which can be far too many for application developers to find what is needed for a functionality and where and how to use it properly. This paper presents a tool, named LibraryGuru, to recommend suitable Android APIs for given functionality descriptions. It not only recommends functional APIs that can be invoked to implement the functionality, but also recommends event callback APIs that are inherent in the Android framework and need to be overridden in the application. LibraryGuru internally builds correlation databases between various functionality descriptions and Android APIs. These correlations are extracted from Android development tutorials and SDK documents with domain-specific code parsing and natural language processing techniques adapted for functional APIs and event callback APIs separately, and are matched against functionality queries to recommend relevant APIs to developers. LibraryGuru is publicly accessible at http://libraryguru.info, and a demo video is available at https://youtu.be/f7MtjliUM-4.
Continuous Integration (CI) reduces risk in software development, but a CI build usually consumes considerable time and resources. Machine learning methods have been employed to cut the cost of CI and provide instant feedback by predicting CI results. Nevertheless, effective learning requires massive training data, which is not available for a new project. Moreover, due to the diverse characteristics of different projects, reusing models built on other projects leads to poor performance. To address this problem, we propose ACONA, a novel active online model adaptation approach that dynamically adapts a pool of classifiers trained on various projects to a new project using only a small, actively selected fraction of new data. In an empirical study on Travis CI, we show that ACONA improves F-Measure by 40.0% while reducing Accumulated Error by 63.2%, and the adapted model outperforms existing approaches.
The objective of this poster paper is to investigate how to deal with environmental uncertainty in goal-based requirements engineering. To do so, we explore the introduction of RELAX concepts into SysMLKaos. RELAX is a Requirements Engineering language for Dynamically Adaptive Systems, while SysMLKaos is a goal-based Requirements Engineering approach. We use an extract of a Landing Gear System case study to illustrate the proposed approach.
Generating GUI tests for complex Web applications is hard. There is a lot of functionality to explore: the eBay home page, for instance, sports more than 2,000 individual GUI elements that a crawler has to trigger in order to discover the core functionality. We show how to leverage tests of other applications to guide test generation for a new application: given a test for payments on Amazon, for instance, we can guide test generation on eBay towards payment functionality by exploiting the semantic similarity between UI elements across both applications. Evaluated on three domains, our approach discovers "deep" functionality in a few steps that would otherwise require thousands to millions of crawling interactions.
As spectrum-based fault localization (SFL) reasons about coverage rather than source code, it allows for a lightweight, language-agnostic way of pinpointing faults in software. However, SFL misses certain faults, such as errors of omission, and may fail to provide enough contextual information about its diagnoses. We propose Q-SFL, which leverages the concept of qualitative reasoning to augment the information made available to SFL techniques by qualitatively partitioning the values of data units in the system and treating each qualitative state as a new SFL component to be used during diagnosis.
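As a toy illustration of the qualitative-partitioning idea (the actual partitioning scheme in Q-SFL may differ), a numeric variable can be mapped to a coarse qualitative state and each observed state added as an artificial component to the test's coverage spectrum:

    def qualitative_state(value):
        # One possible partitioning of a numeric domain into qualitative states.
        if value < 0:
            return "neg"
        return "zero" if value == 0 else "pos"

    def extend_spectrum(hit_components, observed_values):
        # hit_components: set of components covered by a test execution;
        # observed_values: {variable_name: runtime value} recorded for that test.
        extra = {f"{var}={qualitative_state(v)}" for var, v in observed_values.items()}
        return hit_components | extra   # augmented row fed to the SFL ranking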
New business and technology demands as well as mergers and acquisitions force organizations to adapt their business models and IT infrastructure in order to stay competitive. Ensuring the compliance of new projects with the current goals and principles is an essential task. The discipline of Enterprise Architecture (EA) Planning provides methods for the structured development of the business and IT of an organization. The established models of the current and target architecture are used to provide the respective information for decision making at a sufficient aggregation level. We propose a model-based and tool-supported method for EA planning, specifically for the evaluation of project compliance. We utilize gap and impact analysis to ensure change consistency, as well as view generation and metric calculation for evaluation purposes. The analyses are executed within a generic analysis architecture execution environment (A2E) that supports the customization of the analyses as well as different EA meta models. The method and the proposed analyses are evaluated in a case study from a medium-sized software product company.
Business Process (BP) development is a challenging task for small and medium organizations that do not have sufficient resources for the design, coding, and management of their BPs. Cloud infrastructure and service-oriented middleware can be leveraged for the rapid development and deployment of such organizations' BPs. BP development in a cloud-based environment can be done collaboratively by exploiting the knowledge of existing BPs of related organizations. In this paper, we present ASSEMBLE, a tool for collaborative BP development in the cloud. ASSEMBLE implements our service mapping approach, which utilizes the attribute, structural, and semantic information of service operations of existing BPs in a given domain to help a user organization compose its BP. Given a collection of related BPs and the available service operations of a user organization, ASSEMBLE computes a mapping between service operations of the user organization and BP operations of other organizations. The tool also generates executable BP code in the standard BPEL language for deployment on a process execution engine on the user organization's site or in the cloud.
Combinatorial testing (CT) is an efficient way to test parameterized systems. Kuhn et al. investigated the interaction faults of several real programs and found that faulty combinations involve no more than 6 parameters, with three or fewer parameters triggering almost 90% of the failures in the applications [3]. However, for high-quality software, simply testing all 3-way combinations is not sufficient [5], which may increase the risk of residual errors that lead to system failures and security weaknesses [4]. In addition, the number of test cases needed for 100% coverage of high-strength combinations is huge, far beyond practical testing budgets. A covering array is typically used as the test suite in CT and should convey as much fault-detection information as possible. We first propose a weighted combinatorial coverage (CC) metric, focusing on the fault-detection capability of each test case instead of 100% t-way CC. Second, we give a test case selection algorithm, FWA (fixed weight algorithm), based on the weighted CC metric. To generate each test case, our method first randomly generates several candidates and selects the one with the highest fault-detection probability, varying the sampling pool size. Third, we give definitions for weighted CC and theorems for our algorithm. Finally, we compared the selected sample sizes and fault-detection capabilities of FWA and t-wise algorithms on four benchmarks with configuration-option interaction faults, and found that FWA detects more faults with smaller selected samples; in particular, FWA detects high-strength interaction faults with smaller selected samples than the 4-wise and 5-wise algorithms.
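A rough sketch of the candidate-based selection step is given below, under the simplifying assumption that a candidate's weight is the number of not-yet-covered t-way combinations it contains; the actual weighted-CC definition and the theorems in the paper are richer than this.

    import random
    from itertools import combinations

    def t_way_tuples(test, t):
        # All t-way (parameter, value) combinations covered by one test case.
        return set(combinations(sorted(test.items()), t))

    def next_test_fwa(domains, covered, t=3, pool_size=50):
        # Sample pool_size random candidates and keep the one covering the most
        # still-uncovered t-way combinations (its weight).
        best, best_weight = None, -1
        for _ in range(pool_size):
            candidate = {p: random.choice(vals) for p, vals in domains.items()}
            weight = len(t_way_tuples(candidate, t) - covered)
            if weight > best_weight:
                best, best_weight = candidate, weight
        covered |= t_way_tuples(best, t)
        return best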
The use of mobile apps is increasingly widespread, and much effort is put into testing these apps to make sure they behave as intended. To reduce this effort, and thus the cost of mobile app testing, we propose A
Generating an appropriate set of test cases that can effectively show the difference between an old and a new version of a program is a challenging research topic. In this paper, we consider both control divergence and data divergence to explore the difference between two versions of code. To do so, we present a novel model called U
For tasks like code synthesis from natural language, code retrieval, and code summarization, data-driven models have shown great promise. However, creating these models requires parallel data between natural language (NL) and code with fine-grained alignments. S
Test resources are usually limited, and it is therefore often not possible to completely test an application before a release. Testers thus need to focus their activities on the relevant code regions. In this paper, we introduce an inverse defect prediction approach to identify methods that contain hardly any faults. We applied our approach to six Java open-source projects and show that, on average, 31.6% of the methods of a project have a low fault risk; in total, they contain, on average, only 5.8% of all faults. Furthermore, the results suggest that, unlike defect prediction, our approach can also be applied in cross-project prediction scenarios. Therefore, inverse defect prediction can help prioritize untested code areas and guide testers to increase the fault detection probability.
One of the greatest benefits of microservices is that they considerably ease changing applications by splitting them into independently deployable units [5]. Combined with Continuous Delivery (CD), which aims at delivering every software release quickly and safely, and Platform as a Service (PaaS), which automates application management in an on-demand virtualized environment, the microservice paradigm has become essential for implementing agile processes.
We propose a static analysis technique for iOS executables for checking API call vulnerabilities that can cause 1) app behaviors to be altered by malicious external inputs, and 2) sensitive user data to be illegally accessed by apps with stealthy private API calls that use string obfuscation. We identify sensitive functions that dynamically load classes/frameworks, and, for each parameter that corresponds to a dynamically loaded class/framework, we construct a dependency graph that shows the set of values that flow to that parameter. A sensitive function that has its class name or framework path parameter depending on external inputs is considered to contain a vulnerability. We further conduct string analysis on these dependency graphs to determine all potential string values that these parameters can take, which identifies the set of dynamically loaded classes/frameworks. Taking the intersection of these values with patterns that characterize Apple's API policies (such as restricted use of private/sensitive APIs), we are able to detect potential policy violations and vulnerabilities.
B
O
M
R
C
Assessing the quality of an API is important in many different aspects: First, it can assist developers in deciding which API to use when they are faced with a list of potential APIs to choose from, by comparing the benefits and drawbacks of each option [1]; we refer to this as the API selection problem. Second, it can help guide the design process and expose problem areas in early stages of API design, even before implementing the actual API [2]; we refer to this as the API design problem. In order to assess the quality of an API, various evaluation methods have been used: some are based on empirical laboratory studies, gathering feedback from API users; others are based on inspection methods where experts evaluate the quality of an API based on a list of design guidelines [3] [4] such as Nielsen's heuristics and the cognitive dimensions framework [2] [5]. In this paper, we are particularly interested in extending Steven Clarke's approach of measuring API usability based on the cognitive dimensions framework [5]. The usability of an API is assessed by comparing the API (what it actually offers) with the profiles of its potential users (what they expect out of it).
This paper evaluates eight popular mobile UI automation frameworks. We have discovered that some automation frameworks increase energy consumption by up to 7500%. While limited in the interactions it supports, Espresso is the most energy-efficient framework. Depending on the needs of the tester, Appium, Monkeyrunner, or UIAutomator are good alternatives. We show the importance of using energy-efficient frameworks and provide a decision tree to help developers make an educated decision on which framework best suits their testing needs.
Designing an effective and useful dashboard is expensive, so it is important to determine whether it is possible to build a "generic" useful and effective dashboard, usable in a variety of circumstances. To determine whether such a dashboard is possible and, if so, what its structure should be, we interviewed 67 software engineers from 44 different companies. Their answers made us confident that such a dashboard can be built.
Existing work in modeling developer expertise assumes that developers reflect their expertise in their contributions and that such expertise can be analyzed to provide support for developer tasks. However, developers also make contributions that reflect their inexpertise, such as mistakes in their code. We refine the hypotheses of the expertise-identification literature by proposing developer inexpertise as a factor that should be modeled to automate support for developer tasks.
Code refactoring is widely practiced by software developers. There is an explicit assumption that code refactoring improves the structural quality of a software project, thereby also reducing its bug proneness. However, refactoring is often applied with different purposes in practice. Depending on the complexity of certain refactorings, developers might unconsciously make the source code more susceptible to bugs. In this paper, we present a longitudinal study of 5 Java open source projects, in which 20,689 refactorings and 1,033 bug reports were analyzed. We found that many bugs are introduced in the refactored code as soon as the first immediate change is made to it. Furthermore, code elements affected by refactorings performed in conjunction with other changes are more bug-prone than those affected by pure refactorings.
Mutation testing is one of the strongest code-based test criteria. However, it is expensive, as it involves a large number of mutants. To deal with this issue, we propose a machine learning approach that learns to select fault-revealing mutants. Fault-revealing mutants are valuable to testers because killing them results in (collateral) fault revelation. We thus formulate mutant reduction as the problem of selecting the mutants that are most likely to lead to test cases that uncover unknown program faults. We tackle this problem using a set of static program features and machine learning. Experimental results involving 1,629 real faults show that our approach reveals 14% to 18% more faults than a random mutant selection baseline.
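The abstract does not prescribe a particular learner; as a hedged sketch, a standard classifier over per-mutant static features can be used to rank mutants by their predicted probability of being fault revealing:

    from sklearn.ensemble import RandomForestClassifier

    def select_mutants(train_features, train_labels, candidate_features, k=100):
        # train_labels: 1 if the mutant was fault revealing in historical data, else 0.
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(train_features, train_labels)
        scores = clf.predict_proba(candidate_features)[:, 1]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return ranked[:k]   # indices of the k most promising mutants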
Despite many software engineering efforts and programming language support, resource and memory leaks remain a troublesome issue in managed languages such as Java. Understanding the properties of leak-related issues, such as their type distribution, how they are found, and which defects induce them is an essential prerequisite for designing better approaches for avoidance, diagnosis, and repair of leak-related bugs. To answer these questions, we conduct an empirical study on 452 issues found in repositories of 10 mature Apache Java projects.
Large-scale source code analysis is useful for solving many software engineering problems; however, it can be very expensive, which makes its use difficult. This work proposes hybrid traversal, a technique for performing source code analysis over control flow graphs more efficiently. Analysis over a control flow graph requires traversing the graph, which can be done using several traversal strategies. Our observation is that no single traversal strategy is suitable for all analyses and all graphs.
Our key insight is that, using the characteristics of the analysis and the properties of the graph, it is possible to select the most efficient traversal strategy for an <analysis, graph> pair. Our evaluation using a set of analyses with different characteristics and a large dataset of graphs with different properties shows up to a 30% reduction in analysis time. Further, the overhead of our technique for selecting the most efficient traversal strategy is very low, between 0.01% and 0.2%.
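The selection step can be pictured as a simple decision rule over analysis characteristics and graph properties; the concrete rules below are illustrative only, not the ones evaluated in the paper.

    from collections import namedtuple

    Analysis = namedtuple("Analysis", "data_flow_sensitive forward")
    Graph = namedtuple("Graph", "has_loops")

    def choose_traversal(analysis, graph):
        if not analysis.data_flow_sensitive:
            return "any-order"          # a single pass over the nodes suffices
        if not graph.has_loops:
            return "reverse-post-order" if analysis.forward else "post-order"
        return "worklist"               # fixpoint iteration needed over loops

    print(choose_traversal(Analysis(True, True), Graph(has_loops=False)))  # reverse-post-order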
Use of infrastructure as code (IaC) scripts helps software teams manage their configuration and infrastructure automatically. Information technology (IT) organizations use IaC scripts to create and manage automated deployment pipelines to deliver services rapidly. IaC scripts can be defective, resulting in dire consequences, such as creating wide-scale service outages for end-users. Prediction of defective IaC scripts can help teams mitigate defects in these scripts by prioritizing their inspection efforts. The goal of this paper is to help software practitioners prioritize their inspection efforts for infrastructure as code (IaC) scripts by proposing defect prediction model-related metrics. IaC scripts use domain specific languages (DSL) that are fundamentally different from object-oriented programming (OOP) languages. Hence, the OOP-based metrics that researchers use in defect prediction might not be applicable to IaC scripts. We apply Constructivist Grounded Theory (CGT) to defect-related commits mined from version control systems to identify metrics suitable for IaC scripts. By applying CGT, we identify 18 metrics. Of these metrics, 13 are related to IaC, for example, the count of string occurrences in a script. Four of the identified metrics are related to churn, and one metric is lines of code.
The growing adoption of small unmanned aircraft systems (sUAS) for tasks such as eCommerce, aerial surveillance, and environmental monitoring introduces the need for new safety mechanisms in an increasingly cluttered airspace. Safety assurance cases (SAC) provide a state-of-the-art solution for reasoning about system and software safety in numerous safety-critical domains. We propose a novel approach based on the idea of interlocking safety cases. The sUAS infrastructure safety case (iSAC) specifies assumptions and applies constraints upon the behavior of sUAS entering the airspace. Each sUAS then demonstrates compliance to the iSAC by presenting its own (partial) safety case (uSAC) which connects to the iSAC through a set of interlock points. To enforce a "trust but verify" policy, sUAS conformance is monitored at runtime while it is in the airspace and its behavior is described using a reputation model based on the iSAC's expectations of its behavior.
User Acceptance Testing (UAT) aims to determine whether or not a piece of software satisfies users' acceptance criteria. Although some studies have used acceptance tests as software requirements, no previous study has collected information about available UAT techniques and compared them to support an organization in selecting one over another. This work presents a Systematic Literature Review on UAT to identify available techniques and compare their main features. We selected 80 studies and identified 21 UAT techniques. As a result, we created a comparative table summarizing these techniques and their features.
Assurance cases, which provide an organized and explicit argument for correctness, should be used for certifying Scientific Computing Software (SCS), especially when the software impacts health and safety. Assurance cases have already been effectively used for safety cases for real time systems. Their advantages for SCS include engaging domain experts, producing only necessary documentation, and providing evidence that can potentially be verified/replicated by a third party. This paper illustrates assurance cases for SCS through the correctness case for 3dfim+, an existing medical imaging application. No errors were found in 3dfim+. However, the example still justifies the value of assurance cases, since the existing documentation is shown to have ambiguities and omissions, such as an incompletely defined ranking function and missing details on the coordinate system convention adopted. In addition, a potential concern for the software itself is identified: running the software does not produce any warning about the necessity of using data that matches the assumed parametric statistical model.
Code clones are common in software. When applying similar edits to clones, developers often find it difficult to examine the runtime behavior of clones. The problem is exacerbated when some clones are tested, while their counterparts are not. To reuse tests for similar but not identical clones, G
We present a static, scalable analysis technique for detecting side channels in software systems. Our method is motivated by the observation that a sizable class of side-channel vulnerabilities occur when the value of private data results in multiple distinct control flow paths with differentiable observables. Given a set of secret variables, a type of side channel, and a program, our goal is to detect the set of branch conditions responsible for potential side channels of the given type in the program, and generate a pair of witness paths in the control flow graph for the detected side channel. Our technique achieves this by analyzing the control flow graph of the program with respect to a cost model (such as time or memory usage), and identifies if a change in the secret value can cause a detectable change in the observed cost of the program behavior. It also generates a pair of witness paths in the control flow graph, differing only on the branch conditions influenced by the secret, and whose observable output under the given side channel differs by more than some user defined threshold. We implemented our approach in a prototype tool, C
Identifying security issues before attackers do has become a critical concern for software development teams and software users. While methods for finding programming errors (e.g. fuzzers, static code analysis [3], and vulnerability prediction models like that of Scandariato et al. [10]) are valuable, identifying security issues related to the lack of secure design principles and to poor development processes could help ensure that programming errors are avoided before they are committed to source code.
Typical approaches (e.g. [4, 6--8]) to identifying security-related messages in software development project repositories use text mining based on pre-selected sets of standard security-related keywords, for instance: authentication, ssl, encryption, availability, or password. We hypothesize that these standard keywords may not capture the entire spectrum of security-related issues in a project, and that additional project-specific and/or domain-specific vocabulary may be needed to develop an accurate picture of a project's security.
For instance, Arnold et al. [1], in a review of bug-fix patches on Linux kernel version 2.6.24, identified a commit (commit message: "Fix ->vm_file accounting, mmap_region() may do do_munmap()") with serious security consequences that was misclassified as a non-security bug. While no typical security keyword is mentioned, memory mapping ('mmap') in the domain of kernel development has security significance parallel to that of buffer overflows in languages like C/C++. Whether memory or currency is at stake, identifying changes to assets that the software manages is potentially security-related.
The goal of this research is to support researchers and practitioners in identifying security issues in software development project artifacts by defining and evaluating a systematic scheme for identifying project-specific security vocabularies that can be used for keyword-based classification.
We derive three research questions from our goal:
• RQ1: How does the vocabulary of security issues vary between software development projects?
• RQ2: How well do project-specific security vocabularies identify messages related to publicly reported vulnerabilities?
• RQ3: How well do existing security keywords identify project-specific security-related messages and messages related to publicly reported vulnerabilities?
To address these research questions, we collected developer email, bug tracking, commit message, and CVE record project artifacts from three open source projects: Dolibarr, Apache Camel, and Apache Derby. We manually classified 5,400 messages from the three projects' commit messages, bug trackers, and emails, and linked the messages to each project's public vulnerability records. Adapting techniques from Bachmann and Bernstein [2], Schermann et al. [11], and Guzzi [5], we analyzed each project's security vocabulary and the vocabulary's relationship to the project's vulnerabilities. We trained two classifiers (Model.A and Model.B) on samples of the project data, and used the classifiers to predict security-related messages in the manually classified project oracles.
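The keyword-based classification underlying these models can be sketched as below; the vocabulary sets shown are placeholders built from terms mentioned in this abstract, not the vocabularies actually mined in the study.

    import re

    STANDARD_KEYWORDS = {"authentication", "ssl", "encryption", "availability", "password"}
    PROJECT_VOCABULARY = {"deadlock", "endpoint", "blob", "clob"}   # placeholder project terms

    def is_security_related(message, project_vocabulary=PROJECT_VOCABULARY):
        # Flag a commit/bug/email message if it mentions any standard keyword
        # or any project-specific vocabulary term.
        tokens = set(re.findall(r"[a-z]+", message.lower()))
        return bool(tokens & (STANDARD_KEYWORDS | project_vocabulary))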
Our contributions include:
• A systematic scheme for linking CVE records to related messages in software development project artifacts
• An empirical evaluation of project-specific security vocabulary similarities and differences between project artifacts and between projects
To summarize our findings on RQ1, we present tables of our qualitative and quantitative results. We tabulated counts of words found in security-related messages. Traditional security keywords (e.g. password, encryption) are present, particularly in the explicit column, but each project also contains terms describing entities unique to the project, for example 'endpoint' (Camel), 'blob' (short for 'Binary Large Object'), 'clob' ('Character Large Object'), 'deadlock' (Derby), and 'invoice' and 'order' (Dolibarr). The presence of these terms in security-related issues suggests that they are assets worthy of careful attention during the development life cycle.
Table 1 lists the statistics for security-related messages from the three projects, broken down by security class and security property. Explicit security-related messages (messages referencing security properties) are in the minority in each project. Implicit messages represent the majority of security-related messages in each project.
In Table 2, we present the results of the classifiers built using the various project and literature security vocabularies to predict security-related messages in the oracle and CVE datasets. We have marked in bold the highest result for each performance measure for each dataset. Both Model A and Model B perform well across the projects when predicting on the oracle dataset of the project for which they were built. Further, the project-specific models have higher performance than the literature-based models (Ray.vocab [9] and Pletea.vocab [7]) on the project oracle datasets. Model performance is not sustained and is inconsistent when a model is applied to other projects' datasets.
To summarize our findings on RQ2, Table 3 presents performance results for the project vocabulary models on the CVE datasets of each project. We have marked in bold the highest result for each performance measure for each dataset. Results for Model.A show high recall for Derby and Camel and below-average recall for Dolibarr. In Model.B, however, the recall is above 60% for Dolibarr and over 85% for both Derby and Camel. We attribute the low precision to our approach of labeling only CVE-related messages as security-related and all remaining messages as not security-related. The Dolibarr results are further complicated by the low proportion of security-related messages compared with the other two projects (as reported in Table 1).
To summarize our findings on RQ3, Table 2 and Table 3 present the classifier performance results for two sets of keywords drawn from the literature, Ray.vocab and Pletea.vocab. In each case, the project vocabulary model had the highest recall, precision, and F-Score on the project's oracle dataset. With regard to the CVE dataset, the project vocabulary model has the highest recall; however, the overall performance, as measured by F-Score, varied by dataset, with the Ray and Pletea keywords scoring higher than the project vocabulary model. The low precision of the classifiers built on the projects' vocabularies follows the explanation provided under RQ2.
Our results suggest that domain-vocabulary models achieve recall that outperforms standard security terms across our datasets. Our conjecture, supported by our data, is that augmenting standard security keywords with a project's security vocabulary yields a more accurate security picture. In future work, we aim to refine vocabulary selection to improve classifier performance, and to build tools implementing the approach in this paper to aid practitioners and researchers in identifying software project security issues.
Static analysis is a great approach to finding bugs and code smells. Some errors span multiple translation units. Unfortunately, it is challenging to achieve cross translation unit analysis for C family languages.
In this short paper, we describe a model and an implementation for cross translation unit (CTU) symbolic execution for C. We were able to extend the scope of the analysis without modifying any of the existing checks. The analysis is implemented in the open source Clang compiler. We also measured the performance of the approach and the quality of the reports. The implementation is already accepted into mainline Clang.
Identifying relationships among program elements, such as functions, is useful for program understanding, debugging, and analysis. We present func2vec, an algorithm that uses static traces to embed functions in a vector space such that related functions are close together, even if they are semantically and syntactically dissimilar. We present applications of func2vec that aid program comprehension.
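The abstract does not spell out the embedding model; as one hedged illustration, functions that co-occur in the same static traces can be embedded with an off-the-shelf skip-gram model and compared in the resulting vector space (the function names and traces below are invented).

    from gensim.models import Word2Vec

    # Each static trace is a sequence of function names along a program path.
    traces = [["open_dev", "lock", "read_block", "unlock"],
              ["open_dev", "lock", "write_block", "unlock"]]

    # Functions appearing in similar trace contexts end up close together.
    model = Word2Vec(sentences=traces, vector_size=64, window=5, min_count=1, sg=1)
    print(model.wv.most_similar("read_block", topn=3))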
The functional correctness of a software application is, of course, a prime concern, but other properties such as execution time, precision, or energy consumption might also be important in some contexts. Systematically testing these quantitative properties is still extremely difficult, in particular because there exists no method to tell the developer whether such a test set is "good enough" or even whether one test set is better than another. This paper proposes a new method, called Multimorphic testing, to assess the relative effectiveness of a test suite for revealing performance variations of a software system. By analogy with mutation testing, our core idea is to vary software parameters and to check whether this makes any difference to the outcome of the tests: i.e., are some tests able to "kill" bad morphs (configurations)? Our method can be used to evaluate the quality of a test suite with respect to a quantitative property of interest, such as execution time or computation accuracy.
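The "kill" analogy can be made concrete with a small scoring sketch (illustrative, not necessarily the paper's exact metric): a test kills a morph if the measured quantitative outcome deviates from the baseline configuration by more than a tolerance.

    def multimorphic_score(tests, morphs, baseline, measure, tolerance=0.10):
        # measure(test, morph) returns the quantitative outcome, e.g. runtime in seconds.
        others = [m for m in morphs if m != baseline]
        killed = set()
        for morph in others:
            for test in tests:
                ref = measure(test, baseline)
                if ref and abs(measure(test, morph) - ref) / ref > tolerance:
                    killed.add(morph)   # the suite distinguishes this morph from the baseline
                    break
        return len(killed) / max(1, len(others))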
Engineering dependable software for mobile robots is becoming increasingly important. A core asset in engineering mobile robots is the mission specification---a formal description of the goals that mobile robots shall achieve. Such mission specifications are used, among others, to synthesize, verify, simulate, or guide the engineering of robot software. Development of precise mission specifications is challenging. Engineers need to translate the mission requirements into specification structures expressed in a logical language---a laborious and error-prone task.
To mitigate this problem, we present a catalog of mission specification patterns for mobile robots. Our focus is on robot movement, one of the most prominent and recurrent specification problems for mobile robots. Our catalog maps common mission specification problems to recurrent solutions, which we provide as templates that can be used by engineers. The patterns are the result of analyzing missions extracted from the literature. For each pattern, we describe usage intent, known uses, relationships to other patterns, and---most importantly---a template representing the solution as a logical formula in temporal logic.
Our specification patterns constitute reusable building blocks that can be used by engineers to create complex mission specifications while reducing specification mistakes. We believe that our patterns support researchers working on tool support and techniques to synthesize and verify mission specifications, and language designers creating rich domain-specific languages for mobile robots, incorporating our patterns as language concepts.
Autotuning is a technique for optimizing the performance of sequential and parallel applications. We explore the problem of successfully applying on-line autotuning to real-world applications. We tune PostgreSQL, an open-source database server, by optimizing tuning parameters that affect table scans. We evaluate the effects on performance using the TPC-H benchmark and achieve speedups of up to 3.9. A video documenting the process is available at https://dx.doi.org/10.5445/DIVA/2018-192.
Novel robotic applications are no longer based on single robots. Rather, they require teams of robots that collaborate and interact to perform a desired mission. They must also be used in contexts in which only partial knowledge about the robots and their environment is available. To ensure mission achievement, robotic applications require planners that compute the set of actions the robots must perform. Current planning techniques are often based on centralized solutions: they do not scale to teams of robots, they consider rather simple missions, and they do not work in partially known environments. To address these challenges, we present a planning solution that decomposes the team of robots into subclasses, considers complex high-level missions given in temporal logic, and works when only partial knowledge of the environment is available.
Because of the increasing acceptance and possibly expanding market of free/libre open source software (FLOSS), the spectrum and scale of companies that participate in FLOSS development have substantially expanded in recent years. Companies get involved in FLOSS projects to acquire user innovations [3, 12], to reduce costs [8, 11], to make money on complementary services [13], etc. Such intense involvement may change the nature of FLOSS development and pose critical challenges to the sustainability of the projects. For example, it has been found that a company's full control and intense involvement is associated with a decrease in volunteer inflow [13]. Sometimes a project may fail after one company pulls its resources from the project [13]. This raises concerns about the domination of one company in a project. In large projects like OpenStack, there are often hundreds of companies involved in contributing code. Despite substantial research on commercial participation, whether one company dominates a project, and the impact of such domination, has never been explicitly explored. We investigate four main projects of OpenStack, a large ecosystem that has had a tremendous impact on computing and society, to answer the following research questions: Does one company dominate the project's development (RQ1)? If the answer to RQ1 is yes, does the domination affect the community (RQ2)?
As software takes on more responsibility, it gets increasingly complex, requiring an extremely large number of tests for effective validation [1, 6]. Executing these large test suites is expensive, both in terms of time and energy. Cache misses are a significant contributing factor to the execution time of software. In this paper, we propose an approach that orders the test executions in a test suite in such a way that instruction cache misses, and thereby execution time, are reduced.
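As one hedged illustration (not necessarily the paper's algorithm), a greedy nearest-neighbor ordering can place consecutively the tests that share the most covered functions, so that instructions already resident in the cache are more likely to be reused.

    def order_tests(coverage):
        # coverage: {test_id: set of functions executed by that test}.
        remaining = set(coverage)
        order = [remaining.pop()] if remaining else []
        while remaining:
            prev = coverage[order[-1]]
            nxt = max(remaining, key=lambda t: len(coverage[t] & prev))
            remaining.remove(nxt)
            order.append(nxt)
        return order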
To understand requirements traceability in practice, we present a preliminary study of identifying questions from requirements repositories and examining their answering status. Investigating four open-source projects results in 733 requirements questions, among which 43% were answered successfully, 35% were answered unsuccessfully, and 22% were not answered at all. We evaluate the accuracy of using a state-of-the-art natural language processing tool to identify the requirements questions and illuminate automated ways to classify their answering status.
Proposing a new method for automatically detecting, localising, or repairing software faults requires a fair, reproducible evaluation of the effectiveness of the method relative to existing alternatives. Measuring effectiveness requires both an indicative set of bugs, and a mechanism for reliably reproducing and interacting with those bugs. We present B
The Internet of Things (IoT) is a fast-propagating technology that is expected to emerge in almost all aspects of our daily life. The IoT environment is well known for being dynamic and uncertain. Connected devices, and their software, can be discovered at runtime and might also become suddenly unavailable. The involvement of humans in the loop further complicates the picture. People's activities are stochastic and their needs are not always predictable. Therefore, coping with the dynamic IoT environment should be considered a first-class requirement when engineering IoT systems. A useful concept for supporting this effort is Emergent Configurations (ECs). An EC consists of a dynamic set of devices that cooperate temporarily to achieve a user goal. This PhD work aims to use the concept of ECs as the basis for a novel approach to realizing IoT systems. More specifically, this thesis aims at: (i) producing characterization models for IoT systems and ECs; and (ii) proposing a concrete architecture and an approach for realizing ECs.
The present research project aims to propose methods and tools for mobile application development and maintenance that rely on effort information (estimations). Specifically, we will focus on two main challenges to go beyond existing work: (i) conceiving effort estimation approaches that can be applied earlier in the development cycle and evolve through the development process; and (ii) prioritizing development and maintenance tasks by relying on effort estimation information.
Recent years have seen an increasing interest in understanding and analyzing cyber-physical system (CPS) models and their development tools. Existing work in this area is limited by the lack of an open corpus of CPS models, which we aim to address by building the by-far largest curated corpus of CPS artifacts. Next, to address the safety-critical aspect of CPS development tools, we discuss the design and evaluation of the very first differential testing framework for arbitrary CPS tool chains. We identify challenges unique to commercial CPS tool chain testing and present a tool implementation which has already found 9 new, confirmed bugs in Simulink, the most widely used CPS development tool.
The number and complexity of robotic applications that are being developed both in industry and academia are increasing continuously. However, those applications are not engineered through well-defined system engineering processes, and this leads to time-consuming issues. Besides, robot applications are increasingly based on teams of autonomous robots that work collaboratively with other robots and/or humans to accomplish complex missions. This further increases the complexity of the controlling application. In this Ph.D. project, we aim to bring software engineering best practices to the robotic domain in order to produce processes, architectural models, and methods to be used by developers in order to tackle common challenges such as reusability, variability, and modularity. The goal is to reduce development time and effort, thereby reducing the time-to-market of robotic applications. To validate our results we make use of different models of real robots in real-world scenarios.
The idea of automated migration support arises from the problems observed in practice and the missing solutions for software product line (SPL) co-evolution support. In practice, it is common to realize new functionality via unsystematic code cloning: a product is separated from its related SPL and then modified. When a separated product and the SPL evolve over time, this is called SPL co-evolution. During this process, developers have to manually migrate, for example, features or bugfixes between the SPL and the product. Currently, there exist only partially automated solutions for this use case. The proposed approach is the first that aims at using semantic merging to migrate arbitrary semantic units, such as features or bugfixes, between an SPL and separated products.
The supply chain is an extremely successful way to cope with the risk posed by distributed decision making in product sourcing and distribution. While open source software has similarly distributed decision making and involves code and information flows similar to those in ordinary supply chains, the actual networks necessary to quantify and communicate risks in software supply chains have not been constructed at large scale. This work proposes to close this gap by measuring dependency, code reuse, and knowledge flow networks in open source software. In preliminary work, we have developed suitable tools and methods that rely on public version control data to measure and compare these networks for R-language and emberjs packages. We propose ways to calculate the three networks for the entirety of public software, to evaluate their accuracy, and to provide public infrastructure for building risk assessment and mitigation tools for various individual and organizational participants in open source software. We hope that this infrastructure will contribute to a more predictable experience with OSS and lead to its even wider adoption.
Within the context of software engineering, many decisions take place, and such decisions should employ value propositions that focus on short- as well as long-term goals. In 2003, Boehm coined the term Value-Based Software Engineering (VBSE), which entails the change from a value-neutral to a value-based approach. VBSE argues that decisions should be based on the value propositions of all key stakeholders and should balance short-term and long-term goals. This paper details a PhD research plan that aims to study the relationship between personality and decision-making within the context of VBSE. The research focuses on group decision-making, considering three aspects: the interaction among decision-makers, their perception of the decision value, and their personality traits. The research methodology will be an experiment revolving around a hypothetical software development project and a set of decisions that need to be made, for example, which features to include and the priority of each one. The contribution from a theoretical perspective is to understand the relationship among three main aspects: personality traits, the decision-making process, and decision value. From a practitioners' perspective, the contribution is to help improve software project decision-making.
The era of Cyber-Physical Systems (CPS) and IoT gives rise to the necessity of multi-proficiency in software, hardware, and the Cyber-Physical Space (CPSp) wherein the IoT components are deployed. Focusing on software engineering aspects, this research proposes a model-driven engineering approach to engineer CPS and model pedestrian flow. To this end, design-time decisions and run-time data ought to be fused to improve the efficiency of crowd monitoring and emergency handling. Moreover, the research aims at building mathematical models applicable as the core of the system controller to facilitate optimum route selection, crowd movement prediction, and hazard diffusion detection, while considering the architectural characteristics of the complex area to be evacuated.
As more aspects of our daily lives rely on technology, the software that enables the technology must be secure. Developers rely on practices such as threat modeling, static and dynamic analyses, code review, and fuzz and penetration testing to engineer secure software. These practices, while effective at identifying vulnerabilities in software, are limited in their ability to describe the potential reasons for the existence of vulnerabilities. In order to overcome this limitation, researchers have proposed empirically validated metrics to identify factors that may have led to the introduction of vulnerabilities in the past. Developers must be made aware of these factors so that they can proactively consider the security implications of each line of code that they contribute. The goal of our research is to assist developers in engineering secure software by providing a technique that generates scientific, interpretable, and actionable feedback on security as the software evolves. In this paper, we provide an overview of our proposed approach to accomplish this research goal through a series of three research studies in which we (1) systematize the knowledge on vulnerability discovery metrics, (2) leverage the metrics to generate feedback on security, and (3) implement a framework for providing automatically generated feedback on security using code reviews as a medium.
Automatic verification of software could save companies from huge losses due to errors in their programs. Existing techniques to prevent and detect errors are mainly based on imprecise heuristics which can report false positives. Formal verification techniques are an alternative to the heuristic approaches. They are more precise and can report errors with higher rigor. However, they cannot be directly applied because current programming languages have no defined semantics that specifies how the source code is interpreted in the execution of programs. Moreover, no existing work tries to develop a semantics for the time domain. The target of this thesis is to provide a verification framework based on a defined time semantics that can help developers automatically detect time-related errors in the behavior of programs. We define a time semantics that allows us to ascribe a meaning to source code statements that alter and use time. Based on the time semantics, we develop an approach to (i) automatically assert time properties and (ii) reverse engineer timed automata, a formal model of the time behavior that is amenable to verification. We plan to evaluate our approaches with quantitative and qualitative studies. The quantitative studies will measure the performance of our approaches on open source projects, and the qualitative studies will collect developers' feedback about the applicability and usefulness of our proposed techniques.
Compilers are one of the most important software infrastructures. Compiler testing is an effective and widely used way to assure the quality of compilers. While many compiler testing techniques have been proposed to detect compiler bugs, these techniques still suffer from a serious efficiency problem, because they need to run a large number of randomly generated test programs produced on the fly by automated test-generation tools (e.g., Csmith). To accelerate compiler testing, it is desirable to schedule the execution order of the generated test programs so that the test programs that are more likely to trigger compiler bugs are executed earlier. Since different test programs tend to trigger the same compiler bug, the ideal goal of accelerating compiler testing is to execute test programs triggering different compiler bugs at the beginning. However, such a perfect goal is hard to achieve, and thus in this work we design four steps to approach this ideal goal through learning, in order to largely accelerate compiler testing.
Defects in infrastructure as code (IaC) scripts can have serious consequences for organizations that adopt DevOps. By identifying which characteristics of IaC scripts correlate with defects, we can identify anti-patterns and help software practitioners make informed decisions on better development and maintenance of IaC scripts, increasing their quality. The goal of this paper is to help practitioners increase the quality of IaC scripts by identifying characteristics of IaC scripts and of the IaC development process that correlate with defects and violate security and privacy objectives. We focus on characteristics of IaC scripts and IaC development that (i) correlate with IaC defects, and (ii) violate security and privacy-related objectives, namely confidentiality, availability, and integrity. For our initial studies, we mined open source version control systems from three organizations, Mozilla, Openstack, and Wikimedia, to identify defect-related characteristics and conduct our case studies. From our empirical analysis, we identify (i) 14 IaC code characteristics and four churn characteristics that correlate with defects; and (ii) 12 process characteristics, such as frequency of changes and ownership of IaC scripts, that correlate with defects. We propose the following studies: (i) identify structural characteristics that correlate with defects; (ii) compare, with respect to prediction performance, which characteristics of IaC scripts are most correlated with defects; and (iii) identify characteristics that violate security and privacy objectives.
Software development organizations strive to enhance the productivity of their developers. While research has looked into various ways for improving developer productivity, little is known about the activities they pursue at work, how these activities influence the fragmentation of work, and how these insights could be leveraged to foster productivity at work. In my PhD thesis, I address software developer productivity by taking a mixed-method approach to investigate developers' perceptions of productivity in the field and to examine the individual differences of each developer's work. My goal is to increase developers' awareness about their own work habits and productivity, and to encourage productive behavior changes at work through the provision of two persuasive technologies, self-monitoring and goal-setting.
Model mining from software systems can be very helpful for program comprehension. The few existing approaches for extracting high-level models from code, when applied to real-world systems written in C, deliver models that are too detailed and complex to be understood by humans. In my Ph.D. project, I propose an approach that complements fully automatic model mining with user interaction to obtain understandable models. The evaluation of this approach includes a controlled experiment with a large number of experts, in order to assess the effectiveness of the interactively mined models for understanding complex legacy software.
Software defect prediction models guide developers and testers to identify defect-prone software modules with less time and effort than manual inspections of the source code. State-of-the-art predictors on publicly available software engineering data can catch around 70% of the defects. While early studies mostly utilize static code properties of the software, recent studies incorporate the people factor into the prediction models, such as the number of developers that touched a code unit, the experience of the developer, and the interaction and cognitive behaviors of developers. Such information could give a stronger clue about the defect-prone parts because it can explain defect-injection patterns in software development. Personalization has been emerging in many other systems, such as social platforms and web search engines, where people get customized recommendations based on their actions, profiles, and interests. Following this point of view, customizing defect prediction for each developer could make predictions more accurate and useful than traditional, general models. In this thesis, we focus on building a personalized defect prediction framework that gives instant feedback to the developer at the change level, based on historical defect and change data. Our preliminary analysis of the personalized prediction models of 121 developers in six open source projects indicates that a personalized approach is not always the best model when compared with general models built for the six projects. Other factors, such as project characteristics, the developer's historical data, the context and frequency of contributions, and/or development methodologies, might affect which model to use in practice. Hence, this topic is open to improvement through further empirical studies on each of these factors.
As modern software systems are becoming increasingly complex, developers often need to rely on online sources to address problems encountered during software development and maintenance. These resources provide developers with access to peers' expertise, covering knowledge of different software lifecycle phases, including design, implementation, and maintenance. However, exploiting such knowledge and converting it into actionable items is far from trivial, due to the vastness of the information available online as well as to its unstructured nature. In this research, we aim at (partially) crowdsourcing the software design, implementation and maintenance process by exploiting the knowledge embedded in various sources available on the Web (e.g., Stack Overflow discussions, presentations on SlideShare, open source code, etc.). For example, we want to support software design decisions (e.g., whether to use a specific library for the implementation of a feature) by performing opinion mining on the vast amount of information available on the Web, and we want to recommend refactoring operations by learning from the code written in open source systems. The final goal is to improve developers' productivity and code quality.
Program comprehension is the cognitive process of understanding code. Researchers have proposed several models to describe program comprehension. However, because program comprehension is an internal process and difficult to measure, the accuracy of the existing models is limited. Neuro-imaging methods, such as functional magnetic resonance imaging (fMRI), provide a novel neuro-cognitive perspective on program-comprehension research. With my thesis work, we aim at establishing fMRI as a new tool for program-comprehension and software-engineering studies. Furthermore, we seek to refine our existing framework for conducting fMRI studies by extending it with eye tracking and improved control conditions. We describe how we will apply our upgraded framework to extend our understanding of program comprehension. In the long run, we would like to bring insights from our fMRI studies into software-engineering practice by providing code-styling guidelines and programming tools that reduce the cognitive effort required to comprehend code.
Feature interactions occur when the behavior of one feature is influenced by the presence of one or more other features. Such interactions may lead to faults that are not easily identified by analyzing each feature separately, especially when feature specifications are missing. In this paper, we propose VarXplorer, an iterative approach that supports developers in detecting internal control-flow and data-flow interactions in configurable systems, by means of feature-interaction graphs and an interaction specification language.
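To make the notion of a feature interaction concrete, the following minimal Java sketch (our own illustration, not code from the VarXplorer paper; the feature flags and behavior are assumptions) shows a simple data-flow interaction: each feature behaves acceptably in isolation, but enabling both lets the plaintext message flow into the log, undermining the encryption feature.

// Illustrative sketch of a data-flow feature interaction (hypothetical code,
// not from the VarXplorer paper). LOGGING and ENCRYPTION are fine on their
// own, but combined, the log statement leaks the unencrypted message.
public class FeatureInteractionSketch {
    static final boolean LOGGING = true;     // feature A
    static final boolean ENCRYPTION = true;  // feature B

    static String send(String message) {
        if (LOGGING) {
            // Interaction: even when ENCRYPTION is enabled, this logs plaintext.
            System.out.println("log: " + message);
        }
        return ENCRYPTION ? encrypt(message) : message;
    }

    static String encrypt(String s) {
        // Placeholder "encryption" so the sketch stays self-contained.
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println("sent over the wire: " + send("secret"));
    }
}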
Software developers today crave feedback, be it from their peers or even bots in the form of code review, static analysis tools like their compiler, or the local or remote execution of their tests in the Continuous Integration (CI) environment. With the advent of social coding sites like G
Commercial Cyber-physical System (CPS) development tools (e.g., MathWorks' Simulink) are widely used to design, simulate and automatically generate artifacts that are deployed in safety-critical embedded hardware. CyFuzz, the state-of-the-art CPS tool-chain testing scheme, is inefficient, cannot generate feature-rich inputs, and is ineffective at finding new tool-chain bugs. To better understand various properties of publicly available CPS models, we conducted the first large-scale study of 391 publicly available Simulink models. Next, we proposed an efficient CPS model-generation scheme capable of creating large, feature-rich random inputs. Our tool for testing Simulink, which has found 8 new confirmed bugs, is publicly available along with the study artifacts.
An abundance of data in many disciplines of science, engineering, national security, health care, and business has led to the emerging field of Big Data Analytics, whose workloads typically run in a cloud computing environment. To process massive quantities of data in the cloud, developers leverage Data-Intensive Scalable Computing (DISC) systems such as Google's MapReduce, Hadoop, and Spark.
Currently, developers do not have easy means to debug DISC applications. The use of cloud computing makes application development feel more like running batch jobs, and the nature of debugging is therefore post-mortem. Developers of big data applications write code that implements a data processing pipeline and test it on their local workstation with a small data sample downloaded from a TB-scale data warehouse. They cross their fingers and hope that the program works in the expensive production cloud. When a job fails or they get a suspicious result, data scientists spend hours guessing at the source of the error and digging through post-mortem logs. In such cases, the data scientists may want to pinpoint the root cause of errors by investigating a subset of the corresponding input records.
The vision of my work is to provide interactive, real-time and automated debugging services for big data processing programs in modern DISC systems with minimum performance impact. My work investigates the following research questions in the context of big data analytics: (1) What are the necessary debugging primitives for interactive big data processing? (2) What scalable fault localization algorithms are needed to help the user localize and characterize the root causes of errors? (3) How can we improve testing efficiency during iterative development of DISC applications by reasoning about the semantics of dataflow operators and the user-defined functions used inside them in tandem?
To answer these questions, we synthesize and innovate ideas from software engineering, big data systems, and program analysis, and coordinate innovations across the software stack from the user-facing API all the way down to the systems infrastructure.
Deadlock is among the most complex problems affecting the reliability of programs containing multiple, asynchronous threads. When undetected, deadlocks can lead to permanent thread blockage. Current detection methods are typically based on timeout and rollback of computations, resulting in significant delays. This paper presents Deadlock Detector and Solver (DDS), which can quickly detect and resolve circular deadlocks in Java programs. DDS uses a supervisory controller, which monitors program execution and automatically detects deadlocks resulting from hold-and-wait cycles on monitor locks. When a deadlock is detected, DDS uses a preemptive strategy to break it. Our experiments show that DDS can resolve deadlocks without significant run-time overhead.
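As a concrete illustration (our own minimal example, not code from the DDS paper), the following Java program exhibits exactly the kind of hold-and-wait cycle on monitor locks that such a detector targets: each thread holds one lock and blocks forever waiting for the other.

// Minimal hold-and-wait deadlock on Java monitor locks (illustrative example).
// Thread t1 holds lockA and waits for lockB, while t2 holds lockB and waits
// for lockA; neither thread can ever proceed, so the program hangs.
public class HoldAndWaitDeadlock {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                pause(100);                 // give t2 time to grab lockB
                synchronized (lockB) {      // blocks forever: t2 holds lockB
                    System.out.println("t1 acquired both locks");
                }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                pause(100);                 // give t1 time to grab lockA
                synchronized (lockA) {      // blocks forever: t1 holds lockA
                    System.out.println("t2 acquired both locks");
                }
            }
        });
        t1.start();
        t2.start();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}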
This paper introduces a new approach to the automation of real-time embedded systems modeling. Our approach is based on a new domain-specific language called AutoModel to specify the requirements of a system in terms of its components, goals and constraints. Our automated approach accepts the specified requirements and infers both structural and behavioral models to implement the requirements in the UML-RT modelling language. Existing modeling tools can then be used to generate an implementation from the inferred models without extra work.
In practice, many organizations rely on cloning to implement customer-specific variants of a system. While this approach can have several disadvantages, organizations hesitate to extract reusable features later on, due to the corresponding effort and risk. A particularly challenging and poorly supported task is deciding which features to extract. To tackle this problem, we aim to develop a recommender system that proposes suitable features based on automated analyses of the cloned legacy systems. In this paper, we sketch this recommender and its empirically derived metrics, which comprise the cohesion, impact, and cost of features as well as the users' previous decisions. Overall, we will facilitate the adoption of systematic reuse based on an integrated platform.
Software testing is a crucial part of the software development process, but it is often extremely time-consuming, expensive, manual and error-prone. This has resulted in a crucial need for test automation and acceleration. We propose using GPUs to accelerate test execution by running individual functional tests in parallel on GPU threads. We provide a fully automatic framework, called ParTeCL, which generates GPU code from sequential programs and executes their tests in parallel on the GPU. Our current evaluation on 9 programs from the EEMBC industry-standard benchmark suite shows that ParTeCL achieves an average speedup of 16X compared to a single CPU for these benchmarks.
Contemporary software development is characterized by increased reuse and speed. Open source software forges such as G
Message Passing Interface (MPI) has become the standard programming paradigm in high performance computing. Verifying MPI programs is challenging because of their high parallelism and non-determinism. This paper presents MPI symbolic verifier (MPI-SV), the first symbolic-execution-based tool for verifying MPI programs with both blocking and non-blocking operations. MPI-SV exploits symbolic execution to automatically generate path-level models and performs model checking on these models w.r.t. expected properties. The results of model checking are leveraged to prune redundant paths. We have implemented MPI-SV, and an extensive evaluation demonstrates its effectiveness and efficiency.
While automatic text summarization has been widely studied for more than fifty years, in software engineering automatic summarization is an emerging area that shows great potential and poses new and exciting research challenges. This technical briefing provides an introduction to the state of the art and maps future research directions in automatic software summarization. A first version was presented at ICSE'17; this version has been updated and enhanced based on feedback from the audience.
This tutorial presentation describes the state-of-the-art in the emerging area of continuous testing in a DevOps context. It specifies the building blocks of a strategy for continuous testing in industrial-grade DevOps projects (iDevOps) and shares our motivations, achievements, and experiences on our journey to transform testing into the iDevOps world.
Experimentation is a key issue in science and engineering, but it is one of software engineering's stumbling blocks. Quite a lot of experiments are run nowadays, yet experimentation remains a risky business. Software engineering has some special features, leading to some experimentation issues being conceived of differently than in other disciplines. The aim of this technical briefing is to help participants avoid common pitfalls when analyzing the results of software engineering experiments. The technical briefing is not intended as a data analysis course, because there is already plenty of literature on this subject. Instead, it reviews several key issues that we have identified in published software engineering experiments and addresses them based on the knowledge acquired over 19 years of running experiments.
Automated manipulation of natural language requirements, for classification, tracing, defect detection, information extraction, and other tasks, has been pursued by requirements engineering (RE) researchers for more than two decades. Recent technological advancements in natural language processing (NLP) have made it possible to apply this research more widely within industrial settings. This technical briefing targets researchers and practitioners, and aims to give an overview of what NLP can do today for RE problems, and what it could do if specific research challenges, which also emerge from practical experience, are addressed. The talk will: survey current research on applications of NLP to RE problems; present representative industrially ready techniques, with a focus on defect detection and information extraction problems; present enabling technologies in NLP that can play a role in RE research, including distributional semantics representations; discuss criteria for the evaluation of NLP techniques in the RE context; and outline the main challenges for a systematic application of these techniques in industry. The crosscutting topics that will permeate the talk are the need for domain adaptation and the essential role of the human-in-the-loop.
Two of the key challenges in software testing are the automated generation of test cases, and the identification of failures by checking test outputs. Both challenges are effectively addressed by metamorphic testing (MT), a software testing technique where failures are not revealed by checking an individual concrete output, but by checking the relations among the inputs and outputs of multiple executions of the software under test. Two decades after its introduction, MT is becoming a fully-fledged testing paradigm with successful applications in multiple domains including, among others, big data engineering, simulation and modeling, compilers, machine learning programs, autonomous cars and drones, and cybersecurity. This technical briefing will provide an introduction to MT from a double perspective. First, we will present the technique and the results of a novel survey outlining its main trends and lessons learned. Then, we will go deeper and present some of the successful applications of the technique, as well as challenges and opportunities on the topic. The briefing will be complemented with practical exercises on testing real web applications and APIs.
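As a self-contained illustration (our own example, not one from the briefing), the Java sketch below tests a sort routine through a classic metamorphic relation: permuting the input must not change the output, so two executions can be checked against each other without an oracle for the "correct" sorted result.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative metamorphic test for a sort routine. Instead of checking one
// concrete output against an expected value, we check a relation between the
// outputs of two executions with related inputs.
public class MetamorphicSortTest {

    // System under test: here simply a wrapper around the library sort.
    static List<Integer> sut(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(5, 3, 9, 1, 3, 7);

        // Follow-up input: a random permutation of the source input.
        List<Integer> followUp = new ArrayList<>(source);
        Collections.shuffle(followUp);

        // Metamorphic relation: both executions must produce identical output,
        // even though we never state what the "correct" sorted list is.
        if (!sut(source).equals(sut(followUp))) {
            throw new AssertionError("Metamorphic relation violated");
        }
        System.out.println("Metamorphic relation holds for this input pair");
    }
}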
Git repositories are an important source of empirical software engineering product and process data. Running the Git command-line tool and processing its output with other Unix tools allows the incremental construction of sophisticated data processing pipelines. Git data analytics on the command line can be systematically presented through a pattern that involves fetching, selection, processing, summarization, and reporting. For each part of the processing pipeline, we examine the tools and techniques that can be used most effectively to perform the task at hand. The presented techniques can be easily applied, first to get a feel for the version control repository data at hand and then to extract empirical results.
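The canonical command-line form of this pattern is a pipeline along the lines of "git log --pretty=format:%an | sort | uniq -c | sort -rn", which reports commits per author. As a self-contained, hypothetical illustration of the same fetch/select/process/summarize/report steps, the Java sketch below drives the Git CLI from a program instead of Unix pipes; it assumes it is run inside a Git working copy.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the fetch/select/process/summarize/report pattern,
// counting commits per author from the output of the Git command-line tool.
public class CommitsPerAuthor {
    public static void main(String[] args) throws Exception {
        // Fetch + select: ask git log for author names only.
        Process git = new ProcessBuilder("git", "log", "--pretty=format:%an")
                .redirectErrorStream(true)
                .start();

        // Process + summarize: count commits per author.
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(git.getInputStream()))) {
            String author;
            while ((author = out.readLine()) != null) {
                counts.merge(author, 1, Integer::sum);
            }
        }
        git.waitFor();

        // Report: one line per author.
        counts.forEach((author, n) -> System.out.println(n + "\t" + author));
    }
}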
At the beginning of every research effort, researchers in empirical software engineering have to go through the process of extracting data from raw data sources and transforming it into what their tools expect as input. This step is time-consuming and error-prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. We believe that the primitives exposed by Apache Spark can help software engineering researchers create and share reproducible, high-performance data analysis pipelines.
In our technical briefing, we discuss how researchers can profit from Apache Spark, through a hands-on case study.
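As a small, hypothetical illustration of such a pipeline (the file name and column names below are our assumptions, not part of the briefing), the following sketch uses Spark's Java API to load a CSV of commit records and summarize the number of distinct authors per project.

import static org.apache.spark.sql.functions.countDistinct;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch of a reproducible analysis pipeline with Spark's Java
// API. Assumes a file commits.csv with a header containing at least the
// columns "project" and "author".
public class CommitsPerProject {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("commits-per-project")
                .master("local[*]")          // run locally for the example
                .getOrCreate();

        // Extraction: load the raw data once, with schema inference.
        Dataset<Row> commits = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("commits.csv");

        // Transformation + summarization: distinct authors per project.
        Dataset<Row> authorsPerProject = commits
                .groupBy("project")
                .agg(countDistinct("author").alias("authors"));

        // Reporting: print the result; in a real study this would be written
        // back out, e.g. via authorsPerProject.write().csv(...).
        authorsPerProject.orderBy("project").show();

        spark.stop();
    }
}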
The traditional notion of malware is too narrow, and the prevalent characterizations (virus, worm, trojan horse, spyware, etc.) are neither precise nor comprehensive enough to characterize cyber-physical malware (CPM). Detecting sophisticated CPM is like searching for a needle in a haystack without knowing what the needle looks like. This technical briefing brings together interdisciplinary knowledge to describe the fundamentals of CPM, the mathematical foundations for analyzing and verifying CPM, the current state of the art, the challenges, and directions for future research. Employing real-world examples, we shall illustrate the challenges of analyzing and verifying CPM.
Code smells indicate the presence of quality problems that make software hard to maintain and evolve. A software development team can keep their software maintainable by identifying smells and refactoring them away. In the first part of the session, we present a comprehensive overview of the literature concerning smells, covering various dimensions of the metaphor including defining characteristics, classification, types, as well as causes and impacts of smells. In the second part, we delve into the details of the smell detection methods currently prevalent in both research prototypes and industrial tools. The final part presents actionable and pragmatic strategies for practitioners to avoid, detect, and eradicate smells from their codebase.
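To give a flavor of the metric-based detection style covered in the second part (a simplified sketch of ours, not a tool from the briefing; the thresholds are illustrative assumptions), the Java snippet below flags a class as a potential God Class when simple size metrics exceed fixed thresholds. Real detectors work on source code rather than reflection and combine richer metrics such as cohesion and coupling.

// Simplified sketch of metric-based smell detection: flag a class as a
// potential God Class when its number of declared methods or fields exceeds
// an (illustrative) threshold.
public class GodClassCheck {
    private static final int MAX_METHODS = 40;  // assumed threshold
    private static final int MAX_FIELDS  = 20;  // assumed threshold

    static boolean looksLikeGodClass(Class<?> c) {
        int methods = c.getDeclaredMethods().length;
        int fields  = c.getDeclaredFields().length;
        return methods > MAX_METHODS || fields > MAX_FIELDS;
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] {String.class, Math.class}) {
            System.out.printf("%-10s god class suspect: %b%n",
                    c.getSimpleName(), looksLikeGodClass(c));
        }
    }
}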
Machine Learning (ML) is the discipline that studies methods for automatically inferring models from data. Machine learning has been successfully applied in many areas of software engineering, ranging from behaviour extraction to testing to bug fixing, and many more applications are yet to be defined. However, a better understanding of ML methods, their assumptions and their guarantees would help software engineers adopt and identify the appropriate methods for their desired applications. We argue that this choice can be guided by the models one seeks to infer. In this technical briefing, we review and reflect on the applications of ML to software engineering, organised according to the models they produce and the methods they use. We introduce the principles of ML, give an overview of some key methods, and present examples of areas of software engineering benefiting from ML. We also discuss the open challenges for reaching the full potential of ML for software engineering and how ML can benefit from software engineering methods.
Software-intensive systems are increasingly pervading our everyday lives. As they get more and more connected, this opens them up to far-reaching cyber attacks. Moreover, a recent study by the U.S. Department of Homeland Security shows that more than 90% of current cyber-attacks are enabled not by faulty crypto, networks or hardware but by application-level implementation vulnerabilities. I argue that those problems can only be resolved by the widespread introduction of a secure software development lifecycle (SDLC). In this technical briefing I explain where secure engineering currently fails in practice, and what software engineers can do if they want to make a positive impact in the field. I will do so by explaining major open challenges in the field, but also by drawing on success stories from the introduction of SDLCs in industry.
As the Internet of Things becomes commonplace, modern software must encompass the sensors, actuators and controllers that make up these physical computers. But can non-experts program such systems? Can such software development be undertaken by anyone, especially programmers who are learning or who are not aiming to be technical experts? We describe the motivation and principles behind Microsoft MakeCode and CODAL, two symbiotic frameworks which have many innovative engineering features for physical computing. Together, these two technologies highlight a new approach to software development for embedded computing devices which provides accessible programming languages and environments that reduce the complexity of programming embedded devices without compromising the flexibility or performance of the resulting software.