Artificial Intelligence is becoming increasingly popular with organizations due to the success of Machine Learning and Deep Learning techniques. Using these techniques, data scientists learn from vast amounts of data to enhance behaviour in software-intensive systems. Despite the attractiveness of these techniques, however, there is a lack of a systematic and structured design process for developing ML/DL models. This study uses a multiple-case study approach to explore the different activities and challenges data scientists face when developing ML/DL models in software-intensive embedded systems. Based on the case study, we identify seven phases in the proposed design process that lead to effective model development. The iterations identified between phases, and the events that trigger them, optimize the design process for ML/DL models. Lessons learned from this study allow data scientists and engineers to develop high-performance ML/DL models and also help bridge the gap between the high demand for and low supply of data scientists.
In the field of safety-critical systems, manual reviews are important to ensure high-quality software and to satisfy legal obligations. When applying model-based engineering approaches, no longer are only textual requirements specifications or software code under review, but also model-based specification artifacts like behavioral requirements models. As such behavioral specifications are typically documented on a type-level, errors concerning the interactions between multiple system instances can go unnoticed in manual reviews. This is particularly the case when multiple system instances of the same system type are interacting during runtime, which is typical for cyber-physical systems where networks of cyber-physical systems form dynamically to fulfill an overall purpose. In this paper, we report on a controlled experiment whose results indicate that instance-level review diagrams have, compared to type-level diagrams, important positive effects on reviewing processes for behavioral specifications of cyber-physical systems. Specifically, the experiment provides empirical evidence that instance-level review diagrams are significantly more expressive and effective than type-level diagrams.
Software quality has become the lever of differentiation in today's competitive marketplace. Quality at speed is the customer demand, and automation is the biggest bottleneck holding back the evolution of the quality function. Increased levels of automation and intelligence in software engineering are the emerging trends across the IT field. As systems and software processes guide the life cycle activities and are the vehicles for building quality, it is necessary to examine the process infrastructure for the extent of process automation support and digital enablement it provides. This paper maps out the existing process infrastructure support in industry practice and proposes a roadmap for digital re-imagination of software and systems processes. Harmonizing the quality engineering themes with digital technologies, we propose a framework for building an intelligent software process infrastructure, iSPIN, that can help in digital re-imagination of software and systems lifecycle processes. The framework has been implemented using digital technologies and has been piloted with one of the industry business units for re-imagination of the "proposal process". The proposed iSPIN framework will help in unprecedented automation and quality engineering at each process step and pave the way towards realizing the dictums of "Quality at Speed" and "Digital transformation of Software Process".
Textual user stories capture interactions of users with the system as high-level requirements. However, user stories are typically rather short and backlogs can include many stories. This makes it hard to (a) maintain user stories and backlogs, (b) fully understand the scope of a software project without a detailed analysis of the backlog, and (c) analyse how user stories impact design decisions during sprint planning and implementation. This paper proposes a technique to automatically transform textual user stories into visual use case scenarios in the form of robustness diagrams (a semi-formal scenario-based visualisation of workflows). In addition to creating diagrams for individual stories, the technique allows combining diagrams of multiple stories into one diagram to visualise workflows within sets of stories (e.g., a backlog). Moreover, the technique supports "viewpoint-based" diagrams, i.e., diagrams that show relationships between actors, domain entities and user interfaces starting from a diagram element (e.g., an actor) selected by the analyst. The technique utilises natural language processing and rule-based transformations. We evaluated the technique with more than 1,400 user stories from 22 backlogs and show that (a) the technique generates syntactically valid robustness diagrams, and (b) the quality of automatically generated robustness diagrams compares to the quality of diagrams created by human experts, but depends on the quality of the textual user stories.
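As a rough illustration of what a rule-based transformation step might look like, the sketch below extracts an actor, an action and an optional benefit from stories written in the common "As a ..., I want to ..., so that ..." template, and then groups actions by actor as a crude stand-in for combining several stories into one view. This is not the technique evaluated in the paper, which relies on natural language processing; the regular expression and the names `parse_story`, `StoryParts` and `merge_by_actor` are assumptions made purely for illustration.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional

# Hypothetical template rule: "As a <actor>, I want to <action> [so that <benefit>]."
STORY_PATTERN = re.compile(
    r"^As an? (?P<actor>.+?), I want to (?P<action>.+?)"
    r"(?: so that (?P<benefit>.+?))?\.?$",
    re.IGNORECASE,
)


class StoryParts(NamedTuple):
    actor: str
    action: str
    benefit: Optional[str]


def parse_story(text: str) -> Optional[StoryParts]:
    """Return the parts of a template-conforming story, or None if no rule applies."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return None
    return StoryParts(match["actor"], match["action"], match["benefit"])


def merge_by_actor(stories: Iterable[str]) -> Dict[str, List[str]]:
    """Group the actions of all parseable stories under their actor."""
    merged: Dict[str, List[str]] = {}
    for story in stories:
        parts = parse_story(story)
        if parts is not None:
            merged.setdefault(parts.actor.lower(), []).append(parts.action)
    return merged
```

Stories that do not follow the template are skipped rather than guessed at, which mirrors the observation that the quality of the output depends on the quality of the textual user stories.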
Microservice architecture has been recognized as an important enabler for continuous development of many cloud-based systems. Code generation has been tried in the tool chain of building microservices. However, most existing tools generally do not consider the risks from continuous development.
We have been developing a toolkit which generates microservices from application domain models. Our approach aligns the development process to this toolkit and coordinates domain modeling activity over project life cycles. In this paper, we describe its framework and the corresponding development process, which eliminates delays brought by the uncertainty of a project at a relatively early stage. Several minimum viable products have been built upon the proposed approach during the past years, including automated generation of code from domain decomposition. Our results show a 10% saving of effort and fewer issues. The effort saving increases to 30% under an extreme condition with a high rate of personnel turnover. We also discuss our findings from running these projects and raise questions for future enhancement.
Large-scale system development companies are increasingly adopting agile methods. While this adoption may improve lead-times, such companies need to balance two trade-offs: (i) the need to have a uniform, consistent development method on system level with the need for specialised methods for teams in different disciplines (e.g., hardware, software, mechanics, sales, support); (ii) the need for comprehensive documentation on system level with the need to have lightweight documentation enabling iterative and agile work. With specialised methods for teams, isolated teams work within larger ecosystems of plan-driven culture, i.e., teams become agile "islands". At the boundaries, these teams share knowledge which needs to be managed well for a correct system to be developed. While it is useful to support diverse and specialised methods, it is important to understand which islands are repeatedly encountered, the reasons or factors triggering their existence, and how best to handle coordination between them. Based on a multiple case study, this work presents a catalogue of islands and the boundary objects between them. We believe this work will be beneficial to practitioners aiming to understand their ecosystems and researchers addressing communication and coordination challenges in large-scale development.
Selecting a suitable development method for a specific project context is one of the most challenging activities in process design. Every project is unique and, thus, many context factors have to be considered. Recent research took some initial steps towards statistically constructing hybrid development methods, yet, paid little attention to the peculiarities of context factors influencing method and practice selection. In this paper, we utilize exploratory factor analysis and logistic regression analysis to learn such context factors and to identify methods that are correlated with these factors. Our analysis is based on 829 data points from the HELENA dataset. We provide five base clusters of methods consisting of up to 10 methods that lay the foundation for devising hybrid development methods. The analysis of the five clusters using trained models reveals only a few context factors, e.g., project/product size and target application domain, that seem to significantly influence the selection of methods. An extended descriptive analysis of these practices in the context of the identified method clusters also suggests a consolidation of the relevant practice sets used in specific project contexts.
Human, social, organizational, and technical aspects are intertwined with each other in software teams during the software development process. Practices that teams actually adopt often deviate from those of the used frameworks, such as Scrum. However, currently there is little empirical insight explaining typical deviations, including their reasons and consequences. In this paper we use observations to investigate selected activities of the software development process in two companies that use Scrum. We study identified deviations to understand their reasons and consequences, using a survey and interviews. We identify 13 deviations and we categorize reasons based on type. The deviations' consequences are investigated in terms of their impact. Most deviations can be found in multiple teams. Reasons are doubts of the teams, organizational structures and complexity of the work. Consequences of deviations affect product development and team work.
Organizational factors such as team structure, coordination among engineers, or processes have a significant impact on software quality and development progress. Projects often take much longer to complete than planned and miscommunications among engineers are common. Yet, the process for exploring the project-specific or organization-specific root causes why this happens is still poorly supported. Investigations are cumbersome and require significant effort. In the context of this industrial case study, our industry partner was interested in measuring and assessing how the organization structure and issue handling processes ultimately affected software quality and time. Reducing the effort of such investigations/retrospectives and speeding up fact finding is important as it allows for more frequent, informed engineering process improvements and feedback to managers, team leads, and engineers. This paper describes our approach of pairing process metrics with visual historical inspection of issues. Stakeholders such as managers, team leads, or quality assurance engineers inspect metrics (and deviations from expected values) for individual issues and utilize a historical visualization of the affected (and related) issues to obtain insights into the reason for the metric (deviation) and its root cause. We demonstrate the usefulness of our approach based on our ProcessInspector prototype providing access to data on four real industry projects and a qualitative evaluation with team leads and group leads from our industry partner.
Software development teams dedicate considerable resources to training newcomers, that is, developers who are new to a software project. The software onboarding process is more complicated than onboarding into other organizations. It is much more challenging and time-consuming. The role of a mentor in onboarding newcomers in software engineering is well understood. However, the disruptions to the work of an experienced developer can reduce the quality of their work and job satisfaction. We propose a conversational bot that can help onboard newcomers to a software project instead of an experienced programmer. The bot will act as a mentor for the newcomer, thus putting less stress on experienced programmers. The bot will also be able to scan outside sources, such as Stack Overflow, for solutions to issues a newcomer may face. The newcomer will be able to interact with the bot using natural language. We will use this bot to assess improvements to code quality in future studies.
Continuous experimentation (CE) refers to a group of practices used by software companies to rapidly assess the usage, value and performance of deployed software using data collected from customers and the deployed system. Despite its increasing popularity in the development of web-facing applications, CE has not been discussed in the development process of business-to-business (B2B) mission-critical systems.
In a case study, we investigated the use of CE practices within several products, teams and areas inside Ericsson. By observing the CE practices of different teams, we were able to identify the key activities in four main areas and inductively derive an experimentation process, the HURRIER process, that addresses the deployment of experiments with customers in the B2B domain and with mission-critical systems. We illustrate this process with a case study in the development of a large mission-critical functionality in the Long Term Evolution (4G) product. In this case study, the HURRIER process is used not only to validate the value delivered by the solution but also to increase the quality of, and the confidence of both the customers and the R&D organization in, the deployed solution. Additionally, we discuss the challenges, opportunities and lessons learned from applying CE and the HURRIER process in B2B mission-critical systems.
In continuous integration (CI) environments, the program is rapidly and frequently modified and integrated. This feature introduces significant challenges to testing processes conducted in these environments. Based on existing technology, a test case that fails frequently is likely to fail in future tests. Therefore, the historical execution results of test cases are essential to guide test case prioritization (TCP) in the CI environment. Reinforcement learning involves solving sequential decision-making problems and is suitable for TCP in the CI environment. At present, most TCP techniques based on reinforcement learning rely on the current-cycle historical failure information of test cases. They rarely consider information from more historical cycles or other influencing factors. In this paper, we discuss the occurrence frequency of test cases for the first time. We also consider all historical information of each test case and propose three new reward functions, which employ the percentage of historical failures and the failure distribution of test cases to guide the reinforcement learning process. We evaluate our method on five industrial data sets. The experimental results show that our method can effectively prioritize test cases and improve the cost-effectiveness of the CI process.
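To make the idea of a history-based reward more concrete, the following is a minimal sketch of a reward computed as the percentage of failures over a test case's full execution history, together with a simple ranking by that percentage. It is an assumption-laden illustration and not one of the three reward functions proposed in the paper: the verdict encoding (1 = failed, 0 = passed) and the names `historical_failure_reward` and `rank_by_failure_rate` are hypothetical, and the failure distribution is not modelled here.

```python
from typing import Dict, List, Sequence


def historical_failure_reward(history: Sequence[int], current_verdict: int) -> float:
    """Reward = share of failing runs over all cycles, including the current one.

    `history` holds past verdicts, oldest first (assumed encoding: 1 = failed, 0 = passed).
    """
    runs = list(history) + [current_verdict]
    return sum(runs) / len(runs)


def rank_by_failure_rate(histories: Dict[str, Sequence[int]]) -> List[str]:
    """Order test cases so that those with the highest failure rate run first."""

    def failure_rate(name: str) -> float:
        verdicts = histories[name]
        # A test case with no history gets a rate of 0.0 in this sketch.
        return sum(verdicts) / len(verdicts) if verdicts else 0.0

    # Sort by descending failure rate; break ties by name for a stable order.
    return sorted(histories, key=lambda name: (-failure_rate(name), name))
```

In a reinforcement learning setup, a value such as the one returned by `historical_failure_reward` would be fed back to the agent after each CI cycle; the ranking function only shows how the same statistic could drive a non-learning baseline.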
Pull request (PR) selection is a challenging task faced by integrators in pull-based development (PbD), with hundreds of PRs submitted on a daily basis to large open-source projects. Managing these PRs manually consumes integrators' time and resources and may lead to delays in the acceptance, response, or rejection of PRs that can propose bug fixes or feature enhancements. On the one hand, well-known platforms for performing PbD, like GitHub, do not provide built-in recommendation mechanisms for facilitating the management of PRs. On the other hand, prior research on PR recommendation has focused on the likelihood of a PR either being accepted or receiving a response from the integrator. In this paper, we consider both likelihoods to help integrators in the PR selection process by suggesting the appropriate action to take on each specific PR. To this aim, we propose an approach called CARTESIAN (aCceptance And Response classificaTion-based requESt IdentificAtioN), which models PR recommendation according to PR actions. In particular, CARTESIAN is able to recommend three types of PR actions: accept, respond, and reject. We evaluated CARTESIAN on the PRs of 19 popular GitHub projects. The results of our study demonstrate that our approach can identify PR actions with an average precision and recall of about 86%. Moreover, our findings also highlight that CARTESIAN outperforms two baseline approaches in the task of PR selection.
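For readers unfamiliar with how an average precision and recall over the three PR actions can be computed, a minimal sketch follows. It assumes macro-averaging (an unweighted mean over the accept, respond and reject classes); the abstract does not state which averaging scheme was used, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

ACTIONS = ("accept", "respond", "reject")


def precision_recall(
    actual: Sequence[str], predicted: Sequence[str], label: str
) -> Tuple[float, float]:
    """Precision and recall for one action label (0.0 when undefined)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(actual: Sequence[str], predicted: Sequence[str]) -> Tuple[float, float]:
    """Unweighted mean of per-action precision and recall (assumed scheme)."""
    scores = [precision_recall(actual, predicted, label) for label in ACTIONS]
    mean_precision = sum(p for p, _ in scores) / len(scores)
    mean_recall = sum(r for _, r in scores) / len(scores)
    return mean_precision, mean_recall
```

With macro-averaging, each action counts equally regardless of how many PRs fall into it, which matters when rejections or responses are much rarer than acceptances.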
Integrating machine learning components in software systems is a task that more and more companies are confronted with. However, there is not much knowledge today on how the software development process needs to change when such components are integrated into a software system. We performed an interview study with 16 participants, focusing on emerging and changing tasks. The results uncover a set of 25 tasks associated with different software development phases, such as requirements engineering or deployment. We are just starting to understand the implications of using machine-learning components on the software development process. This study allows some first insights into how widespread the required process changes are.
Background: Software Process Simulation Modeling (SPSM) is of paramount importance to support quantitative management of the software development process. Hybrid process simulation combines multiple simulation paradigms to reflect complex changes in realistic software processes, which brings inherent challenges to process management. Constructing a hybrid model requires more modeling expertise and experience than modeling with a single paradigm. However, few studies explicitly discuss the challenges they encountered as a topic, which may discourage practitioners. Objective: Our aim in this study is to present an industrial process modeling project as an exemplar to demonstrate and discuss the technical issues and challenges associated with hybrid process simulation in practice. Method: Based on the collaboration with a global software enterprise, we constructed a hybrid process simulation model that combines System Dynamics (SD) and Discrete Event Simulation (DES) to predict the project duration and release date for management. Results: Several challenges around hybrid process simulation of the software development process are identified and discussed, and sets of solutions are proposed from different perspectives. The model is validated by comparing the simulation result with the actual enactment of the process in industry. In addition, the result confirms the rationality and efficacy of the suggested solutions to some extent. Conclusions: In the collaboration with the enterprise, a five-step modeling procedure was adopted for constructing the hybrid process model. The experience reported about the detailed steps of hybrid modeling may offer reference value to the SPSM community.
Agile software development methods promise shorter time-to-market and higher product quality, but lack the ability of long-term planning or coping with large projects. However, software companies often also want the ability of long-term planning, promised by traditional or plan-based methods. To benefit from the strengths of both approaches, software companies often use a combination of agile and plan-based methods, known as hybrid development approaches. These approaches strongly depend on the individual context and are customized. Therefore, companies have to organize their hybrid development approach individually. However, practitioners often have difficulties with the organization of hybrid approaches. The organization considers how the phases, activities, roles, and artifacts are arranged and connected. Research lacks the necessary detailed insight into how hybrid development approaches are organized to support practitioners. To gain better understanding of the organization of hybrid approaches, we conducted a systematic literature review to gather descriptions of hybrid approaches. We analyzed the found papers thoroughly and could identify three general patterns of how hybrid approaches are organized. We found that all these patterns are still based on Royce's waterfall model and use the standard software engineering activities. Our findings shall help to lead further research and help practitioners to better organize their individual development approach.
We report on a new approach to co-creating adaptive case management systems jointly with end-users, developed in the context of the Effective co-created and compliant adaptive case Management Systems for Knowledge Workers (EcoKnow.org) research project. The approach is based on knowledge from prior ethnographic field studies and research in the declarative Dynamic Condition Response (DCR) technology for model-driven design of case management systems. The approach was tested in an operational environment jointly with the Danish municipality of Syddjurs by conducting a service-design project and implementing an open source case manager tool and a new highlighter tool for mapping between textual specifications and the DCR notation. The design method and technologies were evaluated by understandability studies with end-users. The study showed that the development could be done in just 6 months, and that the new highlighter tool, in combination with the traditional design and simulation tools, helps domain experts formalise and provide traceability between their interpretations of textual specifications and the formal models.
The collection of high-quality data provides a key competitive advantage to companies in their decision-making process. It helps to understand customer behavior and enables the usage and deployment of new technologies based on machine learning. However, the process from collecting the data to cleaning and processing it for use by data scientists and applications is often manual, non-optimized and error-prone. This increases the time that the data takes to deliver value for the business. To reduce this time, companies are looking into automation and validation of the data processes. Data processes are the operational side of the data analytic workflow.
DataOps, a term recently coined by data scientists, data analysts and data engineers, refers to a general process aimed at shortening the end-to-end data analytic life-cycle time by introducing automation in the data collection, validation, and verification process. Despite its increasing popularity among practitioners, research on this topic has been limited and does not provide a clear definition of the term or of how a data analytic process evolves from ad-hoc data collection to fully automated data analytics as envisioned by DataOps.
This research provides three main contributions. First, utilizing multi-vocal literature, we provide a definition and a scope for the general process referred to as DataOps. Second, based on a case study with a large mobile telecommunication organization, we analyze how multiple data analytic teams evolve their infrastructure and processes towards DataOps. Third, we provide a stairway showing the different stages of the evolution process. With this evolution model, companies can identify the stage to which they belong and try to move to the next stage by overcoming the challenges they encounter in the current stage.
Development and Operations (DevOps), a particular type of Continuous Software Engineering, has become a popular Software System Engineering paradigm. Software architecture is critical in succeeding with DevOps. However, there is little evidence-based knowledge of how software systems are architected in the industry to enable and support DevOps. Since architectural decisions, along with their rationales and implications, are very important in the architecting process, we performed an industrial case study that has empirically identified and synthesized the key architectural decisions considered essential to DevOps transformation by two software development teams. Our study also reveals that apart from the chosen architecture style, DevOps works best with modular architectures. In addition, we found that the performance of the studied teams can improve in DevOps if operations specialists are added to the teams to perform the operations tasks that require advanced expertise. Finally, investment in testing is inevitable for the teams if they want to release software changes faster.
Agile methods were proposed to address the problems of traditional or plan-based software development, e.g., late customer feedback or resistance to change. However, unlike plan-based methods, they are not designed for long-term planning or to cope with large projects. Software companies want the ability of a fast reaction to changes but also the ability of long-term planning. To profit from the strength of both approaches, software companies often use a combination of agile and plan-based methods, called hybrid development approaches. These approaches depend on the respective context of each company. Therefore, the companies have to properly arrange and connect the phases, activities, roles, and artifacts from plan-based and agile approaches individually in their hybrid development approach. This is considered as the organization of hybrid development approaches. However, the organization of hybrid approaches is often difficult for the companies. Until now, research considers only the chosen development methods without the organization of hybrid development approaches. With my work, I want to strengthen the understanding of how hybrid approaches are organized and get a detailed picture of the challenges when organizing hybrid approaches. Based on my findings, I want to develop measures to support practitioners while organizing their development approach.
Context: Following other scientific disciplines, such as health sciences, the use of grey literature (GL) is becoming widespread in Software Engineering (SE) research. Whilst the number of papers incorporating GL on SE is increasing, little is empirically known about different aspects of the use of GL in SE research. In particular, there is a lack of a good evaluation standard for the quality of GL. Aim: Our research is aimed at systematically reviewing the use of GL in SE, empirically exploring SE researchers' views on GL, and providing a guide for using GL in SE and for quality assessment of the GL to be included. Method: We used a mixed-methods approach for this research. We carried out a Systematic Literature Review (SLR) of the use of GL in SE. Then we surveyed the authors of the papers included in the SLR (as GL users) and the invited experts in the SE community on the use of GL in SE research. Results: We systematically selected and reviewed 102 SE secondary studies that incorporate GL in SE research, from which we identified two groups based on their reporting: 1) 76 reviews only claim their use of GL; 2) 26 reviews report the results by including GL. We also obtained 20 replies from the GL users and 24 replies from the invited SE experts. Conclusion: There is no common understanding of the meaning of GL in SE. Researchers define the scopes and the definitions of GL in a variety of ways. We found five main reasons for using GL in SE research. The findings have enabled us to propose a conceptual model for how GL works in the SE research lifecycle. In future work, there is a need for research to develop guidelines for using GL in SE and for assessing the quality of GL.