State-of-the-art approaches for reporting performance analysis results rely on charts that provide insights into the performance of the system, often organized in dashboards. The insights are usually data-driven, i.e., not directly connected to the performance concern that led the user to execute the performance engineering activity, which limits the understandability of the provided results. One cause is that the data is presented without further explanation.
To solve this problem, we propose a concern-driven approach for reporting performance evaluation results, shaped around a performance concern stated by a stakeholder and captured by state-of-the-art declarative performance engineering specifications. Starting from the available performance analysis, the approach automatically generates a customized performance report providing a chart- and natural-language-based answer to the concern. In this paper, we introduce the general concept of concern-driven performance analysis reporting and present a first prototype implementation of the approach. We envision that, by applying our approach, reports tailored to user concerns reduce the effort needed to analyze performance evaluation results.
We summarize concerns observed in enabling reproducibility for parallel applications that depend heavily on the three-dimensional distributed-memory fast Fourier transform, and give suggestions for reproducibility categories for benchmark results.
Predicting performance-related events is an important part of proactive fault management. As a result, many approaches exist for single systems. Surprisingly, despite its potential benefits, multi-system event prediction, i.e., using data from multiple, independent systems, has received less attention. We present ongoing work towards an approach for multi-system event prediction that works with limited data and can predict events for new systems. We present initial results showing the feasibility of our approach. Our preliminary evaluation is based on 20 days of continuous, preprocessed monitoring time series data from 90 independent systems. We created five multi-system machine learning models and compared them to the performance of single-system machine learning models. The results show promising prediction capabilities, with accuracies and F1-scores over 90% and false-positive rates below 10%.
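The reported accuracy, F1-score, and false-positive-rate thresholds all derive from a standard confusion matrix; a minimal sketch in Python, with illustrative counts that are not taken from the paper's evaluation:

```python
def prediction_metrics(tp: int, fp: int, tn: int, fn: int):
    """Accuracy, F1-score, and false-positive rate from a confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return accuracy, f1, fpr

# Illustrative counts only: 90 true positives, 8 false alarms, etc.
acc, f1, fpr = prediction_metrics(tp=90, fp=8, tn=92, fn=10)
```

With these counts, accuracy is 0.91, F1 is about 0.909, and the false-positive rate is 0.08, matching the kind of thresholds the paper reports (accuracy and F1 over 90%, FPR below 10%).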
Developers are usually unaware of the impact of code changes on the performance of software systems. Although developers can analyze the performance of a system by executing, for instance, a performance test that compares two consecutive versions of the system, switching from a programming task to a testing task would disrupt the development flow. In this paper, we propose the use of a city visualization that dynamically provides developers with a pervasive view of the continuous performance of a system. We use an immersive augmented reality device (Microsoft HoloLens) to display our visualization and extend the integrated development environment on a computer screen into the physical space. We report on technical details of the design and implementation of our visualization tool, and discuss early feedback that we collected on its usability. Our investigation explores a new visual metaphor to support the exploration and analysis of possibly very large and multidimensional performance data. Our initial results indicate that the city metaphor can be adequate for analyzing dynamic performance data on a large and non-trivial software system.
Many benchmarks have been proposed to measure the training/learning aspects of Artificial Intelligence systems. This is without doubt very important, because these methods are computationally expensive, and a wide variety of techniques has therefore been developed to optimize their computational performance. The inference aspect of Artificial Intelligence systems is becoming increasingly important as these systems are deployed at massive scale. However, there are no industry standards yet that measure the performance capabilities of massive-scale AI deployments that must perform very large numbers of complex inferences in parallel. In this work-in-progress paper, we describe TPC-I, the industry's first benchmark to measure the performance characteristics of massive-scale industry inference deployments. It models a representative use case, which enables hardware and software optimizations to directly benefit real customer scenarios.
Model-based approaches in Software Performance Engineering (SPE) are used in early design phases to evaluate performance. Most current model-based prediction approaches work quite well for single-core CPUs but are not suitable or precise enough for multicore environments. This is because they only consider a single metric (i.e., the CPU speed) as a factor affecting performance. Therefore, we investigate parallel-performance-influencing factors (PPIFs) as a preparatory step to improve current performance prediction models by providing reference curves for the speedup behaviour of different resource demands and scenarios. In this paper, we show initial results and their relevance for future work.
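As one hypothetical example of such a speedup reference curve, Amdahl's law relates a workload's parallelizable fraction to its achievable speedup on a multicore machine; this is only an illustrative baseline, and the paper's actual PPIF models may capture further factors such as contention:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallelizable fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Reference curves for two scenarios: moderately vs. highly parallel demands.
for p in (0.5, 0.95):
    curve = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8, 16)]
    print(f"parallel fraction {p}: {curve}")
```

Even this simple model shows why CPU speed alone is insufficient: with a parallel fraction of 0.5, sixteen cores yield less than a 2x speedup.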
There are nearly one hundred parallel and distributed graph processing packages. Selecting the best package for a given problem is difficult; some packages require GPUs, some are optimized for distributed or shared memory, and some require proprietary compilers or perform better on different hardware. Furthermore, performance may vary wildly depending on the graph itself. This complexity makes selecting the optimal implementation manually infeasible. We develop an approach to predict the performance of parallel graph processing using both regression models and binary classification, labeling configurations as either well-performing or not. We demonstrate our approach on six graph processing packages (GraphMat, the Graph500, the Graph Algorithm Platform Benchmark Suite, GraphBIG, Galois, and PowerGraph) and on four algorithms (PageRank, single-source shortest paths, triangle counting, and breadth-first search). Given a graph, our method can estimate execution time or suggest an implementation and thread count expected to perform well. Our method correctly identifies well-performing configurations in 97% of test cases.
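The binary-labeling step described above can be sketched as follows; the package names and runtimes are purely illustrative, and the labeling rule (within 25% of the fastest observed configuration) is an assumption, not necessarily the paper's criterion:

```python
# Hypothetical measured runtimes (seconds) for one algorithm on one input
# graph, keyed by (package, thread_count); values are illustrative only.
runtimes = {
    ("GraphMat", 8): 1.9,
    ("GraphMat", 16): 1.2,
    ("Galois", 8): 1.4,
    ("Galois", 16): 1.1,
    ("PowerGraph", 16): 3.8,
}

def label_well_performing(times: dict, tolerance: float = 0.25) -> dict:
    """Binary labels: a configuration is 'well-performing' if its runtime
    is within `tolerance` of the fastest observed configuration."""
    best = min(times.values())
    return {cfg: t <= best * (1 + tolerance) for cfg, t in times.items()}

labels = label_well_performing(runtimes)
```

A classifier trained on graph features and such labels can then suggest an implementation and thread count for an unseen graph without running every configuration.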
The complexity of modern applications makes it hard to fix memory leaks and other heap-related problems without tool support. Yet, most state-of-the-art tools share problems that still need to be tackled: (1) They group heap objects only based on their types, ignoring other properties such as allocation sites or data structure compositions. (2) Analyses strongly focus on a single point in time and do not show heap evolution over time. (3) Results are displayed in tables, even though more advanced visualization techniques may ease and improve the analysis.
In this paper, we present a novel visualization approach that addresses these shortcomings. Heap objects can be arbitrarily classified, enabling users to group objects based on their needs. Instead of inspecting the size of those object groups at a single point in time, our approach tracks the growth of each object group over time. This growth is then visualized using time-series charts, making it easy to identify suspicious object groups. A drill-down feature enables users to investigate these object groups in more detail.
Our approach has been integrated into AntTracks, a trace-based memory monitoring tool, to demonstrate its feasibility.
Performance evaluation is an integral part of computer architecture research. Rigorous performance evaluation is crucial for assessing novel architectures and is often carried out using benchmark suites. Each suite has a number of workloads with varying behaviors and characteristics. Most analyses evaluate the novel architecture across all workloads of a single benchmark suite. However, computer architects studying optimizations of specific microarchitectural components require evaluation of their proposals on workloads, drawn from multiple benchmark suites, that stress the component being optimized.
In this paper, we present the design and implementation of FAB, a framework built on Pin with a Python-based workflow. Through an interactive Python interface, FAB allows user-driven analysis of benchmarks across multiple axes, such as instruction distributions and instruction types, to check for desired characteristics across multiple benchmark suites. FAB aims to provide a toolkit that allows computer architects to 1) select workloads with desired, user-specified behavior, and 2) create synthetic workloads with desired behavior that are grounded in real benchmarks.
LINE is an open source library to analyze systems that can be modeled by means of queueing theory. Recently, a new major release of the tool (version 2.0.0) has introduced several novel features, which are the focus of this demonstration. These include, among others, an object-oriented modeling language aligned with the abstraction of the Java Modelling Tools (JMT) simulator and a set of native solvers based on state-of-the-art analytical and simulation-based solution paradigms.
It is mandatory to continuously assess software systems during development and operation, e.g., through testing and monitoring, to make sure that they meet their required level of performance. In our previous work, we have developed an approach to assess the degree to which configurations of a software system meet performance criteria based on a domain metric that is obtained by considering operational profiles and results from load test experiments. This paper presents our PPTAM tooling infrastructure that automates our approach and provides a dashboard visualization of the results.
Open source Big Data frameworks such as Spark have been evolving quite rapidly. Many of the changes have addressed improvements in performance, mainly focusing on the performance of the entire job executing on a distributed system. Past studies have reported micro-architectural performance characteristics of benchmarks based on these Big Data frameworks. Given the rapid changes to these frameworks, it is expected that some of these code changes will also have a micro-architectural impact. In this paper, we present a comparative study of the performance of Apache Spark across two major revisions and demonstrate that there are micro-architectural differences in the way the applications use the hardware.
We present results of performing analytics and visualizations over micro-architectural performance metrics collected in simulation of high-end processor designs. These results contribute to several use cases: obtaining fast alerts in cases of anomalous behavior of the design, creating a global view of performance-related coverage, and comparing different versions of the hardware model as an aid to identifying root causes of performance differences and correlations between metrics. We showcase our methods and results through experiments on a very-high-end processor design, and discuss how they are expected to affect the methodology of performance verification of the vendor's next-generation designs.
Modern security systems often reach their performance peak and limit the protected application. Utilizing the available resources for security more efficiently is becoming increasingly critical. In this paper, we introduce the claim that no static security function chain is optimal in every situation. First experiments support our claim.
Performance engineering researchers propose and employ various methods to analyze, model, optimize and manage the performance of modern distributed applications. In order to evaluate these methods in realistic scenarios, researchers rely on reference applications. Existing testing and benchmarking applications are usually difficult to set up and either outdated, designed for specific testing scenarios, or do not offer the necessary degrees of freedom.
In this paper, we present the TeaStore, a micro-service-based reference application. The TeaStore offers multiple services with various performance characteristics and a high degree of freedom regarding its deployment and configuration, to be used as a cloud reference application for researchers. The TeaStore is designed for the evaluation of performance modeling and resource management techniques. We invite researchers to use the TeaStore and provide it as open source; it is extensible, easily deployable, and monitorable.
This tutorial presents techniques for self-adaptive software systems that use performance models in order to achieve desired quality-of-service objectives. The main hindrances in the state of the art are the assumption of a steady-state regime, needed for analytical solutions, and the state-space explosion that occurs when modeling software systems with stochastic processes such as Markov chains. This makes their online use difficult, because the system under consideration may be in a transient regime, and the typically large cost of the analysis does not permit fast tracking of performance dynamics. We will introduce fluid models based on nonlinear ordinary differential equations as a key enabling technique to effectively approximate large-scale stochastic processes. This representation makes it possible to employ online optimization methods based on model-predictive control in order to find an assignment of the values of tunable parameters of the model steering the system toward a given performance goal. We will also show how, dually, the same techniques can be used for the online estimation of software service demands. In this tutorial we will focus on software performance models based on queueing networks, with applications to runtime auto-scaling in virtualized environments.
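To convey the flavor of such fluid approximations (a simplified sketch, not the tutorial's actual models), the mean job count x(t) at a station with c servers can be approximated by the ODE dx/dt = λ − μ·min(x, c), integrated here with Euler's method:

```python
def simulate_fluid_queue(lam: float, mu: float, servers: int,
                         x0: float = 0.0, dt: float = 0.01,
                         t_end: float = 50.0) -> float:
    """Euler integration of the fluid ODE dx/dt = lam - mu * min(x, servers),
    a mean-field approximation of the job count at a multi-server station."""
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * (lam - mu * min(x, servers))
        x = max(x, 0.0)  # the job count cannot go negative
    return x

# Stable regime (lam < mu * servers): x(t) converges to lam / mu.
x_inf = simulate_fluid_queue(lam=4.0, mu=1.0, servers=8)
```

Because the ODE tracks a deterministic mean rather than the full Markov chain, it remains cheap to solve online even for large systems, which is what enables model-predictive control at runtime.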
The continuing growth of the cloud computing market has led to an unprecedented diversity of cloud services with different performance characteristics. To support service selection, researchers and practitioners conduct cloud performance benchmarking by measuring and objectively comparing the performance of different providers and configurations (e.g., instance types in different data center regions). In this tutorial, we demonstrate how to write performance tests for IaaS clouds using the Web-based benchmarking tool Cloud WorkBench (CWB). We will motivate and introduce benchmarking of IaaS clouds in general, demonstrate the execution of a simple benchmark in a public cloud environment, summarize the CWB tool architecture, and interactively develop and deploy a more advanced benchmark together with the participants.
The successful development and marketing of commercial computer/communication systems requires the ability to quantify their performance and related metrics. Specifically, one should be able to demonstrate that projected customer requirements (QoS, QoE) are met, identify bottlenecks, evaluate and compare different configurations, and evaluate and compare different designs. Performance engineering education should then train students to carry out these tasks. Exposure to three broad categories of approaches is necessary: measurements aided by statistical techniques, analytic modeling, and simulation. Both the theory underlying these approaches and software packages that aid such analyses should be covered. Besides failure-free performance, attention should also be devoted to reliability, availability, performability, and survivability. In the current context, power consumption and security have gained importance as well. In this talk, we will take a journey through these issues.
In this extended abstract, the author highlights the various roles of a performance engineer in industry. Based on his experience, some of the important tasks to perform in each role are listed, along with a set of skills to be acquired for each. It is hoped that these will help bridge the gap between academic courses and industrial requirements in performance engineering. The structure of the presentation will closely follow that of this paper.
Model-based dependability analysis provides an effective way to evaluate and design the dependability of critical IT systems by abstracting the system architecture and operations. As the size and complexity of systems increase, however, the process of composing the dependability model becomes complicated and time-consuming. Improving the efficiency of the modeling process is an important practical challenge of dependability engineering. In this paper, we review techniques for model component reuse that make dependability model composition and analysis more efficient. In particular, component-based modeling approaches for reliability, availability, maintainability, and safety analysis presented in the literature are summarized. In order to effectively apply model component reuse, we advocate the importance of an asset-based dependability analysis approach that associates the reusable model components with the underlying system development process. Finally, we discuss the necessary extensions of these techniques toward efficient dependability analysis for IoT systems, which are significantly affecting the real world.
This paper presents experiences from thirteen years of teaching a queueing-systems-based performance analysis course. We discuss how a 'mathematics first' approach resulted in students not retaining the intuitive concepts of queueing theory, which prompted us to redesign the course to emphasize the 'common sense' principles of queueing theory as long-term takeaways. We present a sequence of syllabus topics that starts with developing and arriving at a host of queueing-systems-based insights and 'formulae' without going into the mathematics at all. Our key insight is that in practice, only asymptotic values - at both low and high load - are critical to (a) understanding the capacities of the systems being studied and (b) basic sanity checking of performance measurement experiments. We also present two assignments (one measurement, one simulation) that we now give, which help reinforce the practical applicability of queueing systems to modern server systems. While we do not have formal studies, anecdotally we have reason to believe that this redesign has helped students retain, for the long term, the most essential results of queueing systems, even if they do not study the subject further.
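For instance, the low- and high-load asymptotes that such a course emphasizes can be sanity-checked against the classic M/M/1 formula R = 1/(μ − λ); a small illustrative sketch, with a hypothetical service rate:

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of a stable M/M/1 queue: R = 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue is unstable"
    return 1.0 / (service_rate - arrival_rate)

service_rate = 100.0  # requests/s, i.e., a mean service time of 10 ms

# Low-load asymptote: response time approaches the bare service time.
r_low = mm1_response_time(1.0, service_rate)

# Near saturation: response time grows without bound.
r_high = mm1_response_time(99.0, service_rate)
```

At 1 request/s the response time is about 10 ms (essentially the service time), while at 99 requests/s it is a full second: exactly the two asymptotic regimes a student needs for capacity estimates and sanity checks of measurements.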
This talk summarizes some lessons from teaching a course in analytical performance modeling, specifically: (1) what can an analytical model offer? (2) how a model can be decomposed into submodels, so as to decouple different forces affecting performance and thus analyze their interaction; (3) making choices in formulating a model; (4) the role of assumptions; (5) Average Value Approximation (AVA); (6) when bottleneck analysis suffices; (7) reducing the parameter space; (8) the concept of analytic validation; and (9) analysis with an analytical model.