Inference is the production stage of the machine learning workflow in which a trained model is used to make predictions on real-world data. A recommendation system improves customer experience by displaying the most relevant items based on a customer's historical behavior. Machine learning models built for recommendation systems are either deployed on-premises or migrated to the cloud for real-time or batch inference. A recommendation system should be cost-effective while honoring service level agreements (SLAs). In this work, we discuss the on-premises implementation of our recommendation system, iPrescribe. We present a methodology for migrating the on-premises implementation of the recommendation system to the cloud using an ML workflow. We also present a study of the performance of the recommendation model when deployed on different types of virtual instances.
It is my great pleasure to welcome you to WOSP-C 2020, the Workshop on Challenges and Opportunities in Large Scale Performance. Our theme this year is the use of analytics to interpret the system performance and resource usage measurements that can now be gathered rapidly and on a large scale. Our four invited speakers hail from industry. All three presentations in the first session and the last presentation in the second session deal with modeling and measurement to automate decisions about system configuration or the recognition of anomalies, especially for cloud-based systems. The other two papers in the second session address measurement and modeling issues at a granular level. These topics are highly relevant to the issues that system architects and other stakeholders face when deploying systems in the cloud, because deploying to the cloud does not by itself guarantee good performance. The recently acquired ability to gather vast numbers of performance and resource usage measurements facilitates an informed choice of target cloud platforms and their configurations. The presentations in this workshop address various aspects of how this can be achieved.
We begin with a short overview of classical anomaly detection techniques and tools based on Statistical Process Control (SPC), including Multivariate Adaptive Statistical Filtering (MASF), the Statistical Exception Detection System (SEDS), Change Point Detection based on the Exception Value (EV) meta-metric, control charts, and business-driven massive prediction. We also describe methods of using them to manage large-scale systems such as on-premises server fleets or massive clouds, with real examples of applying them at large financial companies. We then turn to modern techniques for anomaly and normality detection, such as deep learning and entropy-based anomalous pattern detection, which have also been successfully tested against a large amount of real performance data from a large bank.
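To make the control-chart idea concrete, the following is a minimal Python sketch of a Shewhart-style chart with a simplified exception-value score: control limits are set at the baseline mean plus or minus k standard deviations, and the score sums how far observations fall outside those limits. The function names, the single shared baseline, and k = 3 are illustrative assumptions; MASF-style baselines are usually built per time-of-week slot, and this is not the SEDS implementation.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (LCL, UCL) as the baseline mean -/+ k sample standard deviations.

    Assumption: one baseline for all observations (MASF would use one per
    time-of-week slot).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def exception_value(observations: Sequence[float], lcl: float, ucl: float) -> float:
    """Simplified exception value: total distance of observations outside [lcl, ucl]."""
    ev = 0.0
    for x in observations:
        if x > ucl:
            ev += x - ucl  # excursion above the upper control limit
        elif x < lcl:
            ev += lcl - x  # excursion below the lower control limit
    return ev


def flag_anomalies(observations: Sequence[float], lcl: float, ucl: float) -> List[int]:
    """Indices of observations that fall outside the control limits."""
    return [i for i, x in enumerate(observations) if x < lcl or x > ucl]


if __name__ == "__main__":
    lcl, ucl = control_limits([50, 52, 48, 51, 49, 50, 53, 47])
    current = [51, 60, 40, 55]
    print(lcl, ucl, exception_value(current, lcl, ucl), flag_anomalies(current, lcl, ucl))
```

A non-zero exception value summarizes both how often and how far a metric left its normal band, which is what makes it usable as a single meta-metric for change point detection across many servers.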
In this extended abstract, we outline the presentation planned for WOSP-C 2020. The goal of the presentation is to provide an overview of the challenges of, and approaches to, automated scalability assessment in the context of DevOps and microservices. The presentation focuses on approaches that automatically identify performance problems, because these approaches can leverage performance anti-pattern detection technology. In addition, we envision extending the approach to recommend component refactoring. In our previous work [1,2], we designed a methodology and associated tool support for the automated scalability assessment of microservice architectures, including the automation of all the steps required for scalability assessment. The presentation starts with an introduction to dependability, operational profile data, and DevOps. Specifically, we provide an overview of the state of the art in continuous performance monitoring technologies that use APM tools to obtain operational profile data. We then present an overview of selected approaches for production and performance testing based on application monitoring (PPTAM), as introduced in [1,2]. The presentation concludes by outlining a vision for automated performance anti-pattern detection. Specifically, we present the approach for automated anti-pattern detection based on load testing results and profiling introduced in and provide recommendations for future research.
This report is prompted by recent experience with building performance models from kernel traces recorded by LTTng, a tracer for Linux, and by observing other researchers who analyze performance issues directly from the traces. It briefly distinguishes the scope of the two approaches, regarding the model as an abstraction of the trace and model building as a form of machine learning. For model building, it then discusses how various limitations of the kernel trace information constrain the model and its capabilities, and how these limitations might be overcome by using additional information of different kinds. The overall perspective is a tradeoff between effort and model capability.
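As a small illustration of deriving a model parameter from a kernel trace, the sketch below accumulates per-thread on-CPU time from scheduler context-switch events. The tuple layout (timestamp in ns, CPU, prev_tid, next_tid) is a hypothetical simplification of an LTTng sched_switch record, not the Babeltrace reading API. Note that time before the first event seen on each CPU is lost, one concrete instance of the trace limitations discussed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical simplified record: (timestamp_ns, cpu, prev_tid, next_tid).
SchedSwitch = Tuple[int, int, int, int]
IDLE_TID = 0  # the per-CPU idle task ("swapper") appears as TID 0 on Linux


def cpu_time_per_thread(events: Iterable[SchedSwitch]) -> Dict[int, int]:
    """Accumulate on-CPU time (ns) per thread from sched_switch-like events.

    The interval before the first event on each CPU is unknown and therefore
    not counted: the model can only see what the trace recorded.
    """
    running_since: Dict[int, int] = {}  # cpu -> time the current thread was switched in
    busy: Dict[int, int] = defaultdict(int)
    for ts, cpu, prev_tid, next_tid in sorted(events):  # order by timestamp
        start = running_since.get(cpu)
        if start is not None and prev_tid != IDLE_TID:
            busy[prev_tid] += ts - start  # prev_tid ran from `start` until now
        running_since[cpu] = ts  # next_tid starts running now
    return dict(busy)


if __name__ == "__main__":
    trace = [(100, 0, 0, 42), (400, 0, 42, 7), (450, 1, 0, 42), (600, 0, 7, 0), (700, 1, 42, 0)]
    print(cpu_time_per_thread(trace))
```

Dividing such per-thread CPU times by the number of requests served would give service demands for a queueing model; attributing time to requests rather than threads is exactly where additional, non-kernel information becomes necessary.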
Organizations want to take advantage of the flexibility and scalability of Cloud platforms. By migrating to the Cloud, they hope to develop and implement new applications faster and at lower cost. Amazon AWS, Microsoft Azure, Google, IBM, Oracle, and other Cloud providers support different DBMSs such as Snowflake, Redshift, Teradata Vantage, and others. These platforms differ in their architectures, their mechanisms for allocating and managing resources, and the sophistication of their DBMS optimizers, all of which affect performance, scalability, and cost. As a result, the response time, CPU service time, and number of I/Os for the same query accessing a similar table in the Cloud could differ significantly from those On Prem. To select the appropriate Cloud platform, as a first step we performed a Workload Characterization of the On Prem Data Warehouse. Each Data Warehouse workload represents a specific line of business and includes the activity of many users concurrently generating simple and complex queries that access data from different tables. Each workload has different resource demands and different Response Time and Throughput Service Level Goals (SLGs). In this presentation, we review the results of the workload characterization for an On Prem Data Warehouse environment. In the second step, we collected measurement data from standard TPC-DS benchmark tests performed on the AWS Vantage, Redshift, and Snowflake Cloud platforms for different data set sizes and different numbers of concurrent users. In the third step, we used the results of the workload characterization and the measurement data collected during the benchmark to adapt the BEZNext On Prem closed queueing model to individual Clouds. Finally, in the fourth step, we used our model to take into account differences in concurrency, priorities, and resource allocation among the workloads.
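The BEZNext model itself is not described here, but a standard way to solve a closed queueing model is Mean Value Analysis (MVA). The sketch below is textbook exact single-class MVA, assuming product-form conditions and a known service demand D_k (seconds per query) at each resource; it does not capture the priorities or multiple workload classes mentioned above.

```python
from typing import List, Tuple


def mva(demands: List[float], n_users: int,
        think_time: float = 0.0) -> Tuple[float, float, List[float]]:
    """Exact single-class Mean Value Analysis for a closed product-form network.

    demands: service demand D_k at each queueing resource k (e.g. CPU, disk).
    Returns (throughput X, response time R, per-resource mean queue lengths Q_k)
    for a population of n_users concurrent users.
    """
    q = [0.0] * len(demands)
    x = r = 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: a query arriving at resource k sees Q_k(n - 1) ahead of it.
        residence = [d * (1.0 + qk) for d, qk in zip(demands, q)]
        r = sum(residence)
        x = n / (r + think_time)          # Little's law applied to the whole closed loop
        q = [x * rk for rk in residence]  # Little's law applied to each resource
    return x, r, q


if __name__ == "__main__":
    # Hypothetical demands: 0.1 s of CPU and 0.05 s of I/O per query, 50 users.
    print(mva([0.1, 0.05], 50))
```

Throughput approaches 1 / max(D_k) as the population grows, which is why per-platform differences in CPU service time and I/O counts translate directly into different capacity predictions.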
BEZNext optimization algorithms incorporating a Graduate search mechanism are used to find the AWS instance type and the minimum number of instances required to meet the SLGs of each workload. Publicly available information about the cost of the different AWS instances is used to predict the month-by-month cost of supporting the workloads in the Cloud over the next 12 months.
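A minimal sketch of such a sizing search follows. It assumes each instance behaves as an independent M/M/1 queue with the load split evenly, and it uses a plain linear search over instance counts in place of BEZNext's optimization algorithm. The catalog of service times and hourly prices, the 730-hour month, and the compound monthly growth rate are hypothetical inputs for illustration, not published AWS prices.

```python
from typing import Dict, List, Optional, Tuple

HOURS_PER_MONTH = 730  # common approximation in cloud cost estimates

# Hypothetical catalog entry: (mean service time per query in s, price per hour).
Catalog = Dict[str, Tuple[float, float]]
Plan = Tuple[str, int, float]  # (instance type, instance count, monthly cost)


def min_instances(arrival_rate: float, service_time: float, slg: float,
                  max_n: int = 1000) -> Optional[int]:
    """Smallest n whose predicted response time meets the SLG.

    Assumption: load split evenly over n instances, each an M/M/1 queue,
    so R = S / (1 - U) with U = arrival_rate * S / n.
    """
    for n in range(1, max_n + 1):
        utilization = arrival_rate * service_time / n
        if utilization < 1.0 and service_time / (1.0 - utilization) <= slg:
            return n
    return None  # SLG unreachable, e.g. when slg < service_time


def cheapest_plan(arrival_rate: float, slg: float, catalog: Catalog) -> Optional[Plan]:
    """Instance type and count that meet the SLG at the lowest monthly cost."""
    best: Optional[Plan] = None
    for itype, (service_time, hourly_price) in catalog.items():
        n = min_instances(arrival_rate, service_time, slg)
        if n is None:
            continue
        cost = n * hourly_price * HOURS_PER_MONTH
        if best is None or cost < best[2]:
            best = (itype, n, cost)
    return best


def twelve_month_forecast(base_rate: float, monthly_growth: float, slg: float,
                          catalog: Catalog) -> List[Optional[Plan]]:
    """Cheapest plan for each of the next 12 months under compound workload growth."""
    return [cheapest_plan(base_rate * (1.0 + monthly_growth) ** m, slg, catalog)
            for m in range(12)]
```

For example, with the hypothetical catalog {"small": (0.2, 0.10), "large": (0.1, 0.25)}, an arrival rate of 20 queries/s, and an SLG of 0.5 s, cheapest_plan selects 7 small instances (about 511 per month) over 3 large ones (547.5 per month).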
Interesting approaches to counteract performance variability within cloud datacenters include sending multiple request clones, either immediately or after a specified waiting time. In this paper we present a performance model of cloud applications that utilize the latter concept, known as speculative execution. We study the popular Join-Shortest-Queue load-balancing strategy under the processor sharing queuing discipline. Utilizing the near-synchronized service property of this setting, we model speculative execution using a simplified synchronized service model. Our model is approximate, but accurate enough to be useful even for high utilization scenarios. Furthermore, the model is valid for any, possibly empirical, inter-arrival and service time distributions. We present preliminary simulation results, showing the promise of our proposed model.
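The core of delayed cloning can be illustrated with a Monte Carlo sketch: a request's original copy takes X1, a clone with independent duration X2 is launched only if the request is still running after the waiting time, and the request completes at min(X1, wait + X2). This deliberately ignores queueing, the Join-Shortest-Queue dispatcher, processor sharing, and the extra load clones create, which are exactly what the paper's model captures; the straggler service-time distribution is a hypothetical example.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple

Sampler = Callable[[random.Random], float]


def speculative_latencies(sample: Sampler, wait: float, n: int,
                          seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (baseline, speculative) latency samples for n independent requests.

    Simplification: no queueing and no load added by clones.
    """
    rng = random.Random(seed)
    baseline, speculative = [], []
    for _ in range(n):
        x1 = sample(rng)  # duration of the original copy
        x2 = sample(rng)  # duration of the clone, if one is launched
        baseline.append(x1)
        # If x1 <= wait, no clone is sent; min() still returns x1 since wait + x2 >= x1.
        speculative.append(min(x1, wait + x2))
    return baseline, speculative


def straggler_sampler(rng: random.Random) -> float:
    """Hypothetical service times: exponential with mean 1, 5% of requests 10x slower."""
    slowdown = 10.0 if rng.random() < 0.05 else 1.0
    return rng.expovariate(1.0) * slowdown


if __name__ == "__main__":
    base, spec = speculative_latencies(straggler_sampler, wait=2.0, n=10_000)
    print(f"mean latency without clones: {mean(base):.3f}, with delayed clones: {mean(spec):.3f}")
```

In this load-free setting speculation can only help; under high utilization the clones compete with other requests for servers, which is why a model of speculative execution under JSQ and processor sharing is needed.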
Serverless computing is steadily becoming the implementation paradigm of choice for a variety of applications, from data analytics to web applications, as it addresses the main problems of serverful and monolithic architectures. In particular, it abstracts away resource provisioning and infrastructure management, enabling developers to focus on the logic of the program while the cloud provider handles resource management. In this paper, we consider a document processing system used in FinTech as a case study and describe its migration journey from a monolithic architecture to a serverless architecture. Our evaluation results show that the serverless implementation significantly improves performance with only a marginal increase in cost.
Serverless computing services, such as Function-as-a-Service (FaaS), hold the attractive promise of a high level of abstraction and high performance, combined with the minimization of operational logic. Several large ecosystems of serverless platforms, both open- and closed-source, aim to realize this promise. Consequently, a lucrative market has emerged. However, the performance trade-offs of these systems are not well understood. Moreover, it is exactly the high level of abstraction and the opaqueness of the operational side that make performance evaluation studies of serverless platforms challenging. Learning from the history of IT platforms, we argue that a benchmark for serverless platforms could help address this challenge. We envision a comprehensive serverless benchmark, which we contrast with the narrow focus of prior work in this area. We argue that a comprehensive benchmark will need to take into account more than just runtime overhead, and include notions of cost, realistic workloads, more (open-source) platforms, and cloud integrations. Finally, we show through preliminary real-world experiments how such a benchmark can help compare the performance overhead of running a serverless workload on state-of-the-art platforms.
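One way a benchmark could report cost alongside runtime overhead is a GB-second pricing model of the kind several FaaS platforms use: each invocation is billed for its duration rounded up to a billing granularity, multiplied by its configured memory, plus a per-request fee. In this sketch the prices and granularity are parameters, not any provider's actual rates.

```python
import math
from typing import Iterable


def faas_cost(durations_ms: Iterable[float], memory_mb: int, price_per_gb_s: float,
              price_per_request: float, granularity_ms: int = 1) -> float:
    """Estimate the bill for a set of invocations under an assumed GB-second model.

    Each invocation is billed for its duration rounded up to `granularity_ms`,
    times its memory size in GB, plus a flat per-request fee.
    """
    total = 0.0
    for d in durations_ms:
        billed_ms = math.ceil(d / granularity_ms) * granularity_ms
        gb_seconds = (billed_ms / 1000.0) * (memory_mb / 1024.0)
        total += gb_seconds * price_per_gb_s + price_per_request
    return total
```

Because billing granularity and memory size enter the cost directly, two platforms with similar latency can differ noticeably in cost for short functions, which is one reason cost deserves its own place in a benchmark.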
Fog computing has been regarded as an ideal platform for distributed and diverse IoT applications. A fog environment consists of a network of fog nodes, and IoT applications are composed of containerized microservices that communicate with each other. Distributing and optimizing containerized IoT applications in the fog environment is a recent line of research. Our work uses Kubernetes as the orchestrator that instantiates, manages, and terminates containers in multi-host environments for IoT applications, where each host acts as a fog node. This paper demonstrates the industrial feasibility and practicality of deploying and managing containerized IoT applications on real devices (Raspberry Pis and PCs) using commercial software tools (Docker, WeaveNet). The demonstration shows that the application's functionality is not affected by distributing the communicating microservices across different nodes.
General matrix-matrix multiplication (GEMM) is a critical operation in many application domains. It is a central building block of deep learning algorithms, computer graphics operations, and other applications dominated by linear algebra. Consequently, GEMM has been extensively studied and optimized, resulting in libraries of exceptional quality such as BLAS and Eigen, as well as platform-specific implementations such as MKL (Intel) and ESSL (IBM) [2,3]. Despite these successes, programmers continue to re-implement the GEMM idiom without considering the intricacies already accounted for by these libraries. To address this, this project aims to provide transparent adoption of high-performance GEMM implementations through a novel optimization pass, implemented within the LLVM framework, that uses idiom recognition techniques. Sub-optimal implementations of GEMM are replaced by equivalent library calls.
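For readers unfamiliar with the idiom, the following Python stand-in shows the textbook i-j-k triple loop computing C := alpha * A * B + beta * C, the shape of hand-written code such a pass would aim to recognize in compiled programs and replace with a call such as CBLAS's cblas_dgemm. It illustrates the pattern only; the recognition pass itself operates on LLVM IR and is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def naive_gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Return alpha * (A @ B) + beta * C using the textbook triple loop.

    A is m x k, B is k x n, C is m x n. This is the idiom that a tuned BLAS
    routine computes far faster via blocking, vectorization, and threading.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must agree")
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):  # dot product of row i of A with column j of B
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out
```

Loop order, accumulator placement, and the alpha/beta scaling vary between hand-written versions, which is what makes robust idiom recognition harder than matching a single syntactic pattern.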
Large-scale not-for-profit Internet Service Providers (ISPs), such as National Research and Education Networks (NRENs), often have significant amounts of underutilized bandwidth because they provision their network capacity for the rare event that all clients use their purchased bandwidth at the same time. However, traffic policers are still applied to enforce committed purchase rates and avoid congestion. We present the design and initial evaluation of an SDN/OpenFlow solution that maximizes network link utilization through user-defined fair allocation of spare bandwidth, while guaranteeing a minimum bandwidth for each client.
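One common reading of fair allocation of spare bandwidth with minimum guarantees is weighted max-min fairness computed by progressive filling. The sketch below assumes that policy: each client first receives min(guarantee, demand), and the remaining capacity is split in proportion to user-defined weights, capping each client at its demand and redistributing what it does not need. It computes rates only; the paper's actual allocation rule and its OpenFlow meter or queue configuration are not shown.

```python
from typing import Dict, Tuple

# Hypothetical client description: (guaranteed rate, demanded rate, weight), any rate unit.
Client = Tuple[float, float, float]


def fair_share(capacity: float, clients: Dict[str, Client]) -> Dict[str, float]:
    """Weighted max-min fair allocation of a link with per-client minimum guarantees."""
    alloc = {name: min(g, d) for name, (g, d, _) in clients.items()}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("guaranteed rates exceed link capacity")
    active = {name for name, (_, d, _) in clients.items() if alloc[name] < d}
    while active and spare > 1e-12:
        unit = spare / sum(clients[n][2] for n in active)  # spare rate per unit of weight
        # Clients whose remaining demand fits within their weighted share are capped at demand.
        done = {n for n in active if clients[n][1] - alloc[n] <= clients[n][2] * unit}
        if not done:
            for n in active:  # nobody is capped: hand out the rest by weight and stop
                alloc[n] += clients[n][2] * unit
            break
        for n in done:
            spare -= clients[n][1] - alloc[n]
            alloc[n] = clients[n][1]
        active -= done
    return alloc
```

For example, on a 100-unit link with clients A (10, 80, 1), B (20, 30, 1), and C (10, 100, 2), B is capped at its demand of 30 and the remaining 50 units are split 1:2 between A and C, so the link is fully used while every guarantee holds.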
The growth of cloud services leads to ever more and ever larger data centers that consume considerable amounts of power. To increase energy efficiency, both the server hardware and the software running on it must become more energy-efficient, since it is the software that controls the hardware to a considerable degree. In this work-in-progress paper, we present a first analysis of how compiler optimizations can influence energy efficiency. We base our analysis on the workloads of the SPEC CPU 2017 benchmark, executed on a state-of-the-art server system for cloud applications. With 43 benchmarks from different domains, including integer- and floating-point-heavy computations, SPEC CPU 2017 offers a representative selection of workloads.
Software developers use collection data structures extensively and often face the task of choosing which collection to use. Choosing an inappropriate collection can have a major negative impact on runtime performance. However, choosing the right collection can be difficult, since developers face many possibilities that often appear functionally equivalent. One approach to assist developers in this decision-making process is to micro-benchmark data structures in order to provide performance insights. In this paper, we present results from experiments on Java collections (maps, lists, and sets) using our tool JBrainy, which synthesises micro-benchmarks with sequences of random method calls. We compare our results to those of a previous experiment on Java collections that used a micro-benchmarking approach focused on single methods. Our results support previous results for lists, in that we found ArrayList to yield the best running time in 90% of our benchmarks. For sets, we found LinkedHashSet to yield the best performance in 78% of the benchmarks. In contrast to previous results, we found TreeMap and LinkedHashMap to yield better runtime performance than HashMap in 84% of cases.
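JBrainy targets Java collections; as a language-neutral illustration of the same idea, the Python sketch below synthesizes a random sequence of method calls, replays it against functionally equivalent collections (here list and collections.deque), checks that they end in the same state, and times each. The operation mix and all function names are assumptions, not JBrainy's generator.

```python
import random
import timeit
from collections import deque
from typing import Any, Callable, Dict, List, Tuple

Op = Tuple[str, int]
OPERATIONS = ["append", "insert_front", "pop", "contains"]  # assumed operation mix


def random_ops(n: int, seed: int = 0) -> List[Op]:
    """Synthesize a random sequence of (method, argument) calls."""
    rng = random.Random(seed)
    return [(rng.choice(OPERATIONS), rng.randrange(100)) for _ in range(n)]


def run_sequence(ops: List[Op], factory: Callable[[], Any]) -> Tuple[List[int], int]:
    """Replay ops on a fresh collection; return its final contents and membership hits."""
    coll = factory()
    hits = 0
    for method, arg in ops:
        if method == "append":
            coll.append(arg)
        elif method == "insert_front":
            coll.insert(0, arg)
        elif method == "pop":
            if coll:  # skip pops on an empty collection
                coll.pop()
        elif method == "contains":
            hits += arg in coll
    return list(coll), hits


def benchmark(ops: List[Op], factories: Dict[str, Callable[[], Any]],
              repeat: int = 5) -> Dict[str, float]:
    """Best-of-`repeat` wall-clock seconds to replay ops on each collection type."""
    reference = run_sequence(ops, next(iter(factories.values())))
    results: Dict[str, float] = {}
    for name, factory in factories.items():
        # The comparison is only meaningful if the collections behave identically.
        if run_sequence(ops, factory) != reference:
            raise ValueError(f"{name} diverged from the reference behaviour")
        results[name] = min(timeit.repeat(lambda: run_sequence(ops, factory),
                                          number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    print(benchmark(random_ops(10_000), {"list": list, "deque": deque}))
```

Replaying whole call sequences, rather than timing one method at a time, captures interactions such as a collection growing large before membership tests, which single-method micro-benchmarks miss.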
Microservices and serverless functions are becoming integral parts of modern cloud-based applications. Tailored performance engineering is needed to ensure that the applications meet their requirements for quality attributes such as timeliness, resource efficiency, and elasticity. A novel DevOps-based framework for developing microservices and serverless applications is being developed in the RADON project. RADON contributes to performance engineering by including novel approaches for modeling, deployment optimization, testing, and runtime management. This paper summarizes the contents of our tutorial presented at the 11th ACM/SPEC International Conference on Performance Engineering (ICPE).
The proliferation of big data technology and faster computing systems has led to the pervasive use of AI-based solutions in our lives. There is a need to understand how to benchmark the systems used to build AI-based solutions, which involve a complex pipeline of pre-processing, statistical analysis, machine learning, and deep learning on data to build prediction models. Solution architects, engineers, and researchers may use open-source technology or proprietary systems depending on the desired performance requirements. Relevant performance metrics may include data pre-processing time, model training time, and model inference time. No single benchmark answers all the questions of solution architects and researchers. This tutorial covers both practical and research questions on relevant Big Data and Analytics benchmarks.
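Stage-level metrics such as pre-processing, training, and inference time can be collected with a simple timer around each pipeline stage. The sketch below assumes wall-clock time from time.perf_counter is an adequate metric and times a toy pipeline whose "model" is an ordinary least-squares line fit; the stages, data, and function names are hypothetical placeholders for a real pipeline.

```python
import time
from contextlib import contextmanager
from statistics import mean
from typing import Dict, Iterator, List, Tuple


class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:  # record the time even if the stage raises
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept (a stand-in 'model')."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def run_toy_pipeline(n: int = 1000) -> Tuple[StageTimer, float, float]:
    """Time three hypothetical stages on synthetic data y = 2 * i + 1."""
    timer = StageTimer()
    raw = [(float(i), 2.0 * i + 1.0) for i in range(n)]
    with timer.stage("preprocessing"):
        xs = [x / n for x, _ in raw]  # scale features into [0, 1)
        ys = [y for _, y in raw]
    with timer.stage("training"):
        slope, intercept = fit_line(xs, ys)
    with timer.stage("inference"):
        _ = [slope * x + intercept for x in xs]
    return timer, slope, intercept


if __name__ == "__main__":
    timer, _, _ = run_toy_pipeline()
    print(timer.seconds)
```

Reporting each stage separately, rather than one end-to-end number, lets architects see whether data preparation, training, or serving dominates for a given system, which is the kind of question a single benchmark score cannot answer.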