SEiA '18: Proceedings of the 2018 International Conference on Software Engineering in Africa


SESSION: Natural language processing and semantic technologies

Consolidation of BI efforts in the LOD era for the African context

In recent years we have witnessed a spectacular increase in Business Intelligence and Analytics (BI&A) software revenue: in the Middle East and Africa, revenue totaled $245 million in 2014, a 12 percent increase over the 2013 revenue of $219 million, according to Gartner, Inc. The Linked Open Data (LOD) era will help sustain this dynamic. LOD datasets complement internal sources with new information relevant for decision making, and their integration into the data warehousing landscape has become a necessity. Recent studies, conducted mainly in Europe, have been proposed in this direction. As with the first studies of conventional data warehouse (DW) design, these LOD-driven approaches focus on data issues (such as integration and multidimensionality) and ignore the importance of functional and non-functional requirements. Addressing this issue is a precondition for the success of BI&A projects in Africa, a continent experiencing an interesting phenomenon: the multiplication of Open Data initiatives. Based on our experience in developing BI&A projects, our origins, and our knowledge of both the European and African contexts, we propose a requirement-driven approach for designing semantic data warehouses from internal and LOD datasets, considering requirements incrementally. All phases of our approach are formalized, allowing traceability of requirements. Experiments show the impact of incorporating exploratory requirements of the target warehouse. A case study analyzing book sales transactions is given.
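The core idea of complementing internal sources with LOD attributes can be pictured as a join between an internal fact table and a set of external triples. The sketch below is purely illustrative: the ISBNs, predicates, and figures are invented, and this is not the authors' implementation.

```python
# Hypothetical sketch: enriching internal book-sales facts with
# attributes drawn from a Linked Open Data (LOD) source.
# All identifiers and values below are invented examples.

internal_sales = [
    {"isbn": "978-0-111", "units": 120, "revenue": 2400.0},
    {"isbn": "978-0-222", "units": 45, "revenue": 1350.0},
]

# LOD-style triples (subject, predicate, object), e.g. obtained via SPARQL.
lod_triples = [
    ("978-0-111", "hasGenre", "Fiction"),
    ("978-0-222", "hasGenre", "History"),
    ("978-0-111", "hasLanguage", "en"),
]

def enrich(facts, triples):
    """Attach every (predicate, object) pair found for a fact's ISBN."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    return [{**f, **by_subject.get(f["isbn"], {})} for f in facts]

enriched = enrich(internal_sales, lod_triples)
```

Each enriched fact then carries both the internal measures (units, revenue) and the external LOD dimensions (genre, language), which is the kind of multidimensional extension the abstract describes.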

SESSION: Big data

A state-of-the-art review of machine learning techniques for fraud detection research

The area of fraud detection has traditionally been correlated with data mining and text mining. Even before the "big data" phenomenon started in 2008, text mining and data mining were used as instruments of fraud detection; however, the limited capabilities of pre-big-data technologies made it very difficult for researchers to run fraud detection algorithms on large amounts of data. This paper reviews the existing fraud detection research across different areas with the aim of investigating the machine learning techniques used and identifying their strengths and weaknesses. It applies the systematic quantitative literature review methodology to studies of fraud detection using machine learning techniques published within the last decade. Various combinations of keywords were used to identify pertinent articles, which were retrieved from the ACM Digital Library, IEEE Xplore Digital Library, ScienceDirect, SpringerLink, and other sources. The search yielded a sample of 80 relevant articles (peer-reviewed journal articles and conference papers). The most widely used machine learning techniques were identified, along with their strengths and weaknesses. Finally, conclusions, limitations, and future work are presented.

Big data: deep learning for detecting malware

Malicious software, commonly known as malware, is constantly getting smarter, with the capability of self-modification. Malware is produced in large numbers and deployed very quickly through Internet-capable devices. This is therefore a big data problem and remains challenging for the research community; existing detection methods must be enhanced to deal effectively with today's malware. In this paper, we propose a novel real-time monitoring, analysis, and detection approach that applies big data analytics and machine learning to the development of a general detection model. The insights gained through big data make machine learning more efficient. Using a deep learning approach, we designed and developed a scalable detection model that improves on existing solutions. Our experiments achieved an accuracy of 97% and a ROC AUC of 0.99.
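The two reported metrics, accuracy and ROC AUC, can be computed for any binary detector from its predicted scores. A minimal sketch follows; the labels and scores are invented, not the authors' data, and the AUC is computed via the rank-statistic interpretation (probability that a random positive outscores a random negative).

```python
# Hypothetical sketch: evaluating a binary malware detector.
# Labels: 1 = malware, 0 = benign. Scores and labels are invented.

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """AUC = P(random positive is scored above a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.2, 0.3, 0.25, 0.1]  # detector's malware scores
```

Unlike accuracy, the AUC does not depend on a decision threshold, which is why both numbers are usually reported together.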

Tracking food insecurity from tweets using data mining techniques

Data mining algorithms can be applied to extract useful patterns from social media conversations to monitor disasters such as tsunamis, earthquakes, and nuclear power accidents. While food insecurity has persistently remained a world concern, its monitoring with this strategy has received limited attention. In an attempt to address this concern, UN Global Pulse demonstrated that tweets from Indonesia reporting food prices can aid in predicting actual food price increases. For regions like Kenya and Uganda, where use of tweets is considered low, this option can be problematic. Using Uganda as a case study, this study takes the alternative of using tweets from all over the world mentioning (1) uganda + food, (2) uganda + hunger, and (3) uganda + famine for the years 2014, 2015, and 2016. Unlike the UN Global Pulse work, it uses tweets on food insecurity rather than tweets on food prices. In the first step, five data mining algorithms (decision tree, SVM, k-NN, neural network, and naive Bayes) were trained to identify tweet conversations on food insecurity; their performance was found comparable with human labeling of tweets on the same subject. In the second step, tweets reporting food insecurity were aggregated into trends. Compared with trends from the Uganda Bureau of Statistics, promising findings were obtained, with correlation coefficients of 0.56 and 0.37 for 2015 and 2016, respectively. The study provides a strategy to generate information about food insecurity for stakeholders, such as the World Food Program in Uganda, for mitigation action or further investigation depending on the situation. To improve performance, future work can (1) aggregate tweets with other datasets, (2) ensemble the algorithms, and (3) apply as-yet-unexplored algorithms.
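The reported correlation coefficients (0.56 and 0.37) are Pearson correlations between a tweet-derived trend and an official indicator. A minimal sketch of that comparison step, with invented series that are not the study's data:

```python
# Hypothetical sketch: correlating a tweet-volume trend with an official
# food-insecurity indicator. Both series below are invented examples;
# the study reported r = 0.56 (2015) and r = 0.37 (2016) on real data.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

tweet_trend = [12, 18, 9, 30, 25, 14]            # monthly food-insecurity tweets
official_index = [1.1, 1.4, 0.9, 2.0, 1.8, 1.2]  # e.g. an official indicator
r = pearson(tweet_trend, official_index)
```

A value of r near 1 would indicate the tweet trend tracks the official statistics closely; the study's moderate values (0.56, 0.37) motivate its suggestions of aggregating additional datasets and ensembling algorithms.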

SESSION: SE methodologies

Adapting lightweight user-centered design with the scrum-based development process

User-centered design (UCD) addresses the design of interactive systems by placing users at the center of the design, with the aim of improving usability and user experience. Developing economies are in dire need of UCD: low IT literacy, limited infrastructure and funds, and heterogeneity in culture and livelihood result in special usability requirements that must be met in order to harvest the possible benefits of IT. Traditional UCD methods, however, are often regarded as heavyweight and expensive. Agile software development methods are lightweight, flexible, and iterative in order to accommodate changing requirements and uncertain funding, and are therefore important for IT companies in developing economies. Can we adjust UCD methods to fit the needs of developing economies and of agile development, while taking advantage of the iterative character of agile development methods? The research adopted an action research approach called Cooperative Method Development (CMD). Based on the empirical investigation, UCD challenges were identified, and innovative use of lightweight UCD methods was deliberated and implemented. The adapted UCD practices include working with local IT personnel, lightweight and incremental use of personas, support departments performing acceptance testing on release versions, culturally adapted user testing in pairs, and heuristic evaluation. The evaluation together with the involved practitioners shows improvements in the development process, including reduced rework, satisfied users, better collaboration with stakeholders, and a close understanding of users and their needs. The evaluation of the resulting integrated approach with the involved practitioners, as well as with software engineers not involved in the research, indicates that the results are transferable to similar contexts.

Partitioning microservices: a domain engineering approach

Architecture styles in the software world continue to evolve, driven by the need for easier and more appealing ways of designing and building software systems that meet stakeholder needs. One of the popular trends at the moment is microservices. Microservice architecture is gaining ground in software development due to its capability to scale: it separates a system into small independent services, each performing one business capability at a time. However, determining the right size of business capability that could be called a microservice is still a challenge. Current practice in partitioning microservices relies on personal experience within industry, which is prone to practitioner bias. Given this ambiguity in determining the optimum size of a microservice, we propose in this paper a conceptual methodology for partitioning microservices based on domain engineering. Domain engineering identifies the information needed by a microservice and the services needed for its functionality, and provides descriptions of the workflows within the service. We demonstrate the methodology on the weather information dissemination domain as a confirmatory case study, showing how to split the weather information dissemination sub-domain into different microservices that together accomplish the weather information dissemination business capability.
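The partitioning idea can be sketched as grouping domain operations by the business capability (sub-domain) they serve, with each group becoming a candidate microservice. The capability names and operations below are hypothetical illustrations for a weather-dissemination domain, not the paper's actual decomposition.

```python
# Hypothetical sketch: deriving candidate microservices from a domain
# model by grouping operations under the business capability they serve.
# All capability and operation names are invented examples.

domain_operations = [
    ("ingest_station_readings", "data-collection"),
    ("validate_readings", "data-collection"),
    ("compute_forecast", "forecasting"),
    ("render_forecast_bulletin", "dissemination"),
    ("send_sms_alerts", "dissemination"),
]

def partition(operations):
    """One candidate microservice per business capability."""
    services = {}
    for op, capability in operations:
        services.setdefault(capability, []).append(op)
    return services

candidates = partition(domain_operations)
```

Each resulting group owns its operations exclusively, mirroring the paper's premise that a microservice should perform one business capability at a time.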

Planning for public sector software projects using value-based requirements engineering techniques: a research agenda

The introduction of e-Government-enabled services has resulted in public-sector-wide integration of different software applications, often scaled up to a national level. We observe that these initiatives are handled differently from the way software development projects are managed in the private sector, and the anticipated value of such projects tends to differ significantly in the long run. We take particular interest in the health sector, in which e-Health initiatives have been defined. We aim to understand how value proliferation can be understood and quantified from the onset of such large-scale projects using requirements engineering techniques. In this work we argue that effective planning of large-scale ICT initiatives, such as e-Health, should be long-term driven so as to ensure effective sector management; novel approaches in this realm should strive to link strategy, measurement, and operational decisions from the onset. We examine what has been done, along with key opportunities, challenges, and gaps that can be addressed by the research community. To bridge these gaps, we propose an agenda by formulating key research questions which both industry and academia can address as future directions.