Distributed Anomaly Detection and Prevention for Virtual Platforms

Jehangiri, Ali Imran

dc.contributor.advisor	Yahyapour, Ramin Prof. Dr.
dc.contributor.author	Jehangiri, Ali Imran
dc.date.accessioned	2015-07-24T09:36:28Z
dc.date.available	2015-07-24T09:36:28Z
dc.date.issued	2015-07-24
dc.identifier.uri	http://hdl.handle.net/11858/00-1735-0000-0022-605F-2
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-5190
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc	510	de
dc.title	Distributed Anomaly Detection and Prevention for Virtual Platforms	de
dc.type	doctoralThesis	de
dc.contributor.referee	Yahyapour, Ramin Prof. Dr.
dc.date.examination	2015-07-17
dc.description.abstracteng	An increasing number of applications are being hosted on cloud-based platforms. Cloud platforms serve as a general computing facility, and the applications hosted on them range from simple multi-tier web applications to complex social networking, eCommerce and Big Data applications. High availability, performance and auto-scaling are key requirements of cloud-based applications. Cloud platforms meet these requirements by provisioning resources dynamically in an on-demand, multi-tenant fashion. A key challenge for cloud service providers is to ensure the Quality of Service (QoS), as users and customers require more explicit QoS guarantees for the provisioning of services. Cloud service performance problems can directly lead to extensive financial losses. Thus, control and verification of QoS become a vital concern for any production-level deployment, and it is crucial to address performance as a managed objective. The success of cloud services depends critically on automated problem diagnostics and predictive analytics that enable organizations to manage their performance proactively. Moreover, effective and advanced monitoring is equally important for performance management support in clouds. In this thesis, we explore the key techniques for developing monitoring and performance management systems to achieve robust cloud systems. First, two case studies are presented to motivate the need for a scalable monitoring and analytics framework. The first case study examines performance issues of a software service hosted on a virtualized platform. The second analyzes cloud services offered by a large IT service provider. A generalization of the case studies forms the basis for the requirement specifications, which are used for the state-of-the-art analysis.
Although some solutions for particular challenges have already been provided, a scalable approach for performance problem diagnosis and prediction is still missing. To address this issue, a distributed, scalable monitoring and analytics framework is presented in the first part of this thesis. We conducted a thorough analysis of the technologies to be used by our framework. The framework makes use of existing monitoring and analytics technologies. However, we develop custom collectors to retrieve data non-intrusively from different layers of the cloud. In addition, we develop the analytics subscriber and publisher components, which retrieve service-related events from different APIs and send alerts to the SLA Management component so that it can take corrective measures. Further, we implemented an Open Cloud Computing Interface (OCCI) monitoring extension using the OCCI Mixin mechanism. To deal with performance problem diagnosis, a novel distributed parallel approach for performance anomaly detection is presented. First, all anomalous metrics are found in a distributed database of time series for a particular window. For comparative analysis, three lightweight statistical anomaly detection techniques are selected. We extend these techniques to work with the MapReduce paradigm and assess and compare the methods in terms of precision, recall, execution time, speedup and scale-up. Next, we correlate the anomalous metrics with the target SLO in order to locate the suspicious metrics. We implemented and evaluated our approach on a production cloud encompassing the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) service models. Experimental results confirm that our approach is efficient and effective in capturing the metrics causing performance anomalies. Finally, we present the design and implementation of an online anomaly prediction system for cloud computing infrastructures.
We further present an experimental evaluation of a set of anomaly prediction methods that aim at predicting upcoming periods of high utilization or poor performance with enough lead time to enable the appropriate scheduling, scaling, and migration of virtual resources. Using real data sets gathered from the cloud platforms of a university data center, we compare several approaches ranging from time-series methods (e.g., autoregression (AR)) to statistical classification methods (e.g., a Bayesian classifier). We observe that linear time-series models, especially AR models, are most likely suitable for modeling QoS measures and forecasting their future values. Moreover, linear time-series models can be integrated with Machine Learning (ML) methods to improve proactive QoS management.	de
dc.contributor.coReferee	Tchernykh, Andrei Prof. Dr.
dc.contributor.thirdReferee	Damm, Carsten Prof. Dr.
dc.contributor.thirdReferee	Fu, Xiaoming Prof. Dr.
dc.contributor.thirdReferee	Hogrefe, Dieter Prof. Dr.
dc.contributor.thirdReferee	Kurth, Winfried Prof. Dr.
dc.subject.eng	Cloud+Monitoring+Analytics+Performance+Diagnosis+Prediction+root cause+time series+Machine Learning+Big data	de
dc.identifier.urn	urn:nbn:de:gbv:7-11858/00-1735-0000-0022-605F-2-8
dc.affiliation.institute	Fakultät für Mathematik und Informatik	de
dc.subject.gokfull	Informatik (PPN619939052)	de
dc.identifier.ppn	832021423

Files

Name: thesis1.pdf

Size: 2.031Mb

Format: PDF

This document appears in:

Fakultät für Mathematik und Informatik (inkl. GAUSS) [518]