On monitoring and fault management of next generation networks

Shi, Lei

by Lei Shi

Doctoral thesis

Date of Examination: 2010-11-04

Date of issue: 2010-11-16

Advisor: Prof. Dr. Xiaoming Fu

Referee: Prof. Dr. Dieter Hogrefe

Referee: Prof. Dr. Tilman Wolf

Referee: Prof. Dr. Jens Grabowski

Persistent Address: http://dx.doi.org/10.53846/goediss-2465

Files in this item

Name: shi.pdf

Size: 1.72 MB

Format: PDF

Description: Dissertation

Abstract

English

This thesis investigates monitoring and fault management of next generation networks, in particular in environments where the network nodes within an Autonomous System (AS) are centrally controlled and managed. Existing work on network monitoring and fault management has developed in an isolated and extremely complex manner: management protocols such as SNMP, IPFIX and PSAMP are designed and deployed separately, without an overall consideration. To address this issue, we propose a systematic framework for network monitoring and fault management that takes into account the monitoring protocol, traffic matrix composition and derivation, system rebooting, dynamic fault management and related security issues.

We propose a new network monitoring framework that exploits the extensibility of the Internet Protocol (IP), especially IP Version 6 (IPv6), the fundamental building block for next generation networks. This is implemented by defining a new IPv6 hop-by-hop extension header. Messages with this header can carry metrics related to the nodes and links along the path as they traverse the network. This approach is augmented with a path-based intrinsic monitoring protocol, which can effectively associate SNMP-based MIB information with a network path within the AS domain.

For fault management, we propose a novel transient loop avoidance algorithm that exploits traffic matrix information to update forwarding tables in an optimal order, achieving minimal link overflow. For fault recovery, we present an efficient network rebooting algorithm that uses a priori knowledge of network traffic demand to minimize the rebooting time of all nodes in the entire AS, while ensuring that only a designated portion of the traffic volume is affected.

The proposed network monitoring, fault management and recovery schemes are evaluated through extensive analysis and experiments. Results show that our monitoring approach generates less than 5% of the traffic generated by traceroute, in only around 12% of the time taken to retrieve information for a 16-node network; our fault management method can achieve zero transient loops with minimal link overflow; and our fault recovery scheme can significantly reduce rebooting time (86.78% lower than the traditional approach of rebooting network nodes one by one). These approaches, although not yet implemented in an operational network, would provide insights for future designers and operators of next generation networks.
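As an illustration of the hop-by-hop extension header idea mentioned in the abstract, the following is a minimal sketch of how per-node metrics might be packed into an IPv6 Hop-by-Hop Options header. The abstract does not give the thesis's actual header layout, so everything specific here is an assumption: the option type value (0x3E, chosen from the range set aside for experimentation), the record format of a 32-bit node identifier followed by a 32-bit metric, and the function names `build_hop_by_hop` and `parse_records`. Only the general framing (Next Header, Hdr Ext Len, type-length-value options, Pad1/PadN padding to a multiple of 8 octets) follows the standard IPv6 extension header format.

```python
import struct

PAD1 = 0x00      # one-octet padding option (standard IPv6)
PADN = 0x01      # multi-octet padding option (standard IPv6)
OPT_TYPE = 0x3E  # hypothetical: experimental option type; top bits 00 = skip if
                 # unrecognized, third bit 1 = option data may change en route


def build_hop_by_hop(next_header, records):
    """Pack (node_id, metric) pairs into a Hop-by-Hop Options header.

    The record layout (two unsigned 32-bit integers) is an assumption made
    for this sketch, not the format defined in the thesis.
    """
    data = b"".join(struct.pack("!II", node, metric) for node, metric in records)
    if len(data) > 255:
        raise ValueError("option data must fit in a one-octet length field")
    option = struct.pack("!BB", OPT_TYPE, len(data)) + data

    # The header (2 fixed octets + options) must be a multiple of 8 octets.
    pad = (-(2 + len(option))) % 8
    if pad == 1:
        padding = bytes([PAD1])
    elif pad > 1:
        padding = bytes([PADN, pad - 2]) + bytes(pad - 2)
    else:
        padding = b""

    total = 2 + len(option) + len(padding)
    hdr_ext_len = total // 8 - 1  # length in 8-octet units, excluding the first 8
    return bytes([next_header, hdr_ext_len]) + option + padding


def parse_records(header):
    """Walk the options and return the (node_id, metric) pairs found."""
    total = (header[1] + 1) * 8
    records = []
    i = 2
    while i < total:
        opt_type = header[i]
        if opt_type == PAD1:          # Pad1 has no length field
            i += 1
            continue
        length = header[i + 1]
        if opt_type == OPT_TYPE:
            payload = header[i + 2:i + 2 + length]
            records = [struct.unpack_from("!II", payload, off)
                       for off in range(0, length, 8)]
        i += 2 + length               # skip PadN and unknown options
    return records


if __name__ == "__main__":
    hdr = build_hop_by_hop(59, [(1, 10), (2, 20)])  # 59 = No Next Header
    print(len(hdr), parse_records(hdr))
```

In this sketch each node along the path would append its own record to the option data; a real design would also need to handle option alignment, the path MTU, and the security issues the abstract mentions.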

Keywords: Computer networks; network monitoring; fault management; path-based monitoring; traffic matrix
