On monitoring and fault management of next generation networks
by Lei Shi
Date of Examination:2010-11-04
Date of issue:2010-11-16
Advisor:Prof. Dr. Xiaoming Fu
Referee:Prof. Dr. Dieter Hogrefe
Referee:Prof. Dr. Tilman Wolf
Referee:Prof. Dr. Jens Grabowski
Files in this item
Name:shi.pdf
Size:1.72Mb
Format:PDF
Description:Dissertation
Abstract
English
This thesis investigates monitoring and fault management of next generation networks, in particular in environments where the network nodes within an Autonomous System (AS) are centrally controlled and managed. Existing approaches to network monitoring and fault management have been developed in an isolated and extremely complex manner: management protocols such as SNMP, IPFIX and PSAMP are designed and deployed separately, without considering the system as a whole. To address this issue we propose a systematic framework for network monitoring and fault management, which takes into account the monitoring protocol, traffic matrix composition and derivation, system rebooting, dynamic fault management and related security issues.

We propose a new network monitoring framework which exploits the extensibility of the Internet Protocol (IP), especially IP Version 6 (IPv6), the fundamental building block for next generation networks. This is implemented by defining a new IPv6 hop-by-hop extension header. Messages carrying this header can collect metrics related to the nodes and links along the path as they traverse the network. This approach is augmented with a path-based intrinsic monitoring protocol, which can effectively associate SNMP-based MIB information with a network path within the AS domain.

To deal with fault management, we propose a novel transient loop avoidance algorithm which exploits traffic matrix information to update forwarding tables in an optimal order, achieving minimal link overflow. For fault recovery we present an efficient network rebooting algorithm which uses a priori knowledge of network traffic demand to minimize the rebooting time of all nodes in the entire AS, while ensuring that only a designated portion of the traffic volume is affected.

The proposed network monitoring, fault management and recovery schemes are evaluated through extensive analysis and experiments.
Results show that our monitoring approach generates less than 5% of the traffic generated by traceroute and needs only around 12% of the time to retrieve information for a 16-node network; our fault management method achieves zero transient loops with minimal link overflow; and our fault recovery scheme significantly reduces rebooting time (86.78% lower than the traditional approach of rebooting network nodes one by one). These approaches, although not yet implemented in an operational network, would provide insights for future designers and operators of next generation networks.
Keywords: Computer networks; network monitoring; fault management; path-based monitoring; traffic matrix
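
Illustrative sketch (not taken from the thesis): the abstract describes an IPv6 hop-by-hop extension header that accumulates node and link metrics along a path. The Python fragment below shows how such a header could be packed and parsed using the generic hop-by-hop options layout (Next Header, Hdr Ext Len, then type-length-value options padded to a multiple of 8 bytes). The option type 0x3E, the 16-bit (node id, metric) record layout, and all function names are assumptions made for illustration only; the thesis's actual header format may differ.

```python
import struct

# Assumed option type: 0x3E is an experimental value (skip if unrecognized,
# option data may change en route). It is NOT the value used in the thesis.
OPT_PATH_METRICS = 0x3E
PAD1, PADN = 0x00, 0x01  # standard padding option types


def build_hop_by_hop(next_header, metrics):
    """Pack (node_id, metric) pairs into one hop-by-hop extension header."""
    # Assumed record layout: two unsigned 16-bit fields per node.
    data = b"".join(struct.pack("!HH", node, value) for node, value in metrics)
    if len(data) > 255:
        raise ValueError("option data exceeds 255 bytes")
    body = struct.pack("!BB", OPT_PATH_METRICS, len(data)) + data
    # Total length (2 fixed bytes + options) must be a multiple of 8 bytes.
    pad = -(2 + len(body)) % 8
    if pad == 1:
        body += bytes([PAD1])
    elif pad > 1:
        body += bytes([PADN, pad - 2]) + bytes(pad - 2)
    # Hdr Ext Len counts 8-byte units, not including the first 8 bytes.
    hdr_ext_len = (2 + len(body)) // 8 - 1
    return struct.pack("!BB", next_header, hdr_ext_len) + body


def parse_metrics(header):
    """Return the (node_id, metric) pairs carried in the header."""
    end = (header[1] + 1) * 8
    metrics, i = [], 2
    while i < end:
        opt_type = header[i]
        if opt_type == PAD1:  # Pad1 is a single byte with no length field
            i += 1
            continue
        length = header[i + 1]
        if opt_type == OPT_PATH_METRICS:
            payload = header[i + 2:i + 2 + length]
            metrics += [struct.unpack("!HH", payload[j:j + 4])
                        for j in range(0, length, 4)]
        i += 2 + length
    return metrics


def append_metric(header, node_id, value):
    """What a node on the path might do: add its own record and re-pack."""
    return build_hop_by_hop(header[0], parse_metrics(header) + [(node_id, value)])
```

In this sketch each node on the path would call `append_metric` before forwarding, so the receiver can read per-hop records with `parse_metrics` from a single packet instead of probing each hop separately.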