|Data is critical for scientific research and engineering systems. However, data collection procedures are often subject to high cost or heavy loss rate. It is challenging to accurately estimate missing or unobserved data points through the available ones. To cope with this challenge, data interpolation methods have been utilized to approximate the missing data with lower price.
In this thesis, we study three specific problems on missing data interpolation in computer networking. They are 1) Autonomous System (AS) level path inference problem, 2) environment reconstruction problem in Wireless Sensor Networks (WSNs) and 3) rating of network paths inference problem.
For the first problem, we bring a new angle to the AS path inference by exploiting the metrical tree-likeness or the low hyperbolicity of the Internet, part of the complex network properties of the Internet. We show that such property can generate a new constraint that narrows down the searching space of possible AS paths to a much smaller size. Based on this observation, we propose two new algorithms, namely HyperPath and Valley-free HyperPath. With intensive evaluations on AS paths from real-world BGP Routing Information Bases, we show that the proposed algorithms can achieve better performance. We demonstrate that our algorithms can significantly reduce inter-AS traffic for P2P applications with an improved AS path prediction accuracy.
For the second problem, we propose a new approach, namely Probabilistic Model Enhanced Spatio-Temporal Compressive Sensing (PMEST-CS), to boost the performance of CS-based methods for environment reconstruction in WSNs. During algorithm design, we consider two new perspectives, which are exploiting the sparsity in spatio-temporal difference in environment and using probabilistic model and inference to enrich the available dataset. Experimental results utilizing the two real-world datasets show that significant performance gains, in terms of reconstruction quality, can be obtained in comparison with the state of the art CS-based methods.
For the third problem, we investigate the rating of network paths, which is not only informative but also cheap to obtain. We firstly address the scalable acquisition of path ratings by statistical inference. By observing similarities to recommender systems, we examine the applicability of solutions to recommender system and show that our inference problem can be solved by a class of matrix factorization techniques. Then, we investigate the usability of rating-based network measurement and inference in applications. A case study is performed on whether locality awareness can be achieved for overlay networks of Pastry and BitTorrent using inferred ratings. We show that such coarse-grained knowledge can improve the performance of peer selection and that finer granularities do not always lead to