Wasserstein Distance on Finite Spaces: Statistical Inference and Algorithms
by Max Sommerfeld
Date of Examination: 2017-10-18
Date of issue: 2017-12-07
Advisor: Prof. Dr. Axel Munk
Referee: Prof. Dr. Axel Munk
Referee: Prof. Dr. Stephan Huckemann
Abstract
Wasserstein distances or, more generally, distances that quantify the optimal transport between probability measures on metric spaces have long been established as an important tool in probability theory. More recently, they have found their way into statistical theory, applications and machine learning, not only as a theoretical tool but also as a quantity of interest in its own right. Examples include goodness-of-fit, two-sample and equivalence testing, classification and clustering, and exploratory data analysis using Fréchet means and geodesics in the Wasserstein metric. This advent of the Wasserstein distance as a statistical tool brings two major challenges. First, knowledge of the theoretical properties of empirical, i.e. sample-based, Wasserstein distances remains incomplete, in particular as far as distributional limits on spaces other than the real line are concerned. Second, any application of the Wasserstein distance poses massive computational challenges, leaving many practically interesting problems outside the scope of available algorithms. The main thesis of this work is that restricting ourselves to the Wasserstein distance on finite spaces offers a perspective that is able to solve or at least avoid these problems and is still general enough to include many practical problems. Indeed, this work presents comprehensive distributional limits for empirical Wasserstein distances on finite spaces, strategies to apply these limits with controllable computational burden in large-scale inference, and a fast probabilistic approximation scheme for optimal transport distances.
Keywords: Wasserstein distance; Optimal transport; Distributional limits; Fast approximation