Finite Alphabet Blind Separation
by Merle Behr
Date of Examination: 2017-12-06
Date of issue: 2018-01-04
Advisor: Prof. Dr. Axel Munk
Referee: Prof. Dr. Axel Munk
Referee: Prof. Dr. Max Wardetzky
Abstract
This thesis considers a particular blind source separation problem in which the sources are assumed to take values only in a known finite set, the alphabet. More precisely, one observes M linear mixtures of m signals (sources) taking values in the known finite alphabet. The aim is to identify the unknown mixing weights and sources, including the number of sources, from noisy observations of the mixture. Finite Alphabet Blind Separation (FABS) occurs in many applications, for instance in digital communications with mixtures of multilevel pulse amplitude modulated digital signals. The main motivation for this thesis, however, comes from cancer genetics, where one aims to infer copy number aberrations of different clones in a tumor.

In the first part of this thesis, we provide necessary and sufficient identifiability conditions and obtain exact recovery within a neighborhood of the mixture.

In the second part, we study FABS for a single mixture, M = 1, in a change-point regression setting with Gaussian error. We provide uniformly honest lower confidence bounds and estimators with exponential convergence rates for the number of source components. With these at hand, we obtain consistent estimators with optimal convergence rates (up to log-factors) and asymptotically uniformly honest confidence statements for the weights and the sources. We illustrate our procedure with a data example from cancer genetics.

In the third part, we consider multivariate FABS, where several mixtures, M > 1, are observed. For Gaussian error we show that the least squares estimator (LSE) attains the minimax rates, both for the prediction and for the estimation error. As exact computation of the LSE is not feasible, an efficient algorithm is proposed; simulations suggest that it approximates the LSE well.
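To make the observation model concrete, the following is a minimal simulation sketch of the FABS setting described above, not the thesis' own code: m piecewise constant sources taking values in an assumed example alphabet {0, 1, 2} are combined by unknown mixing weights into M noisy mixtures with Gaussian error. The symbols omega, F, Y, the noise level sigma, and the segment construction are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

alphabet = np.array([0, 1, 2])   # known finite alphabet (assumed example)
m, M, n = 3, 2, 500              # number of sources, mixtures, sampling points
sigma = 0.05                     # Gaussian noise level (assumed)

# Piecewise constant sources, mimicking the change-point regression setting.
change_points = np.sort(rng.choice(np.arange(1, n), size=6, replace=False))
segments = np.split(np.arange(n), change_points)
F = np.empty((m, n))
for seg in segments:
    # On each segment, every source takes a constant value from the alphabet.
    F[:, seg] = rng.choice(alphabet, size=(m, 1))

# Unknown mixing weights: one weight vector per mixture (rows of omega sum to 1).
omega = rng.dirichlet(np.ones(m), size=M)   # shape (M, m)

# Observed data: M noisy linear mixtures of the m finite-alphabet sources.
Y = omega @ F + sigma * rng.standard_normal((M, n))
print(Y.shape)  # (M, n)

In this sketch, recovering omega, F, and even m from Y alone is the blind separation task; the finite alphabet constraint on F is what makes identifiability and exact recovery possible under the conditions studied in the thesis.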
Keywords: multiscale inference, signal recovery, confidence, model selection, inference in restricted parameter spaces, dynamic programming, genetic sequencing