Computational Complexity Meets Statistical Efficiency: From Change Point Estimation to Monotonicity Testing
Doctoral thesis
Date of Examination: 2025-02-05
Date of issue: 2025-04-04
Advisor: Dr. Housen Li
Referee: Dr. Housen Li
Referee: Prof. Dr. Claudia Steinem
Referee: Prof. Dr. Axel Munk
Referee: Prof. Dr. Gerlind Plonka-Hoch
Referee: Prof. Dr. Dominic Schuhmacher
Referee: Prof. Dr. Florin Manea
Sponsor: CRC 1456 and FOR 5381
Files in this item
Name: Dissertation_Zhi_Liu.pdf
Size: 3.90 MB
Format: PDF
Abstract
English
In the Information Age, statistical efficiency and computational tractability have become equally important. In this dissertation, we propose two novel statistical procedures: the Fast and Optimal Monotonicity Test (FOMT) for monotonicity testing and the Multiscale Quantile Segmentation Controlling Local Error (MUSCLE) for change point detection.

FOMT employs a sparse collection of \emph{local} tests, generated strategically at random, to detect violations of monotonicity scattered throughout the domain of the regression function. This sparsity yields significant computational efficiency: sublinear runtime in most cases and quasilinear runtime (i.e., linear up to a log factor) in the worst case, whereas existing statistically optimal tests typically require at least quadratic runtime. FOMT's statistical accuracy rests on the precise calibration of these local tests and their effective combination, ensuring both sensitivity to violations and control of false positives. More precisely, we show that FOMT separates the null and alternative hypotheses at minimax optimal rates over Hölder function classes of smoothness order in $(0,2]$. Further, when the smoothness is unknown, we introduce an adaptive version of FOMT, based on the Lepskii principle, which attains statistical optimality while maintaining the same computational complexity as if the intrinsic smoothness were known.

MUSCLE partitions serial data into multiple segments, each sharing a common quantile. It combines multiple tests for quantile changes over various scales and locations with variational estimation. Unlike the commonly adopted global error control, MUSCLE controls local errors defined on individual segments, which significantly improves the power of detecting change points.
Meanwhile, due to its built-in model complexity penalty, MUSCLE enjoys the finite-sample guarantee that its false discovery rate (i.e., the expected proportion of falsely detected change points) is bounded above by its unique tuning parameter. Further, we establish consistency and localization error rates for the estimated change points under mild signal-to-noise-ratio conditions; both match (up to log factors) the minimax optimality results in the Gaussian setup. All theory holds under serial independence as the only distributional assumption. Incorporating the wavelet tree data structure, we develop an efficient dynamic programming algorithm for computing MUSCLE.
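The idea behind FOMT of combining a sparse set of randomly placed local tests can be conveyed by a toy sketch. This is not the thesis's calibrated procedure: the function name `fomt_sketch`, the adjacent-window test statistic, the unit noise variance, and the 3-standard-error threshold are all illustrative assumptions.

```python
import numpy as np

def fomt_sketch(y, n_tests=200, bandwidth=0.05, threshold=3.0, seed=None):
    """Toy randomized local monotonicity check (illustrative only).

    Draws random pairs of adjacent windows and reports a violation of
    non-decreasing monotonicity whenever the left window's mean exceeds
    the right window's mean by more than `threshold` standard errors
    (unit noise variance assumed).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    h = max(1, int(bandwidth * n))           # local window width
    se = np.sqrt(2.0 / h)                    # s.e. of the difference of two window means
    for _ in range(n_tests):
        i = int(rng.integers(0, n - 2 * h))  # left window start, chosen at random
        left = y[i:i + h].mean()
        right = y[i + h:i + 2 * h].mean()
        if (left - right) / se > threshold:
            return False                     # local decrease detected: reject monotonicity
    return True                              # no violation found among the sampled windows
```

Because only `n_tests` windows of width `h` are ever touched, the work done is decoupled from the sample size; the thesis's actual test chooses the number, placement, and calibration of the local tests so as to achieve the minimax optimal separation rates stated above.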
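The dynamic-programming backbone behind penalized segmentation estimators such as MUSCLE can be illustrated with a minimal sketch. The following is a generic penalized least-squares change point program, not MUSCLE itself (which tests quantile changes over multiple scales, controls local errors, and uses a wavelet tree for fast computation); the function name `segment_dp` and the scalar `penalty` are illustrative assumptions.

```python
import numpy as np

def segment_dp(y, penalty):
    """Minimal penalized change point DP (piecewise-constant means, squared loss)."""
    n = len(y)
    # prefix sums give each segment's squared-loss cost in O(1)
    c1 = np.concatenate([[0.0], np.cumsum(y)])
    c2 = np.concatenate([[0.0], np.cumsum(np.square(y))])

    def seg_cost(s, t):
        # squared loss of fitting y[s:t] by its own mean
        m = t - s
        return c2[t] - c2[s] - (c1[t] - c1[s]) ** 2 / m

    best = np.full(n + 1, np.inf)   # best[t] = optimal penalized cost of y[:t]
    best[0] = -penalty              # so that a k-segment fit pays exactly k * penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        for s in range(t):          # s = start of the last segment ending at t
            cost = best[s] + penalty + seg_cost(s, t)
            if cost < best[t]:
                best[t], last[t] = cost, s
    # backtrack the change point locations
    cps, t = [], n
    while t > 0:
        t = int(last[t])
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

This naive recursion runs in quadratic time; in MUSCLE the candidate segments must additionally pass multiscale quantile tests, and the wavelet tree data structure is what makes evaluating those tests efficient enough for an overall fast algorithm.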
Keywords: false discovery rate; minimax optimality; change points; multiscale method; robust segmentation; wavelet tree; adaptation; isotonic regression; fast computation; randomized algorithm