Algorithms for Matching and Analysis Problems on Subsequences
Doctoral thesis
Date of Examination:2023-11-16
Date of issue:2024-10-14
Advisor:Prof. Dr. Florin Manea
Referee:Prof. Dr. Florin Manea
Referee:Prof. Dr. Carsten Damm
Referee:Prof. Dr. Dirk Nowotka
Files in this item
Name:ClassicThesis.pdf
Size:3.45Mb
Format:PDF
Abstract
English
The focus of this thesis is on the study of efficient algorithms and computational complexity results for matching and analysis problems on subsequences. The presented results comprise algorithmic solutions and lower bounds for four problems -- the matching, the equivalence, the containment, and the universality problem -- all of them considered for subsequences. In the classical setting, a subsequence of a string can be obtained by simply removing letters from the string. In the first chapter of the main part of this thesis, we discuss analysis problems for classical subsequences. Algorithmic results are presented for all considered problems. This includes the equivalence problem that is associated to Simon's congruence $\sim_k$. Two words are $\sim_k$-equivalent if they have the same set of subsequences of length $k$. In this thesis, an optimal linear time algorithm is presented for the MaxSimK problem which is to compute the largest $k$ such that the two words have the same set of length-$k$-subsequences. The presented algorithm uses a novel data structure, called Simon tree, that allows to connect two words with regard to their $\sim_k$ equivalence. In the second chapter of the main part of this thesis, a new setting for subsequences is discussed. Subsequences with gap constraints, or gapped subsequences, are length-$k$-subsequences that can be embedded into a string $w$ such that the gaps between the individual letters of the subsequences satisfy given gap constraints $gc$. The matching problem for classical subsequences is easily solvable in linear time. For subsequences with gap constraints it becomes much harder and we discuss a dynamic programming algorithm working in $O(|w| |gc|)$ time accompanied by a matching conditional lower bound that depends on the OV hypothesis. In the third chapter of the main part, we finally discuss efficient algorithms and conditional lower bounds for analysis problems on gapped subsequences. This comprises efficient algorithms for the equivalence, containment, and universality problem that are achieved through a brute-force approach. The algorithmic results are matched by conditional lower bounds that depend on ETH (exponential time hypothesis) and SETH (strong exponential time hypothesis), respectively.
Keywords: Simon's Congruence; Subsequence; Scattered factor; Efficient algorithms; String algorithms; Subsequences with gap constraints; Pattern matching; Fine-grained complexity; Conditional lower bounds; Parameterised complexity; Universality