Alignment-free Phylogenetic Placement and its Applications

Blanke, Matthias

dc.contributor.advisor	Morgenstern, Burkhard Prof. Dr.
dc.contributor.author	Blanke, Matthias
dc.date.accessioned	2023-03-03T17:20:03Z
dc.date.available	2023-03-10T00:50:10Z
dc.date.issued	2023-03-03
dc.identifier.uri	http://resolver.sub.uni-goettingen.de/purl?ediss-11858/14554
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-9762
dc.format.extent	XXX pages	de
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject.ddc	570	de
dc.title	Alignment-free Phylogenetic Placement and its Applications	de
dc.type	doctoralThesis	de
dc.contributor.referee	Morgenstern, Burkhard Prof. Dr.
dc.date.examination	2023-02-17	de
dc.description.abstracteng	The study of the evolutionary interrelations of living organisms has been at the heart of biological sciences all along. A revolution in sequencing techniques in the past decades has caused a massive increase in molecular sequence data. As a result, contemporary methods assess evolutionary relationships between organisms by quantifying the degree of similarity between their biological sequence data. The discovered relationships of phylogenetic studies are commonly represented and visualized by phylogenetic trees or networks. Traditionally, sequences have been extracted from single organisms; however, recent technological progress has enabled the retrieval of sequence data directly from environmental samples. In doing so, large numbers of short sequencing reads arise that may originate from all organisms present in the respective environment. One major subsequent objective is the taxonomic or phylogenetic identification of those sequencing reads. However, longstanding maximum-likelihood-based de-novo phylogeny reconstruction methods are limited in their applicability by their computational demands; typically, they cannot be applied when the available molecular sequences are present in great numbers or are of great length. Fortunately, phylogenetic placement offers a unique approach to identify large sets of query reads within their phylogenetic context by inserting them into an existing phylogenetic tree comprising a set of reference sequences. Here, we present a new alignment- and assembly-free approach to phylogenetic placement, the Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM). App-SpaM extracts short, non-contiguous subwords to detect homologies between the query and reference sequences, a method known as the spaced-word matches approach. It counts the number of such words and utilizes them to infer the average number of nucleotide substitutions between each read and each reference sequence. 
Then, it uses fast heuristics to infer a suitable placement position within the reference tree. We assessed how App-SpaM compares to existing algorithms for phylogenetic placement with respect to accuracy and computation speed in a comprehensive evaluation. We demonstrate that App-SpaM is on par with maximum-likelihood-based algorithms on metataxonomic data sets. In addition, App-SpaM is two to three orders of magnitude faster than the next fastest programs while its memory demands stay low. We extensively discuss App-SpaM’s advantages and drawbacks and propose several additional features to improve upon its original version: we evaluate a set of novel placement heuristics, the use of sampling techniques to improve scalability with the length of the reference sequences, and a measure for the uncertainty of proposed placement positions. Subsequently, we present a variety of novel use cases of phylogenetic placement that are made uniquely possible by App-SpaM’s versatility with respect to its potential input data. These applications include, in particular, the iterative augmentation of existing species trees by means of phylogenetic placement and the screening for outlier genes or species prior to phylogeny reconstruction.	de
dc.contributor.coReferee	Söding, Johannes Dr.
dc.subject.eng	Phylogenetic Placement	de
dc.subject.eng	Metagenomics	de
dc.subject.eng	Metataxonomics	de
dc.subject.eng	Phylogenetics	de
dc.subject.eng	Alignment-free	de
dc.subject.eng	Spaced Words	de
dc.identifier.urn	urn:nbn:de:gbv:7-ediss-14554-3
dc.affiliation.institute	Biologische Fakultät für Biologie und Psychologie	de
dc.subject.gokfull	Biologie (PPN619462639)	de
dc.description.embargoed	2023-03-10	de
dc.identifier.ppn	1838159797
dc.notes.confirmationsent	Confirmation sent 2023-03-06T06:15:01	de

Files

Name: PhDThesis_MBlanke.pdf

Size: 18.76 MB

Format: PDF


This document appears in:

Fakultät für Biologie und Psychologie (incl. GAUSS) [1621]
