Alignment-free Phylogenetic Placement and its Applications

by Matthias Blanke
Doctoral thesis
Date of Examination: 2023-02-17
Date of issue: 2023-03-03
Advisor: Prof. Dr. Burkhard Morgenstern
Referee: Prof. Dr. Burkhard Morgenstern
Referee: Dr. Johannes Söding
Persistent Address: http://dx.doi.org/10.53846/goediss-9762

Files in this item

Name: PhDThesis_MBlanke.pdf
Size: 18.7 MB
Format: PDF

Abstract

The study of the evolutionary relationships among living organisms has always been at the heart of the biological sciences. A revolution in sequencing technology over the past decades has caused a massive increase in molecular sequence data. As a result, contemporary methods assess evolutionary relationships between organisms by quantifying the degree of similarity between their biological sequences. Relationships uncovered by phylogenetic studies are commonly represented and visualized as phylogenetic trees or networks. Traditionally, sequences have been obtained from single organisms; however, recent technological progress has enabled the retrieval of sequence data directly from environmental samples. This yields large numbers of short sequencing reads that may originate from any of the organisms present in the sampled environment. A major subsequent objective is the taxonomic or phylogenetic identification of these reads. However, established maximum-likelihood-based de novo phylogeny reconstruction methods are limited by their computational demands; typically, they cannot be applied when the available molecular sequences are very numerous or very long. Phylogenetic placement offers a way to identify large sets of query reads within their phylogenetic context by inserting them into an existing phylogenetic tree built from a set of reference sequences. Here, we present a new alignment- and assembly-free approach to phylogenetic placement, the Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM). App-SpaM extracts short, non-contiguous subwords (spaced words) to detect homologies between the query and reference sequences, a method known as the spaced-word matches approach. It counts such matches and uses them to estimate the average number of nucleotide substitutions between each read and each reference sequence.
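
The distance-estimation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not App-SpaM's implementation.

- It assumes a binary pattern in which `1` marks match positions and `0` marks don't-care positions.
- Every pair of query and reference spaced words that agree at all match positions counts as a spaced-word match.
- The per-site substitution probability is estimated from mismatches at the don't-care positions.
- That probability is converted to a distance with the Jukes-Cantor correction.

The function names and the example pattern are hypothetical. A practical tool would also need to filter out random background matches, which this sketch omits.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def spaced_words(seq: str, pattern: str) -> Iterator[Tuple[str, str]]:
    """Yield (match_part, dont_care_part) for every window of `seq` of pattern length.

    Assumed pattern convention: '1' = match position, '0' = don't-care position.
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    dont_care_pos = [i for i, c in enumerate(pattern) if c == "0"]
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        yield ("".join(window[i] for i in match_pos),
               "".join(window[i] for i in dont_care_pos))


def jukes_cantor(p: float) -> float:
    """Convert a mismatch proportion p into an estimated number of substitutions per site."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def spaced_word_distance(query: str, reference: str, pattern: str) -> float:
    """Estimate a query-reference distance from spaced-word matches (sketch, no filtering)."""
    # Index the reference spaced words by their characters at the match positions.
    index: Dict[str, List[str]] = defaultdict(list)
    for match_part, dont_care_part in spaced_words(reference, pattern):
        index[match_part].append(dont_care_part)

    n_matches = 0     # number of spaced-word matches found
    n_mismatches = 0  # mismatches summed over all don't-care positions
    for match_part, dc_query in spaced_words(query, pattern):
        for dc_ref in index.get(match_part, []):
            n_matches += 1
            n_mismatches += sum(a != b for a, b in zip(dc_query, dc_ref))

    n_dont_care = pattern.count("0")
    if n_matches == 0 or n_dont_care == 0:
        return math.inf  # no evidence of homology with this pattern
    p = n_mismatches / (n_matches * n_dont_care)
    return jukes_cantor(p)
```

Consider `spaced_word_distance("ACTT", "ACGT", "1001")`. It finds one spaced-word match with one mismatch across two don't-care positions. That gives p = 0.5 and a distance of 0.75 ln 3 ≈ 0.824.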
App-SpaM then uses fast heuristics to infer a suitable placement position within the reference tree. In a comprehensive evaluation, we assessed how App-SpaM compares to existing phylogenetic placement algorithms with respect to accuracy and computation speed. We demonstrate that App-SpaM is on par with maximum-likelihood-based algorithms on metataxonomic data sets. In addition, App-SpaM is two to three orders of magnitude faster than the next fastest programs while keeping its memory demands low. We discuss App-SpaM’s advantages and drawbacks extensively and propose several additional features to improve upon its original version: we evaluate a set of novel placement heuristics, the use of sampling techniques to improve scalability with the length of the reference sequences, and a measure of the uncertainty of proposed placement positions. Subsequently, we present a variety of novel use cases of phylogenetic placement that are made uniquely possible by App-SpaM’s versatility with respect to its input data. These applications include, in particular, the iterative augmentation of existing species trees by means of phylogenetic placement and the screening for outlier genes or species prior to phylogeny reconstruction.
Keywords: Phylogenetic Placement; Metagenomics; Metataxonomics; Phylogenetics; Alignment-free; Spaced Words
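
The placement step described in the abstract can also be illustrated, again only as a hedged sketch. The abstract does not spell out App-SpaM's placement heuristics, so the two rules below are assumptions chosen for illustration:

- Attach the query to the terminal branch of the reference with the smallest estimated distance.
- Attach the query to the branch above the lowest common ancestor (LCA) of the two closest references.

The tree is encoded as a hypothetical child-to-parent dictionary. The per-reference distances could come from any distance estimate.

```python
import math
from typing import Dict, List, Optional


def closest_reference(distances: Dict[str, float]) -> Optional[str]:
    """Assumed heuristic 1: place the query on the terminal branch of the closest reference."""
    finite = {ref: d for ref, d in distances.items() if math.isfinite(d)}
    if not finite:
        return None  # no usable distance, so no placement
    return min(finite, key=finite.get)


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """List `node` and all of its ancestors, ending at the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_of_two_closest(distances: Dict[str, float],
                       parent: Dict[str, str]) -> Optional[str]:
    """Assumed heuristic 2: place the query above the LCA of the two closest references."""
    ranked = sorted((d, ref) for ref, d in distances.items() if math.isfinite(d))
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][1]
    first, second = ranked[0][1], ranked[1][1]
    ancestors_first = set(path_to_root(first, parent))
    for node in path_to_root(second, parent):
        if node in ancestors_first:
            return node  # first shared node walking up from `second` is the LCA
    return None  # the two references are not in the same tree
```

As an example, encode the tree ((A,B)X,(C,D)Y)root as `{"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}`. Use the distances `{"A": 0.30, "B": 0.10, "C": 0.50, "D": math.inf}`. Then `closest_reference` returns `"B"` and `lca_of_two_closest` returns `"X"`. Picking the single closest reference is the cheapest rule. The LCA variant gives up some resolution in exchange for robustness when two references are nearly equidistant from the query.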