Robust phylogenomic inference through genome skimming as demonstrated by a systematic analysis of Nereididae
Doctoral thesis
Date of Examination:2022-09-05
Date of issue:2022-12-16
Advisor:Prof. Dr. Christoph Bleidorn
Referee:Prof. Dr. Burkhard Morgenstern
Referee:Dr. Nico Posnien
Files in this item
Name:thalen_thesis_2022-10-16.pdf
Size:11.9Mb
Format:PDF
Description:Felix Thalén's thesis
Abstract
English
The introduction of massively parallel sequencing in the mid-2000s revolutionized the field of molecular phylogenetics and, ultimately, our understanding of the tree of life. Although long-read sequencing and hybrid strategies can now produce high-quality, near-chromosome-level assemblies, monetary costs, sampling restrictions, and other practical considerations still favor next-generation sequencing (NGS) in many instances. To date, most large-scale phylogenetic or "phylogenomic" studies have been conducted using a "genome reduction" strategy such as transcriptome sequencing or target enrichment. Until recently, whole-genome sequencing (WGS) was often dismissed or overlooked due to higher sequencing costs and a lack of appropriate bioinformatic tools to process the data downstream. Despite this, WGS has many advantages over other techniques, such as reduced laboratory workload, lower DNA volume and quality requirements, and higher data re-usability, even outside phylogenetics. Now, advancements in short-read sequencing technology have reduced the costs of sequencing, and the development of new, alignment-based bioinformatic software for working with WGS data should have phylogeneticists reconsider this sequencing approach. Still, we saw the need for a fast, scalable, and easy-to-use method for mining desired loci from raw reads or assembled contigs. Thus, we here present Patchwork, a new, alignment-based program that mines phylogenetic markers from WGS data by "stitching" overlapping and/or adjacent sequence regions. A novel sliding-window-based algorithm trims non-coding regions from extracted markers. Finally, we demonstrate the utility of both Patchwork and WGS in a phylogenomic context by using this tool to reconstruct the phylogeny of the annelid family Nereididae.
All previous attempts to infer the phylogeny of Nereididae have been limited to morphological data or to one or a handful of mitochondrial genes. Most of these studies were also severely limited in taxonomic coverage. Here, we present trees inferred from a set of 777 near-universal single-copy orthologs and from mitochondrial genomes, containing a total of 100 and 132 taxa, respectively, to produce a well-supported and congruent phylogeny of the group.
Keywords: phylogenomics; whole-genome sequencing; phylogenetics; alignment; short-read sequencing; Annelida; Nereididae; bioinformatics; phylogeny; worms; genomics