Computational methods for prokaryotic (meta)genomic annotation and function prediction using horizontal information transfer

dc.contributor.advisor: Söding, Johannes Dr.
dc.contributor.author: Zhang, Ruoshi
dc.date.accessioned: 2024-08-05T16:37:16Z
dc.date.available: 2024-08-12T00:50:07Z
dc.date.issued: 2024-08-05
dc.identifier.uri: http://resolver.sub.uni-goettingen.de/purl?ediss-11858/15406
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-10594
dc.format.extent: 127 [de]
dc.language.iso: eng [de]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 570 [de]
dc.title: Computational methods for prokaryotic (meta)genomic annotation and function prediction using horizontal information transfer [de]
dc.type: doctoralThesis [de]
dc.contributor.referee: Morgenstern, Burkhard Prof. Dr.
dc.date.examination: 2023-08-24 [de]
dc.description.abstracteng: In recent decades, a significant drop in sequencing costs has driven large-scale genomic and metagenomic sequencing projects, producing a surge of genomic sequences. More powerful methods to annotate this deluge of metagenomic and genomic sequences are urgently needed. Metagenomic samples are mostly composed of viruses and prokaryotes (bacteria and archaea), which live in constant flux and interact in complex ways. This poses new challenges for fully understanding the information brought to light by these new genomic sequences, particularly with regard to three key questions: Who are they? What do they do? With whom do they interact? In this work, I present three projects aimed at shedding light on the latter two questions. To overcome the bottlenecks of detection sensitivity and speed, I improved upon state-of-the-art methods by proposing novel and efficient algorithms designed to run on large-scale genomic and metagenomic data. (1) SpacePHARER is a fast and sensitive method to identify phage-host relationships de novo from CRISPR spacer sequences. Compared to conventional BLASTN approaches, SpacePHARER is 1.4 to 4 times more sensitive and up to 47 times faster, making it suitable for analyzing massive metagenomic datasets. (2) Spacedust is a sensitive method for systematically identifying conserved gene clusters de novo from prokaryotic genomes, which facilitates protein functional annotation because functionally associated genes tend to cluster close to each other in prokaryotic genomes. By performing sensitive structural similarity searches with Foldseek, Spacedust enables comprehensive and efficient all-vs-all searches of thousands of genomes. (3) The third project is a self-supervised method for accurate and systematic operon prediction that can be universally applied to prokaryotic genomes and metagenomic bins. This method leverages conserved clusters detected by Spacedust and incorporates intergenic distance information through a self-training approach, providing insights into the function and regulation of genes. Its performance surpasses existing operon prediction techniques on a diverse set of seven genomes. These approaches are particularly valuable because they do not rely on prior knowledge and are not biased towards well-characterized model organisms. Instead, they leverage the diversity of organisms uncovered through metagenomics to fully explore and exploit unannotated genomic sequences. This work underscores the immense potential that often remains unexplored within large-scale prokaryotic genomic data and represents a significant advancement in our capability to decode them. [de]
dc.contributor.coReferee: Liepe, Juliane Dr.
dc.subject.eng: Metagenomics [de]
dc.subject.eng: Homology [de]
dc.subject.eng: Sequence Analysis [de]
dc.subject.eng: Phage-host relationships [de]
dc.subject.eng: Protein function annotation [de]
dc.identifier.urn: urn:nbn:de:gbv:7-ediss-15406-7
dc.affiliation.institute: Göttinger Graduiertenschule für Neurowissenschaften, Biophysik und molekulare Biowissenschaften (GGNB) [de]
dc.subject.gokfull: Biologie (PPN619462639) [de]
dc.description.embargoed: 2024-08-12 [de]
dc.identifier.ppn: 1897979444
dc.identifier.orcid: 0000-0002-1982-4793 [de]
dc.notes.confirmationsent: Confirmation sent 2024-08-05T19:45:01 [de]

