Fast methods for metagenomic sequence search and annotation

dc.contributor.advisor: Söding, Johannes Dr.
dc.contributor.author: Mirdita, Milot
dc.date.accessioned: 2022-06-16T13:10:59Z
dc.date.available: 2022-06-23T00:50:11Z
dc.date.issued: 2022-06-16
dc.identifier.uri: http://resolver.sub.uni-goettingen.de/purl?ediss-11858/14100
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-9294
dc.language.iso: eng [de]
dc.subject.ddc: 510 [de]
dc.title: Fast methods for metagenomic sequence search and annotation [de]
dc.type: cumulativeThesis [de]
dc.contributor.referee: Söding, Johannes Dr.
dc.date.examination: 2022-02-21 [de]
dc.description.abstracteng: The past two decades have seen the development of metagenomics, the study of genes and genomes of multiple organisms simultaneously. In contrast to traditional genomic techniques, which require isolating and growing individual organisms in the lab, metagenomic samples are taken directly from the environment, sequenced and then analyzed in silico. Modern sequencing techniques have enabled high-throughput readout of the DNA and RNA of microorganism communities in marine, soil, gut and many other environments. The plethora of data generated using these techniques poses a major challenge for existing computational techniques. This burden translates directly into computational run times and the cost of the resources required to carry out metagenomic analyses. Thus, computational methods developed for metagenomic analysis require exceptional efficiency and speed. At the same time, metagenomic studies are becoming relevant to more and more fields of research, requiring that techniques be suited to a wide range of scientific disciplines. In this work, I present three methods I developed to address the throughput bottlenecks of data analysis in metagenomics. (1) The MMseqs2 webserver is a user-friendly extension of the popular homology search method MMseqs2 designed for non-expert bioinformaticians. I accelerated MMseqs2 to process single queries much more quickly and introduced an API to enable MMseqs2's use in web applications. (2) MMseqs2 taxonomy is a method for fast and accurate taxonomy assignment of metagenomic contigs. (3) ColabFold is a method that makes the groundbreaking AlphaFold2 protein structure predictions widely accessible, accelerating its input sequence alignment generation and improving its accuracy by assembling a novel database enriched with metagenomic sequences from a multitude of datasets.
These methods improve upon the state of the art by introducing novel algorithms and accelerating previous ones, such that previously infeasible analyses become possible, and by making our metagenomic toolbox accessible to users of a wide range of skill levels. [de]
dc.contributor.coReferee: Waack, Stephan Prof. Dr.
dc.contributor.thirdReferee: Fernandez-Guerra, Antonio Prof. Dr.
dc.subject.eng: Proteins [de]
dc.subject.eng: Sequence Analysis [de]
dc.subject.eng: Metagenomics [de]
dc.subject.eng: Protein Structure Prediction [de]
dc.subject.eng: Homology [de]
dc.subject.eng: Webserver [de]
dc.identifier.urn: urn:nbn:de:gbv:7-ediss-14100-3
dc.affiliation.institute: Fakultät für Mathematik und Informatik [de]
dc.subject.gokfull: Informatik (PPN619939052) [de]
dc.description.embargoed: 2022-06-23 [de]
dc.identifier.ppn: 1807224554
dc.identifier.orcid: 0000-0001-8637-6719 [de]

