Statistical models for large-scale comparative metagenome analysis

dc.contributor.advisor: Morgenstern, Burkhard Prof. Dr.
dc.contributor.author: Aßhauer, Kathrin Petra
dc.date.accessioned: 2015-04-22T08:39:39Z
dc.date.available: 2015-04-22T08:39:39Z
dc.date.issued: 2015-04-22
dc.identifier.uri: http://hdl.handle.net/11858/00-1735-0000-0022-5FBD-0
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-5033
dc.language.iso: eng [de]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/
dc.subject.ddc: 570 [de]
dc.title: Statistical models for large-scale comparative metagenome analysis [de]
dc.type: cumulativeThesis [de]
dc.contributor.referee: Morgenstern, Burkhard Prof. Dr.
dc.date.examination: 2015-02-19
dc.description.abstracteng: Metagenomics, as a culture-independent approach, enables the exploration of complex, heterogeneous microbial communities under natural conditions through massive sequencing of community-specific DNA. Metagenomic datasets derived from various environments provide new insights into microbial life. Large-scale projects such as the Human Microbiome Project and the Earth Microbiome Project underline the growing importance of metagenomics for biomedical and ecosystem research. However, the explosive increase in sequencing data generated by such projects currently poses a major challenge for bioinformatics. New computationally efficient and statistically adequate methods are required to answer the essential questions “Who is in there?” and “What are they doing?”. In this thesis, I developed the Mixture-of-Pathways (MoP) model and the Tax4Fun approach. Both methods link the taxonomic profile to a set of pre-computed reference profiles to predict the metabolic repertoire of the microbial community. Because the taxonomic profile is usually estimated anyway to answer the question “Who is in there?”, reusing it addresses the question “What are they doing?” without additional cost. Tax4Fun is specifically designed for the output of 16S rRNA analysis pipelines that use the SILVA database as reference, whereas the MoP model is conceived especially for metagenome sequence data and provides a robust statistical basis for describing the metabolic potential of a microbial community. Adequate metabolic modeling of metagenomes provides a concise summary of their functional variation across many samples, enabling the identification of relevant metabolic differences in comparative analyses. For comparative metagenomics, identifying metagenomes that are similar to a newly obtained dataset is of growing importance.
For efficient large-scale identification of closely related metagenomes in a database retrieval setting, I conducted a detailed evaluation of k-nearest-neighbor search using different biological feature profiles and metrics. I demonstrated that the features and metrics can be chosen so that the results can be conveniently interpreted in terms of the underlying features. Integrating the k-nearest-neighbor search into metagenome annotation and comparison systems makes it possible to automatically identify additional metagenomes for comparative analyses and to detect mislabeled or contaminated datasets through unexpected habitat labels among their nearest neighbors. The MoP approach and the k-nearest-neighbor search are available to the scientific community as part of the CoMet-Universe web server application. Additionally, the MoP and Tax4Fun approaches are provided as R packages. [de]
dc.contributor.coReferee: Wingender, Edgar Prof. Dr.
dc.subject.eng: Bioinformatics [de]
dc.subject.eng: Metagenomics [de]
dc.subject.eng: NGS [de]
dc.subject.eng: 16S rRNA [de]
dc.subject.eng: Metabolic pathways [de]
dc.subject.eng: Comparative analysis [de]
dc.identifier.urn: urn:nbn:de:gbv:7-11858/00-1735-0000-0022-5FBD-0-3
dc.affiliation.institute: Göttinger Graduiertenschule für Neurowissenschaften, Biophysik und molekulare Biowissenschaften (GGNB) [de]
dc.subject.gokfull: Biologie (PPN619462639) [de]
dc.identifier.ppn: 823159949
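The abstract describes predicting a community's metabolic repertoire by linking its taxonomic profile to pre-computed reference profiles. The Python sketch below illustrates that general idea under the simplest possible assumption: an abundance-weighted sum of per-taxon reference pathway profiles, renormalized to relative abundances. It does not reproduce the statistical model behind MoP or the specific procedure of Tax4Fun. The function name predict_functional_profile and all taxon and pathway names are hypothetical, and silently skipping taxa without a reference profile is likewise an assumption made for illustration.

```python
from typing import Dict

# Hypothetical representations (not taken from the thesis):
#   taxonomic profile: taxon name -> relative abundance
#   reference profiles: taxon name -> {pathway name -> pathway abundance}
TaxProfile = Dict[str, float]
RefProfiles = Dict[str, Dict[str, float]]


def predict_functional_profile(taxa: TaxProfile, refs: RefProfiles) -> Dict[str, float]:
    """Combine pre-computed reference profiles, weighted by taxon abundance.

    Assumption: taxa without a reference profile are skipped, and the
    result is renormalized so that pathway abundances sum to 1.
    """
    combined: Dict[str, float] = {}
    for taxon, abundance in taxa.items():
        ref = refs.get(taxon)
        if ref is None:
            continue  # unmapped taxon: no functional information available
        for pathway, value in ref.items():
            combined[pathway] = combined.get(pathway, 0.0) + abundance * value
    total = sum(combined.values())
    if total == 0:
        return {}  # nothing could be mapped
    return {pathway: value / total for pathway, value in combined.items()}


# Toy usage example with made-up reference profiles.
refs = {
    "TaxonA": {"glycolysis": 1.0, "nitrogen_fixation": 0.0},
    "TaxonB": {"glycolysis": 1.0, "nitrogen_fixation": 1.0},
}
profile = predict_functional_profile({"TaxonA": 0.5, "TaxonB": 0.5, "Unknown": 0.1}, refs)
print(profile)  # glycolysis ≈ 0.667, nitrogen_fixation ≈ 0.333
```

With two equally abundant taxa, only one of which carries the nitrogen-fixation pathway, that pathway receives one third of the predicted profile; the unmapped taxon "Unknown" contributes nothing.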
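The k-nearest-neighbor search described in the abstract can be illustrated with a brute-force sketch. Each metagenome is represented by a feature profile (for example, relative taxon or pathway abundances). A pluggable metric scores each profile's distance to the query, and the habitat labels of the returned neighbors are compared with the query's stated label to flag a possibly mislabeled or contaminated dataset. The two metrics shown (Euclidean distance and Bray-Curtis dissimilarity), the majority-vote rule, and all function and sample names are assumptions chosen for illustration. They are not the features, metrics, or decision rule evaluated in the thesis or used in CoMet-Universe.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, float]  # feature name -> relative abundance (hypothetical representation)
Metric = Callable[[Profile, Profile], float]


def euclidean(a: Profile, b: Profile) -> float:
    """Euclidean distance; a feature missing from one profile counts as 0."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    keys = a.keys() | b.keys()
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den if den > 0 else 0.0


def k_nearest(query: Profile, database: Dict[str, Profile], k: int,
              metric: Metric) -> List[Tuple[str, float]]:
    """Brute-force search: score every entry and return the k smallest distances."""
    scored = [(name, metric(query, profile)) for name, profile in database.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # ties broken by name
    return scored[:k]


def label_is_unexpected(neighbors: List[Tuple[str, float]], labels: Dict[str, str],
                        stated_label: str) -> bool:
    """Hypothetical rule: flag if the neighbors' majority habitat differs from the stated one."""
    votes = Counter(labels[name] for name, _ in neighbors)
    majority, _ = votes.most_common(1)[0]
    return majority != stated_label


# Toy database with made-up profiles and habitat labels.
database = {
    "gut_1": {"Bacteroidetes": 0.6, "Firmicutes": 0.4},
    "gut_2": {"Bacteroidetes": 0.5, "Firmicutes": 0.5},
    "soil_1": {"Acidobacteria": 0.7, "Proteobacteria": 0.3},
}
habitats = {"gut_1": "human gut", "gut_2": "human gut", "soil_1": "soil"}
query = {"Bacteroidetes": 0.58, "Firmicutes": 0.42}

neighbors = k_nearest(query, database, k=2, metric=bray_curtis)
print([name for name, _ in neighbors])                   # ['gut_1', 'gut_2']
print(label_is_unexpected(neighbors, habitats, "soil"))  # True
```

Here a query submitted with the label "soil" is flagged because both of its nearest neighbors are gut samples. A linear scan is only practical for small collections; a large-scale retrieval system would typically rely on an index structure or approximate search, which this sketch does not attempt.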

