Filtered spaced-word matches: a novel approach to fast and accurate sequence comparison

Leimeister, Chris-Andre

dc.contributor.advisor	Morgenstern, Burkhard Prof. Dr.
dc.contributor.author	Leimeister, Chris-Andre
dc.date.accessioned	2019-01-22T12:56:26Z
dc.date.available	2019-01-22T12:56:26Z
dc.date.issued	2019-01-22
dc.identifier.uri	http://hdl.handle.net/11858/00-1735-0000-002E-E563-C
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-7229
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc	572	de
dc.title	Filtered spaced-word matches: a novel approach to fast and accurate sequence comparison	de
dc.type	doctoralThesis	de
dc.title.translated	Filtered spaced-word matches: a novel approach to fast and accurate sequence comparison	de
dc.contributor.referee	Söding, Johannes Dr.
dc.date.examination	2018-12-12
dc.description.abstracteng	Standard methods for biological sequence comparison and phylogeny reconstruction are traditionally based on sequence alignments. These methods are very accurate but also computationally expensive. Because the amount of biological sequence data is growing exponentially, alignment-free methods have become more important over the past decades. Alignment-free methods are substantially faster than alignment-based methods and are essential for large-scale sequence comparison. One major application of alignment-free methods is whole-genome phylogeny reconstruction. To this end, distances between pairs of genomes are calculated and subsequently clustered. Current alignment-free methods are fast but less accurate than alignment-based approaches. In this thesis, I developed the filtered spaced-word matches (FSWM) approach, a new alignment-free method for fast and accurate whole-genome phylogeny reconstruction. FSWM rapidly identifies spaced-word matches, which are defined by patterns of match and don't-care positions. The fraction of non-matching nucleotides at the don't-care positions is used to estimate evolutionary distances. To reduce the noise from random matches, I developed a filtering technique that calculates a similarity score for each spaced-word match and discards matches with a score below a threshold. This filtering removes most of the unwanted background matches, and the distances calculated from the remaining spaced-word matches are very accurate. Moreover, I investigated whether FSWM can be used to identify anchor points for genome alignments. I integrated a slightly modified version of FSWM into mugsy, a popular multiple-genome-alignment pipeline. If FSWM is used to identify anchor points, more homologies are found and aligned, and the alignments are of higher quality. Furthermore, I transferred the idea of FSWM from genomic sequences to protein sequences.
I developed Prot-SpaM, a fast tool that estimates evolutionary distances between pairs of whole proteomes. Prot-SpaM is the first alignment-free tool that estimates the number of substitutions between pairs of protein sequences without sequence alignment.	de
dc.contributor.coReferee	Beißbarth, Tim Prof. Dr.
dc.subject.eng	alignment-free	de
dc.subject.eng	sequence comparison	de
dc.identifier.urn	urn:nbn:de:gbv:7-11858/00-1735-0000-002E-E563-C-3
dc.affiliation.institute	Göttinger Zentrum für molekulare Biowissenschaften (GZMB)	de
dc.subject.gokfull	Molecular biology, genetic engineering (PPN619462973)	de
dc.identifier.ppn	1047184788
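
The procedure summarized in the abstract (find spaced-word matches, score each match, discard low-scoring matches, estimate the distance from the mismatches at the don't-care positions) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the +1/-1 scoring, the threshold of 0, the example pattern, and the Jukes-Cantor correction are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import defaultdict


def spaced_word_matches(seq_a, seq_b, pattern):
    """Yield (i, j) where both sequences agree at every match ('1') position.

    `pattern` is a string of '1' (match) and '0' (don't-care) characters.
    """
    match_pos = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    # Index every window of seq_a by the characters at its match positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - span + 1):
        index["".join(seq_a[i + k] for k in match_pos)].append(i)
    # Look up every window of seq_b in that index.
    for j in range(len(seq_b) - span + 1):
        key = "".join(seq_b[j + k] for k in match_pos)
        for i in index.get(key, ()):
            yield i, j


def jukes_cantor(p):
    """Jukes-Cantor correction (an assumed choice of substitution model)."""
    if p >= 0.75:
        return math.inf  # the correction is undefined for p >= 3/4
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def estimate_distance(seq_a, seq_b, pattern, threshold=0, match=1, mismatch=-1):
    """Return (mismatch fraction, corrected distance), or None if no match survives.

    The scoring scheme (match/mismatch) and the threshold are hypothetical.
    """
    dont_care = [k for k, c in enumerate(pattern) if c == "0"]
    mismatches = total = 0
    for i, j in spaced_word_matches(seq_a, seq_b, pattern):
        pairs = [(seq_a[i + k], seq_b[j + k]) for k in dont_care]
        score = sum(match if x == y else mismatch for x, y in pairs)
        if score < threshold:
            continue  # filtering step: treat this match as background noise
        mismatches += sum(x != y for x, y in pairs)
        total += len(pairs)
    if total == 0:
        return None
    p = mismatches / total
    return p, jukes_cantor(p)


if __name__ == "__main__":
    # Two toy sequences that differ by a single substitution.
    print(estimate_distance("ACGTGCATTG", "ACGAGCATTG", "1100011"))
```

Raising the threshold makes the filter stricter, so fewer random matches survive, at the cost of also discarding some true homologous matches when the sequences are distant.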

Files

Name: thesis_full.pdf

Size: 3.288 MB

Format: PDF

Description: Dissertation


This document appears in:

GZMB - Göttinger Zentrum für molekulare Biowissenschaften
GCMB - Göttingen Center for Molecular Biosciences
