Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

Roth, Christian

dc.contributor.advisor	Söding, Johannes Dr.
dc.contributor.author	Roth, Christian
dc.date.accessioned	2021-09-21T14:10:13Z
dc.date.available	2021-09-27T00:50:08Z
dc.date.issued	2021-09-21
dc.identifier.uri	http://hdl.handle.net/21.11130/00-1735-0000-0008-5912-0
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-8835
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc	570	de
dc.title	Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts	de
dc.type	doctoralThesis	de
dc.contributor.referee	Söding, Johannes Dr.
dc.date.examination	2021-09-06
dc.description.abstracteng	Over the last decades, a revolution in measurement techniques has permeated the biological sciences, filling databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods have become crucial tools for every biological researcher. In this thesis I summarize my contributions to two data-rich fields in the biological sciences: transcription factor binding to DNA, and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov models. In contrast to classical position weight matrix (PWM) or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods through an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs that cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than 10 years ago and, until very recently, retained their importance as crucial input features to deep neural networks.
As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected for in a principled way. To this end, we applied a variation of Felsenstein's tree-pruning algorithm in combination with an independent-pair assumption to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, contact prediction based on our corrected pairwise amino acid counts did not yield competitive performance.	de
dc.contributor.coReferee	Beißbarth, Tim Prof. Dr.
dc.subject.eng	De-novo motif discovery	de
dc.subject.eng	Protein structure prediction	de
dc.subject.eng	Bayesian statistics	de
dc.identifier.urn	urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2
dc.affiliation.institute	Göttinger Graduiertenschule für Neurowissenschaften, Biophysik und molekulare Biowissenschaften (GGNB)	de
dc.subject.gokfull	Biology (PPN619462639)	de
dc.description.embargoed	2021-09-27
dc.identifier.ppn	1771585064

Files

Name: PhDThesis_roth.pdf
Size: 9.902 MB
Format: PDF


This item appears in:

GGNB - Göttingen Graduate Center for Neurosciences, Biophysics and Molecular Biosciences [1114]
