Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

dc.contributor.advisor: Söding, Johannes Dr.
dc.contributor.author: Roth, Christian
dc.date.accessioned: 2021-09-21T14:10:13Z
dc.date.available: 2021-09-27T00:50:08Z
dc.date.issued: 2021-09-21
dc.identifier.uri: http://hdl.handle.net/21.11130/00-1735-0000-0008-5912-0
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-8835
dc.language.iso: eng [de]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc: 570 [de]
dc.title: Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts [de]
dc.type: doctoralThesis [de]
dc.contributor.referee: Söding, Johannes Dr.
dc.date.examination: 2021-09-06
dc.description.abstracteng: Over the last decades, a revolution in novel measurement techniques has permeated the biological sciences, filling databases with unprecedented amounts of data, ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods have become crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions to two data-rich fields of the biological sciences: transcription factor binding to DNA, and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods through an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs that cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than 10 years ago and, until very recently, retained their importance as crucial input features to deep neural networks.
As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way, using a variation of Felsenstein's tree-pruning algorithm in combination with an independent-pair assumption to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact predictions derived from our corrected pairwise amino acid counts did not yield competitive performance. [de]
dc.contributor.coReferee: Beißbarth, Tim Prof. Dr.
dc.subject.eng: De-novo motif discovery [de]
dc.subject.eng: Protein structure prediction [de]
dc.subject.eng: Bayesian statistics [de]
dc.identifier.urn: urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2
dc.affiliation.institute: Göttinger Graduiertenschule für Neurowissenschaften, Biophysik und molekulare Biowissenschaften (GGNB) [de]
dc.subject.gokfull: Biologie (PPN619462639) [de]
dc.description.embargoed: 2021-09-27
dc.identifier.ppn: 1771585064

