New approaches and software for sequence function prediction and gene design integrating specific knowledge on protein structure and translation

dc.contributor.advisor: Waack, Stephan Prof. Dr.
dc.contributor.author: Simm, Dominic Alexander
dc.date.accessioned: 2022-03-18T09:44:19Z
dc.date.available: 2022-03-25T00:50:08Z
dc.date.issued: 2022-03-18
dc.identifier.uri: http://resolver.sub.uni-goettingen.de/purl?ediss-11858/13937
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-9118
dc.language.iso: eng [de]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject.ddc: 510 [de]
dc.title: New approaches and software for sequence function prediction and gene design integrating specific knowledge on protein structure and translation [de]
dc.type: doctoralThesis [de]
dc.contributor.referee: Waack, Stephan Prof. Dr.
dc.date.examination: 2022-02-16 [de]
dc.description.abstracteng: In recent years, advances in sequencing techniques have resulted in an explosive increase in sequencing data. By now, billions of protein sequences are available in public databases, most of them without experimentally proven structure and function. Computational approaches and bioinformatic analyses, like the ones presented here, complement and assist experimentally intensive research efforts by processing, analyzing and interpreting the large amounts of biological data produced, and by giving first forecasts that uncover hidden knowledge in the data and help answer fundamental biological questions. Coiled-coil prediction software plays an important role in one of the first steps of the structural annotation of newly generated protein sequences with unknown molecular structure. Unfortunately, established software has been shown to have rather limited applicability, especially in terms of prediction quality in large-scale coiled-coil analyses. The web application »Waggawagga« was developed for the comparative visualization of coiled-coil predictions generated by different software packages (Simm et al., 2015). As a basis for deciding on a coiled-coil domain in question, the strength of the majority consensus of multiple, freely combinable prediction tools forms the central aspect of this comparative approach, which aims to overcome the limitations of single applications. Supporting hints are provided by the specially developed SAH prediction algorithm, which enables discrimination between putative coiled coils and actual single α-helix (SAH) domains and helps to identify real coiled-coil domains. The SAH prediction algorithm is both part of the web application and available as a stand-alone version for the command line, termed »Waggawagga-CLI«.
Its function has been tested and evaluated in detail in two studies that investigated the distribution and evolution of predicted SAH domains in the myosin motor protein family (Simm et al., 2017) and in two dozen eukaryotic organisms across the tree of life (Simm and Kollmar, 2018). The results revealed that SAH domains occur in 0.5 to 3.5% of the protein-coding content per investigated species and are particularly present in longer proteins, supporting their function as structural building blocks in multi-domain proteins. In addition, a large-scale, in-depth prediction analysis was performed by testing the most relevant software tools of the field against the most comprehensive reference data set available, the entire Protein Data Bank, and by tracing the results down to each amino acid and its secondary structure (Simm et al., 2021). Comparing the binary classification metrics with naïve coin-flip models suggests that the tested tools’ performance is close to random. This implies that the tools’ predictions have only limited informative value, should be treated very cautiously, and need to be supported and validated by experimental evidence. Heterologous protein expression is often applied in the investigation of cellular functions, in genetic circuit engineering, and in overexpressing proteins for biopharmaceutical applications and structural biology research. One of the key factors for heterologous expression is the degeneracy of the genetic code, which enables a single protein to be encoded by a multitude of synonymous gene sequences and thus allows gene sequences to be adjusted without changing the protein sequence. However, substantial uncertainty exists concerning the details of this phenomenon. With the development of the software »Odysseus«, we realized a new probabilistic approach that exploits this particular genetic property to design typical genes for heterologous expression applications in common model organisms (Simm et al., submitted).
The Markov-model-based approach is highly configurable and can be operated with pre-trained genome profiles to control protein expression levels through codon usage adaptation of genes. We evaluated the influence of the profiled codon usage adaptation approach on protein expression levels in the eukaryotic model organism Saccharomyces cerevisiae. To this end, we selected green fluorescent protein (GFP) and human α-synuclein (αSyn) as representatives of stable and intrinsically disordered proteins, serving as a benchmark and a challenging test case, respectively. GFP was expressed at high levels, and the toxic αSyn could be adapted to endogenous, low-level expression. The new software is publicly available as a web application for performing host-specific protein adaptations to a set of the most commonly used model organisms. [de]
dc.contributor.coReferee: Kollmar, Martin PD Dr.
dc.subject.eng: Bioinformatic algorithms [de]
dc.subject.eng: Structural biology [de]
dc.subject.eng: Sequence based predictions [de]
dc.subject.eng: Biological databases and web-services [de]
dc.subject.eng: Coiled-coil prediction and evaluation [de]
dc.subject.eng: Heterologous gene expression [de]
dc.identifier.urn: urn:nbn:de:gbv:7-ediss-13937-1
dc.affiliation.institute: Fakultät für Mathematik und Informatik [de]
dc.subject.gokfull: Informatik (PPN619939052) [de]
dc.description.embargoed: 2022-03-25 [de]
dc.identifier.ppn: 1796120952

