
New approaches and software for sequence function prediction and gene design integrating specific knowledge on protein structure and translation

by Dominic Alexander Simm
Doctoral thesis
Date of Examination: 2022-02-16
Date of issue: 2022-03-18
Advisor: Prof. Dr. Stephan Waack
Referee: Prof. Dr. Stephan Waack
Referee: PD Dr. Martin Kollmar
Persistent Address: http://dx.doi.org/10.53846/goediss-9118

 

 

Files in this item

Name: thesis_20220107_wo_cv.pdf
Size: 25.1 MB
Format: PDF


Abstract

In recent years, advances in sequencing techniques have led to an explosive increase in sequencing data. Billions of protein sequences are now available in public databases, largely without experimentally determined structure and function. Computational approaches and bioinformatic analyses, like the ones presented here, complement and support experimentally intensive research by processing, analyzing and interpreting these large amounts of biological data, by providing first predictions that uncover hidden knowledge in the data, and by helping to answer fundamental biological questions. Coiled-coil prediction software plays an important role in one of the first steps of the structural annotation of newly generated protein sequences of unknown structure. Unfortunately, established tools have been shown to be of rather limited use, especially with respect to prediction quality in large-scale coiled-coil analyses. The web application »Waggawagga« was developed for the comparative visualization of coiled-coil predictions generated by different software packages (Simm et al., 2015). The central idea of this comparative approach is to overcome the limitations of single applications by basing the decision about a putative coiled-coil domain on the majority consensus of multiple, freely combinable prediction tools. Supporting evidence is provided by a specially developed SAH prediction algorithm, which discriminates between putative coiled coils and actual single α-helix (SAH) domains and thereby helps to identify genuine coiled-coil domains. The SAH prediction algorithm is part of the web application and is also available as a stand-alone command-line version, termed »Waggawagga-CLI«.
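The majority-consensus idea can be illustrated with a small sketch. It assumes that the output of each prediction tool has already been reduced to one boolean call per residue (True for coiled coil) and that a residue counts as consensus coiled coil only when more than half of the selected tools agree. The function name `majority_consensus` and the strict-majority threshold are illustrative assumptions; the sketch does not reproduce Waggawagga's actual implementation or scoring.

```python
from typing import Sequence


def majority_consensus(predictions: Sequence[Sequence[bool]]) -> list[bool]:
    """Per-residue strict-majority vote over several coiled-coil prediction tracks.

    predictions[t][i] is True if (hypothetical) tool t predicts residue i
    to lie in a coiled coil.
    """
    if not predictions:
        raise ValueError("at least one prediction track is required")
    length = len(predictions[0])
    if any(len(track) != length for track in predictions):
        raise ValueError("all prediction tracks must cover the same sequence")
    n_tools = len(predictions)
    # A residue is called coiled coil only if more than half of the tools agree.
    return [2 * sum(track[i] for track in predictions) > n_tools for i in range(length)]


# Hypothetical per-residue calls of three tools for a 5-residue stretch.
tool_a = [False, True, True, True, False]
tool_b = [False, True, True, False, False]
tool_c = [True, False, True, True, False]
print(majority_consensus([tool_a, tool_b, tool_c]))  # → [False, True, True, True, False]
```

With an odd number of tools there are no ties; with an even number, a 50:50 split is treated as non-coiled-coil under this strict majority rule.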
Its function was tested and evaluated in detail in two studies that investigated the distribution and evolution of predicted SAH domains in the myosin motor protein family (Simm et al., 2017) and in two dozen eukaryotic organisms across the tree of life (Simm and Kollmar, 2018). The results revealed that SAH domains occur in 0.5 to 3.5% of the protein-coding content of each investigated species and are particularly common in longer proteins, supporting their function as structural building blocks in multi-domain proteins. In addition, a large-scale, in-depth prediction analysis was performed by testing the most relevant tools in the field against the most comprehensive reference data set available, the entire Protein Data Bank, and by tracing the results down to each amino acid and its secondary structure (Simm et al., 2021). A comparison of binary classification metrics with those of naïve coin-flip models suggests that the tested tools perform close to random. This implies that their predictions have only limited informative value, should be treated with great caution, and need to be supported and validated by experimental evidence.
Heterologous protein expression is often applied in the investigation of cellular functions, in genetic circuit engineering, in the overexpression of proteins for biopharmaceutical applications, and in structural biology research. One of the key factors in heterologous expression is the degeneracy of the genetic code, which allows a single protein to be encoded by a multitude of synonymous gene sequences and thus allows gene sequences to be adjusted without changing the protein sequence. However, substantial uncertainty remains about the details of this phenomenon. With the development of the software »Odysseus«, we realized a new probabilistic approach that exploits this particular genetic property to design typical genes for heterologous expression applications in common model organisms (Simm et al., submitted).
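To make the comparison with naïve coin-flip models in the coiled-coil benchmark (Simm et al., 2021) more concrete, the sketch below computes the expected confusion matrix of a hypothetical baseline that labels each residue as positive with a fixed probability, independently of its true label, and derives standard binary classification metrics from it. The helper names `binary_metrics` and `coin_flip_expected` and this exact definition of the baseline are assumptions for illustration; the thesis may define its coin-flip models differently.

```python
import math


def binary_metrics(tp: float, fp: float, fn: float, tn: float) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts or fractions."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Matthews correlation coefficient; set to 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}


def coin_flip_expected(prevalence: float, p_positive: float = 0.5) -> dict[str, float]:
    """Expected metrics of a hypothetical baseline that calls each residue positive
    with probability p_positive, independently of its true label."""
    p, q = prevalence, p_positive
    return binary_metrics(tp=p * q, fp=(1 - p) * q, fn=p * (1 - q), tn=(1 - p) * (1 - q))


# Example: 10% of residues are truly coiled coil, fair coin.
baseline = coin_flip_expected(prevalence=0.1)
print({name: round(value, 3) for name, value in baseline.items()})
```

For such a baseline, precision equals the prevalence of positive residues, recall equals the coin's bias, and the Matthews correlation coefficient is zero, because the expected tp·tn equals fp·fn. A tool whose observed metrics lie close to these values adds little information beyond the class balance.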
The Markov-model-based approach is highly configurable and can be operated with pre-trained genome profiles to control protein expression levels by adapting the codon usage of genes. We evaluated the influence of this profile-based codon usage adaptation on protein expression levels in the eukaryotic model organism Saccharomyces cerevisiae. To this end, we selected green fluorescent protein (GFP) and human α-synuclein (αSyn) as representatives of stable and intrinsically disordered proteins, serving as a benchmark and a challenging test case, respectively. GFP was expressed at high levels, and the toxic αSyn could be adapted to endogenous, low-level expression. The new software is publicly available as a web application for performing host-specific protein adaptations for a set of the most commonly used model organisms.
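A first-order Markov chain over codons is one simple way to realize profile-based codon adaptation of the kind described above. The sketch assumes that a host "profile" is nothing more than codon-to-codon transition counts collected from in-frame host coding sequences, and that a gene is designed by sampling, for each amino acid, one of its synonymous codons weighted by how often it followed the previously chosen codon. The names `train_profile`, `design_gene` and `translate`, the pseudocount smoothing and the begin-of-gene state are hypothetical; this is not Odysseus's actual model or parameterization.

```python
import random
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)

START = "^"  # hypothetical begin-of-gene context state


def train_profile(cds_sequences):
    """Hypothetical host profile: codon-to-codon transition counts from in-frame CDSs."""
    counts = defaultdict(lambda: defaultdict(int))
    for cds in cds_sequences:
        cds = cds.upper()
        prev = START
        for i in range(0, len(cds) - 2, 3):  # complete codons only
            codon = cds[i:i + 3]
            if codon not in CODON_TO_AA:  # e.g. contains N: skip it and reset context
                prev = START
                continue
            counts[prev][codon] += 1
            prev = codon
    return counts


def design_gene(protein, counts, pseudocount=1.0, rng=None):
    """Sample a synonymous DNA sequence for `protein` from the first-order codon chain."""
    if rng is None:
        rng = random.Random()
    prev, codons = START, []
    for aa in protein.upper():
        candidates = SYNONYMS.get(aa)
        if candidates is None:
            raise ValueError(f"unknown amino-acid symbol: {aa!r}")
        # Weight each synonymous codon by how often it followed `prev` in the host
        # genes; the pseudocount keeps unseen codon pairs possible (must be > 0).
        row = counts.get(prev, {})
        weights = [row.get(c, 0) + pseudocount for c in candidates]
        codon = rng.choices(candidates, weights=weights, k=1)[0]
        codons.append(codon)
        prev = codon
    return "".join(codons)


def translate(dna):
    """Translate an in-frame DNA sequence with the standard code ('*' = stop)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


host_cds = ["ATGGCTGCTAAATAA", "ATGGCCAAGTAA"]  # toy stand-ins for host genes
profile = train_profile(host_cds)
gene = design_gene("MAK*", profile, rng=random.Random(42))
print(gene, translate(gene))
```

Because every codon is drawn only from the synonyms of the requested amino acid, the designed gene always translates back to the input protein; the profile only influences which synonymous codons are chosen. Keep `pseudocount` positive so that codon pairs never seen in the training data remain possible.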
Keywords: Bioinformatic algorithms; Structural biology; Sequence based predictions; Biological databases and web-services; Coiled-coil prediction and evaluation; Heterologous gene expression
 
