From annotation to bacterial data models
by Tiago Pedreira
Date of Examination: 2022-06-10
Date of issue: 2022-06-24
Advisor: Prof. Dr. Jörg Stülke
Referee: Prof. Dr. Burkhard Morgenstern
Referee: Dr. Johannes Söding
Files in this item
Name: Dissertation_TiagoPedreira_2022.pdf
Size: 8.14Mb
Format: PDF
Abstract
Science and technological advancement go hand in hand, and with the recent emergence of novel high-throughput techniques, the need for specialized data structures to host and represent complex and highly varied information is evident. Biological databases address this need, and our research group focuses on creating such platforms to support the scientific community. Among these, SubtiWiki is regarded by the community as the gold standard of biological databases for the model organism Bacillus subtilis. The platform has grown in both the size and the quality of its data, with highly curated information and more features to represent it. With a growing user base, SubtiWiki consolidates its position among scientists by providing novel ways to identify potential protein homologs in related organisms and by integrating the popular Clusters of Orthologous Genes (COG) database. Recently, SynWiki, a biological database that shares the same framework as SubtiWiki, was created to integrate data on the new synthetic organism with a minimal genome, JCVI-syn3A. Regardless of the amount of information available for each organism, this work shows that the same framework can be extended beyond a single organism's data structure and used for multiple organisms. Furthermore, the current state of development of this framework was evaluated, its limitations in maintainability were assessed, and a novel framework is presented that will serve as the basis for all platforms created by the research group in the future. This framework, CoreWiki, was built with Flask, a minimal Python web framework that allows modular development. Finally, the current database schema was evaluated, and a new schema was introduced that establishes more robust relationships between the biological elements.
This work makes a solid contribution across scientific fields by providing a framework ready to integrate information from multiple levels and different organisms. Its aim is not only to organise but also to integrate the data, so that every scientist accessing such platforms can postulate new hypotheses and take their research to new heights.
Keywords: Bioinformatics; Databases; SubtiWiki; CoreWiki; SynWiki; Python; Flask