Development of novel analysis and data integration systems to understand human gene regulation

dc.contributor.advisor: Bonn, Stefan Prof. Dr.
dc.contributor.author: Rahman, Raza-Ur
dc.date.accessioned: 2018-06-27T08:52:31Z
dc.date.available: 2018-06-27T08:52:31Z
dc.date.issued: 2018-06-27
dc.identifier.uri: http://hdl.handle.net/11858/00-1735-0000-002E-E436-B
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-6924
dc.language.iso: eng [de]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc: 510 [de]
dc.title: Development of novel analysis and data integration systems to understand human gene regulation [de]
dc.type: cumulativeThesis [de]
dc.contributor.referee: Bonn, Stefan Prof. Dr.
dc.date.examination: 2018-05-08
dc.description.abstracteng: Abstract: This thesis covers a broad range of bioinformatics methods, from the development of an analysis pipeline to data integration and the development of an expression atlas (database and web application development). In addition, an in silico method was developed to annotate the genome with novel features and to predict diseases based on expression profiles.

Development of online analysis of small RNA sequencing data: Small RNAs (sRNAs) are biomolecules that play important roles in organismal health and disease; as such, sRNA dysregulation can cause severe diseases. The modern method of choice for sRNA expression profiling is sRNA sequencing (sRNA-seq). Several sRNA-seq analysis platforms are available, and they differ in their analysis portfolio, performance, and user-friendliness. However, these platforms lack one or more important features, such as identification of disease biomarkers, detection of viral and bacterial infections in sRNA-seq samples, storage of novel predicted miRNAs, multivariate differential expression (DE) analysis, and automated job submission via an application programming interface (API). To address this, we developed Oasis 2, a fast and flexible web application that provides many different sRNA-seq analysis options on a single platform. Its major functionalities include quantification of different sRNA species, multivariate DE analysis, identification of disease biomarkers, prediction and storage of novel miRNAs with universally accepted nomenclature, identification of infection or contamination, and functional/enrichment analysis. Oasis 2 lets users run all of these analyses through the web application as well as through the API for automated submission. It generates downloadable interactive web reports for easy visualization, exploration, and analysis of the data on a local system.
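Quantifying sRNA species usually starts from raw read counts per sRNA, which must be normalized before samples can be compared. The following is a minimal sketch of one common normalization, counts per million (CPM), plus summing counts per sRNA species. It is an illustrative assumption, not Oasis 2's actual implementation, and the feature names and counts are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that the sample sums to one million reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def totals_by_species(counts: dict[str, int], species_of: dict[str, str]) -> dict[str, int]:
    """Sum raw counts per sRNA species (e.g. miRNA, piRNA, snoRNA)."""
    totals: dict[str, int] = defaultdict(int)
    for name, n in counts.items():
        totals[species_of[name]] += n
    return dict(totals)


# Hypothetical counts for one sample; names and numbers are illustrative only.
sample = {"miR-A": 600, "miR-B": 200, "piR-A": 200}
species = {"miR-A": "miRNA", "miR-B": "miRNA", "piR-A": "piRNA"}

print(counts_per_million(sample))          # {'miR-A': 600000.0, 'miR-B': 200000.0, 'piR-A': 200000.0}
print(totals_by_species(sample, species))  # {'miRNA': 800, 'piRNA': 200}
```

CPM corrects only for library size; dedicated DE methods usually apply their own normalization and statistical model on top of the raw counts.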
In the future, analysis of small RNA editing, modification, and mutation events could be added to Oasis 2, and the reporting of bacterial and viral infections and contaminations could be enhanced.

Development of the small RNA expression atlas (SEA): As discussed in Section 2, sRNAs play crucial roles in organismal health and disease, yet the number and scope of currently available sRNA-seq expression repositories are very limited. For example, most sRNA-seq repositories support only one or two organisms, and none of them provides search by ontological terms. Considering these shortcomings, we developed the sRNA expression atlas (SEA), a data repository that stores sRNA expression profiles together with experimental details such as organism, tissue, cell type, disease, age, and gender, and technical details such as sequencer, kit, and barcode. Additionally, we built a web application that allows end users to query and visualize sRNA expression profiles interactively. SEA supports ontology-based queries, allowing single or combined searches across different experiments on five predefined terms: organism, tissue, disease, cell type, and cell line. It currently contains expression profiles and meta-information for over 2,500 sRNA-seq samples across 10 organisms. To our knowledge, SEA is the only sRNA-seq database that supports ontology-based queries. In the future, additional meta-information such as age, gender, developmental stage, and genotype, as well as technical experimental details, could be standardized (connected to ontologies), and the search could be extended to query sRNA expression profiles by these attributes. Moreover, further sRNA-seq datasets should be incorporated into SEA. Lastly, DE and biomarker prediction results could be stored for all sRNA-seq datasets with at least two groups (such as control and diseased) and made queryable and comparable across datasets.
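An ontology-based search differs from plain keyword matching in that a query term also matches samples annotated with more specific child terms. The following is a minimal sketch of a combined search over the five predefined facets, assuming a toy single-parent ontology and hypothetical sample annotations; it is not SEA's actual schema, ontologies, or query engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy ontology: child term -> parent term (one parent per term).
PARENT = {
    "cerebellum": "brain",
    "cortex": "brain",
    "brain": "nervous system",
    "T cell": "lymphocyte",
    "lymphocyte": "leukocyte",
}

FACETS = ("organism", "tissue", "disease", "cell type", "cell line")


def ancestors(term: str) -> set[str]:
    """Return the term itself plus all of its ancestors in the toy ontology."""
    seen = {term}
    while term in PARENT:
        term = PARENT[term]
        seen.add(term)
    return seen


@dataclass
class Sample:
    sample_id: str
    annotations: dict[str, str] = field(default_factory=dict)  # facet -> term


def search(samples: list[Sample], query: dict[str, str]) -> list[str]:
    """Return IDs of samples matching every requested facet (logical AND).

    A facet matches if the sample's term equals the query term or is one of
    its descendants, so a query for 'brain' also finds 'cerebellum'.
    """
    unknown = set(query) - set(FACETS)
    if unknown:
        raise ValueError(f"unsupported facets: {sorted(unknown)}")
    return [
        s.sample_id
        for s in samples
        if all(
            facet in s.annotations and term in ancestors(s.annotations[facet])
            for facet, term in query.items()
        )
    ]


samples = [
    Sample("S1", {"organism": "human", "tissue": "cerebellum"}),
    Sample("S2", {"organism": "human", "tissue": "cortex"}),
    Sample("S3", {"organism": "mouse", "tissue": "cerebellum"}),
]
print(search(samples, {"organism": "human", "tissue": "brain"}))  # ['S1', 'S2']
```

Real ontologies are directed acyclic graphs in which a term can have several parents; the single-parent map here only keeps the sketch short.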
Prediction and validation of mutually exclusive splicing of exons: Mutually exclusive splicing of exons (MXEs) is a mechanism of functional gene and protein diversification with important roles in organismal development and disease, for example in SNAP-25, part of the neuroexocytosis machinery. Additionally, mutations in MXEs have been shown to cause diseases such as Timothy syndrome (a missense mutation in the CACNA1C gene). Despite their important roles, current knowledge of human MXEs is very limited: the human genome annotation (GenBank v. 37.3) contains only 158 MXEs in 79 protein-coding genes. To address this, an in silico method was developed to predict MXEs based on sequence similarity, similar lengths, and reading frame conservation; the predicted MXEs were validated using billions of publicly available RNA-seq reads. With this method, the number of known human MXEs increased by almost an order of magnitude, from 158 to 1,399. These MXEs show tissue- and developmental-stage-specific expression and also have potential roles in diseases. Because a heuristic approach was used to predict MXEs in this thesis, a machine learning approach could be used in the future, which may increase the predictive power of the method and reveal further novel MXEs. [de]
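The three prediction criteria and the read-based validation can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the method used in the thesis: the thresholds are hypothetical, `difflib.SequenceMatcher` stands in for a proper alignment score, and the exon sequences and junction counts are made up. Reading frame conservation is expressed as the two exon lengths being congruent modulo 3, so that including either exon leaves the downstream frame unchanged.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical thresholds, chosen for illustration only.
MAX_LEN_RATIO = 1.25   # longer exon may be at most 25% longer than the shorter one
MIN_SIMILARITY = 0.5   # minimum difflib similarity ratio (0..1)
MIN_READS = 2          # minimum split reads per inclusion junction


def frame_conserved(len_a: int, len_b: int) -> bool:
    """Either exon keeps the downstream frame if the lengths are congruent mod 3."""
    return (len_a - len_b) % 3 == 0


def similar_length(len_a: int, len_b: int) -> bool:
    shorter, longer = sorted((len_a, len_b))
    return shorter > 0 and longer / shorter <= MAX_LEN_RATIO


def sequence_similarity(seq_a: str, seq_b: str) -> float:
    """Stand-in similarity score; a real pipeline would use an alignment."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def is_mxe_candidate(seq_a: str, seq_b: str) -> bool:
    """Apply the three heuristic criteria to a pair of adjacent exons."""
    len_a, len_b = len(seq_a), len(seq_b)
    return (
        frame_conserved(len_a, len_b)
        and similar_length(len_a, len_b)
        and sequence_similarity(seq_a, seq_b) >= MIN_SIMILARITY
    )


def supported_by_reads(junctions: dict[tuple[str, str], int]) -> bool:
    """Check split-read support for the arrangement U-(A|B)-D.

    Each inclusion junction (U-A, A-D, U-B, B-D) needs MIN_READS reads, and no
    read may join A directly to B, i.e. the two exons are never seen together.
    """
    inclusion = [("U", "A"), ("A", "D"), ("U", "B"), ("B", "D")]
    if any(junctions.get(j, 0) < MIN_READS for j in inclusion):
        return False
    return junctions.get(("A", "B"), 0) == 0


exon_a = "ATGGCTGCTAAAGGTCTGCTG"  # hypothetical 21-nt exons
exon_b = "ATGGCAGCTAAGGGTCTTCTG"
print(is_mxe_candidate(exon_a, exon_b))       # True
print(is_mxe_candidate(exon_a, exon_b[:-1]))  # False: 21 vs 20 nt shifts the frame

reads = {("U", "A"): 12, ("A", "D"): 9, ("U", "B"): 7, ("B", "D"): 5}
print(supported_by_reads(reads))              # True
```

Requiring zero A-B reads is deliberately strict; with real RNA-seq data some tolerance for noise would likely be needed.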
dc.contributor.coReferee: Beißbarth, Tim Prof. Dr.
dc.contributor.thirdReferee: Damm, Carsten Prof. Dr.
dc.contributor.thirdReferee: Morgenstern, Burkhard Prof. Dr.
dc.contributor.thirdReferee: Wörgötter, Florentin Prof. Dr.
dc.subject.eng: in silico [de]
dc.subject.eng: annotate genome [de]
dc.subject.eng: data integration [de]
dc.subject.eng: data analysis [de]
dc.subject.eng: gene regulation [de]
dc.subject.eng: diseases [de]
dc.identifier.urn: urn:nbn:de:gbv:7-11858/00-1735-0000-002E-E436-B-6
dc.affiliation.institute: Fakultät für Mathematik und Informatik (Faculty of Mathematics and Computer Science) [de]
dc.subject.gokfull: Informatik (PPN619939052) [de]
dc.identifier.ppn: 1025359526

