Development of algorithms and next-generation sequencing data workflows for the analysis of gene regulatory networks
by Orr Shomroni
Date of Examination: 2017-03-02
Date of issue: 2017-04-11
Advisor: Dr. Stefan Bonn
Referee: Dr. Stefan Bonn
Referee: Prof. Dr. Stephan Waack
Files in this item
Name: Thesis_opt.pdf
Size: 5.19 MB
Format: PDF
Abstract
Next-generation sequencing (NGS) methodologies make it possible to unravel the genetic and epigenetic mechanisms behind various biological processes, and a multitude of tools have been developed to analyze such data. Nevertheless, automated, robust and flexible workflows that analyze NGS data quickly and efficiently have been lacking. Because many NGS studies today integrate results from multiple resources in order to better understand complex biological mechanisms, the quick generation of primary results from separate NGS studies will allow researchers to focus on integrating those results. The development of automated workflows is therefore essential for analyzing multiple datasets of the same type quickly and efficiently. In addition, the lack of an efficient tool for fragment size estimation and enrichment testing of chromatin immunoprecipitation sequencing (ChIP-seq) data made it necessary to develop one, and so the R package chequeR was implemented and integrated into the ChIP-seq workflow. The workflows developed for ChIP-seq, methylated DNA immunoprecipitation sequencing (MeDIP-seq) and RNA sequencing (RNA-seq) data were implemented as automated scripts that combine various analysis tools in order to analyze datasets and return primary results. Such workflows may allow users to generate these results with relative ease and to use them in an integrative manner to establish regulatory networks between multiple genomic and epigenomic elements. This point is demonstrated in Chapters 5 and 6: the former discusses a study on the effect of short- and long-term memory on the epigenetic and genetic mechanisms in the mouse brain, while the latter explains how the role of p73 in the regulation of multiciliogenesis was determined.
Applied in these two case studies, each of which integrates various NGS data types, the workflows demonstrate the importance of reproducible, automated pipelines that generate primary results quickly and simply, leaving researchers free to focus on the main integrative aspects of their studies.
Keywords: bioinformatics; ChIP-seq; NGS; RNA-seq; MeDIP-seq; Workflow; GRN; memory; p73; ciliogenesis; epigenetics