Investigating the Importance of DNA Sequence for Nucleosome Positioning with Means of Machine Learning
Doctoral thesis
Date of Examination: 2024-01-23
Date of issue: 2024-07-09
Advisor: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Argyris Papantonis
Persistent Address:
http://resolver.sub.uni-goettingen.de/purl?ediss-11858/15358
Files in this item
Name: SahrhageMalte_PhD_Thesis.pdf
Size: 5.91Mb
Format: PDF
Abstract
English
Nucleosomes are protein complexes that condense DNA molecules into tight chromatin bundles so that they fit into the nucleus. The positioning of nucleosomes on the DNA mediates access to positions for DNA-binding proteins or enzymes and is involved in the regulation of RNA polymerase II kinetics. Thus, chromatin accessibility is a crucial determinant of cell identity and the regulation of transcription. The influences that affect nucleosome positioning are manifold, and in vivo locations of nucleosomes are not strictly deterministic. Active chromatin remodeling complexes use energy-driven mechanisms to move nucleosomes, and the histone subunits of nucleosomes can carry biochemical modifications that change local nucleosome positioning dynamics. In this work, however, I deal with the influence of the static DNA sequence on the positioning of nucleosomes. There is evidence for a DNA-intrinsic pattern that supports nucleosome binding. This pattern is made up of a symmetrical, repetitive sequence of DNA that mediates a flexible local structure to fit the nucleosome. The existence of such a preference and the availability of high-throughput genomics data make it possible to use machine learning to predict nucleosome positions from sequence. Modeling this objective raises many questions regarding the classification method, interpretability, and how explicitly the proposed structural preferences should be modeled as an input.
In this thesis, I leverage machine learning techniques to model nucleosome positioning from DNA sequence. I explore the construction of an effective classifier through various datasets and architectures. Demonstrating the utility of this tool, I establish a straightforward resource for assessing the impact of DNA sequence on different regions of the human genome.
I explore the most relevant steps in transcription and demonstrate where nucleosomes are partly determined by sequence, where they are positioned by other processes independently of sequence, and where they are intrinsically repelled. I put these results into the context of dynamic nucleosome shifting that overrides these sequence influences, and of the consequences for RNAPII transcription speed and accuracy.
Keywords: Random Forest; Machine Learning; Nucleosome Positioning; Nucleosomes; Transcription; Convolutional Neural Networks