Analysis of Transcription Factor Binding and Cancer Stage-Specific Regulatory Dynamics Using DNA-Binding Domain Annotations
by Cigdem Hazal Timucin
Date of Examination: 2025-05-08
Date of issue: 2025-07-31
Advisor: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Argyris Papantonis
Files in this item
Name: CigdemHazalTimucin_Dissertation_NoCV_DOIadded.pdf
Size: 8.43 MB
Format: PDF
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific DNA sequences through their DNA-binding domains (DBDs). Accurate modeling of TF–DNA interactions is essential for uncovering the mechanisms of gene regulation. This thesis introduces TFClassPredict, a deep learning model built on a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) architecture that classifies DNA sequences into TF groups defined by shared DBD features at different hierarchical levels. TFClassPredict combines structural properties of TFs, derived from their DBDs, with DNABERT’s DNA language modeling capabilities to yield robust binding predictions. The first part of the thesis covers the design, training, and evaluation of the model. TFClassPredict is trained on DNA sequences annotated with structural DBD groupings and demonstrates strong classification performance at both broad and more specific hierarchical levels. This structure-based approach reduces redundancy and enables more reliable prediction of TF activity across diverse genomic contexts. The model’s strong performance indicates that TFs sharing similar DBDs also tend to exhibit similar binding site preferences. In the second part of the thesis, TFClassPredict is applied to chromatin accessibility profiles derived from cancer samples. The model is used to analyze regulatory activity in promoter and enhancer regions and to relate TF activity patterns to clinical features such as cancer stage and survival. The analysis reveals distinct regulatory patterns between promoter and enhancer regions, highlighting differences in TF activity across these elements. Several TFClasses also show distinct activity profiles between early- and late-stage tumors, suggesting stage-specific shifts in regulatory dynamics.
These results demonstrate that TFClass-based predictions can distinguish regulatory programs linked to tumor development, highlighting their potential for characterizing disease progression. The thesis offers a framework for predicting regulatory activity based on shared structural features of TFs and their sequence preferences, with potential value in clinical research. By linking structure-based TF activity profiles to disease states, the model may support biomarker discovery, patient stratification, and the interpretation of regulatory patterns in clinical sequencing data. TFClassPredict’s predictions could inform tools for early-stage diagnostics or tailored treatment approaches.
Keywords: Transcription Factors; DNA-Binding Domains; DNABERT; Cancer
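
As an illustration of the classification setup summarized in the abstract, the following minimal sketch shows two preprocessing steps such a pipeline might involve. It is not taken from the thesis. It assumes the overlapping k-mer tokenization of the original DNABERT release (the abstract does not state which tokenization or k was used) and assumes that TF groups carry dotted hierarchical identifiers in the style of the TFClass classification. The function names `kmer_tokens` and `label_at_level` and the example identifier are hypothetical.

```python
from typing import List


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Assumption: mirrors the k-mer tokenization of the original DNABERT;
    the value of k used in the thesis is not given in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def label_at_level(group_id: str, depth: int) -> str:
    """Truncate a dotted hierarchical identifier to the given depth.

    Assumption: identifiers look like "1.1.1.1.1", where each additional
    component names a more specific level of the hierarchy.
    """
    parts = group_id.split(".")
    if not 1 <= depth <= len(parts):
        raise ValueError("depth must be between 1 and the identifier depth")
    return ".".join(parts[:depth])


if __name__ == "__main__":
    # Hypothetical example: one sequence labelled at a broad and a specific level.
    tokens = kmer_tokens("ACGTACGT", k=6)
    broad = label_at_level("1.1.1.1.1", depth=1)
    specific = label_at_level("1.1.1.1.1", depth=3)
    print(tokens, broad, specific)
```

Training one classifier per depth, with labels produced by a helper like `label_at_level`, is one way to obtain predictions at both broad and more specific hierarchical levels. Whether the thesis uses separate models or a single multi-level model is not stated in the abstract.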
