Bioinformatic analyses of complex genomic regions in non-human primate genomes
Doctoral thesis
Date of Examination: 2025-02-28
Date of issue: 2025-12-10
Advisor: Prof. Dr. Lutz Walter
Referee: Prof. Dr. Rolf Daniel
Referee: Prof. Dr. Jörg Stülke
Files in this item
Name: PhD_thesis_LauraAhumadaArranz_FINAL.pdf
Size: 75.6 MB
Format: PDF
Description: Full PhD thesis
Abstract
English
This PhD thesis investigates the application of bioinformatics techniques to annotate and explore complex genomic regions, with an emphasis on the immune-related Natural Killer Group 2 (NKG2) genes in non-human primates. NKG2 genes encode receptors of NK cells that play a crucial role in identifying and eliminating infected, cancerous or foreign cells. These receptors are regulated through their interaction with Major Histocompatibility Complex (MHC) class I proteins, which are encoded in a highly polymorphic, gene-rich region governing both innate and adaptive immune responses. The highly repetitive nature of these complex genomic regions poses significant challenges for their study and requires ongoing improvements both in sequencing technologies, to increase assembly quality, and in bioinformatic tools, to analyze the resulting assemblies. This project aims to contribute to the characterization of these regions by developing AnnCX, a bioinformatics annotation pipeline specialized for complex regions. By combining multiple individual annotation tools, the pipeline achieved higher overall annotation accuracy than the widely used MAKER annotation pipeline in our benchmarking study. AnnCX was used to unravel NKG2 genomic diversity in the latest high-quality whole-genome assemblies of non-human primates produced with third-generation sequencing technologies, kindly provided by the authors of the Primate Genome Project (Wu et al., 2022). For many of these primates, this was the first time that these complex regions had been sequenced and, therefore, explored. Our results revealed that the 36 species studied possess single NKG2A and NKG2F genes flanking a species-specific, variable number of NKG2C, NKG2CE and/or NKG2E genes, with only NKG2A predicted to have an inhibitory function.
NKG2F, which has so far been regarded as a pseudogene or as encoding a truncated protein, was predicted here to encode a potentially full-length receptor with a putative C-type lectin-like domain in 14 species of non-human primates. In Macaca mulatta, we predicted intra-species allelic variation in the length of NKG2F, as well as a potential interaction between its full-length variant and CD94 and Mamu-E, the binding partners of most NKG2 receptors. We hope that the findings from this study contribute to a more comprehensive phylogenetic overview of NKG2 genes and provide improved annotation templates for research in biomedicine and evolutionary biology.
Keywords: gene annotation pipeline; complex genomic regions; non-human primate genomics; gene copy-number variation; MHC; population genomics; NKG2; NKG2F; AnnCX; Primate Genome Project
