Application of imputed sequence-level genotypes to genomic analyses in German Warmblood horses
Cumulative thesis
Date of Examination: 2025-01-09
Date of issue: 2025-02-20
Advisor: Prof. Dr. Jens Tetens
Referee: Prof. Dr. Jens Tetens
Referee: Prof. Dr. Georg Thaller
Referee: Prof. Dr. Gudrun A. Brockmann
Sponsor: This thesis was supported by the H. Wilhelm Schaumann Stiftung.
Files in this item
Name: eDiss_Paula_Reich.pdf
Size: 2.92Mb
Format: PDF
Abstract
Whole-genome sequence (WGS) data are potentially superior to the commonly used genotype data from single nucleotide polymorphism (SNP) arrays in genomic applications, but generating them on a large scale is expensive. An affordable alternative for obtaining sequence-level data for large numbers of individuals is genotype imputation, that is, the prediction of genotypes not directly assayed in a study sample. By increasing marker density, imputation can improve the performance of genomic analyses, provided that its accuracy is sufficiently high. In horses, information on genotype imputation is scarce, and the availability of genomic data in general, and of WGS data in particular, is limited. The aim of this thesis was therefore to implement genotype imputation in German Warmblood horses, a large horse population of global importance, and to apply the resulting imputed sequence-level data to various genomic applications. The goals were to investigate the suitability of imputed data for such analyses and to identify variants associated with, or causal for, selected phenotypes of interest.
First, mapping and variant calling were performed on publicly available WGS data from 317 horses of diverse breeds to establish a reference panel for genotype imputation in horses. This panel was then used to investigate the effect of several factors on imputation accuracy in order to develop an optimal imputation strategy for warmblood horses. Imputation accuracy was found to be influenced by the size and composition of the reference panel, the marker density of the genotyping array, the minor allele frequency of the imputed markers and the imputation software. Based on these findings, an adapted strategy that resulted in a genome-wide imputation accuracy of 0.66 was developed and used to impute a cohort of 4972 German Warmblood horses from medium SNP density to sequence level.
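The abstract does not state how imputation accuracy was measured. As a rough illustration only, two commonly used measures are the genotype concordance and the Pearson correlation between true and imputed allele dosages at genotypes that were masked before imputation. The sketch below assumes genotypes coded as 0, 1 or 2 copies of the alternative allele; the function names and the toy data are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def genotype_concordance(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes that were imputed exactly right."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def dosage_correlation(true_dosages, imputed_dosages):
    """Pearson correlation between true and imputed allele dosages (0 to 2)."""
    if len(true_dosages) != len(imputed_dosages) or len(true_dosages) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(true_dosages)
    mean_t = sum(true_dosages) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum(
        (t - mean_t) * (i - mean_i)
        for t, i in zip(true_dosages, imputed_dosages)
    )
    var_t = sum((t - mean_t) ** 2 for t in true_dosages)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector has no variation
    return cov / sqrt(var_t * var_i)


# Hypothetical toy data: one marker, six horses with masked true genotypes.
true_gt = [0, 1, 2, 1, 0, 2]
imputed_gt = [0, 1, 1, 1, 0, 2]
concordance = genotype_concordance(true_gt, imputed_gt)
correlation = dosage_correlation(true_gt, imputed_gt)
print(f"concordance = {concordance:.3f}, correlation = {correlation:.3f}")
```

The correlation is less sensitive to allele frequency than the concordance, which can look high for rare variants even when the minor allele is rarely imputed correctly. This is one reason why minor allele frequency matters when comparing accuracy across markers.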
The resulting dataset was used to discover genomic regions associated with equine conformation, which is an important selection criterion in horse breeding and an example of a complex trait in horses. Genome-wide association studies (GWAS) detected novel quantitative trait loci (QTL) for various conformation traits. Furthermore, heritabilities and genetic correlations were estimated for all investigated traits. A GWAS for withers height, which served as a reference trait to validate the imputed dataset and methodology, confirmed a previously reported QTL on chromosome 3 near the LCORL and NCAPG genes. Subsequent fine-mapping of the region enabled the identification of candidate causal variants, including a nonsense mutation within the coding sequence of LCORL. The same QTL was also associated with several other conformation traits, and high genetic correlations were observed between these traits for the QTL region, indicating its high relevance for equine conformation in general.
The imputed dataset was further used to detect embryonic lethal mutations, which can be identified by screening genomic data for haplotypes or variants without homozygotes. While the small number of sequenced horses in the reference panel did not allow for the discovery of high- or moderate-impact variants with a significant absence or reduction of homozygotes, the larger sample of imputed warmblood horses facilitated the identification of 72 such mutations. However, their further characterisation raised doubts about their potential lethality and highlighted that great caution is needed when using imputed data for this application. Ultimately, the most promising candidate mutations turned out to be artefacts of the variant calling pipeline that had been classified as putatively significant, in terms of the absence of homozygotes, on the basis of the imputed genotypes.
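Why sample size matters for the homozygote screen can be illustrated with a simple Hardy-Weinberg calculation. This is a minimal sketch under the assumption of random mating, not the test used in the thesis (which the abstract does not specify). For an allele with frequency q among N genotyped animals, about N·q² homozygotes are expected, and the probability of seeing none by chance is (1 − q²)^N. The sample sizes 317 and 4972 come from the abstract; the allele frequency of 0.05 and the function names are hypothetical.

```python
def expected_homozygotes(n_individuals, allele_freq):
    """Expected number of homozygotes under Hardy-Weinberg proportions."""
    return n_individuals * allele_freq ** 2


def prob_zero_homozygotes(n_individuals, allele_freq):
    """Chance of observing no homozygotes if the allele is NOT lethal."""
    return (1.0 - allele_freq ** 2) ** n_individuals


# Hypothetical allele frequency; sample sizes as reported in the abstract.
q = 0.05
for n in (317, 4972):
    exp_hom = expected_homozygotes(n, q)
    p_zero = prob_zero_homozygotes(n, q)
    print(f"N = {n}: expected homozygotes = {exp_hom:.2f}, "
          f"P(no homozygotes) = {p_zero:.2e}")
```

With a few hundred animals, a complete absence of homozygotes is unremarkable at this frequency, whereas with several thousand it becomes very unlikely by chance. The calculation takes the genotypes at face value, so calling or imputation errors that systematically suppress homozygous calls would produce the same signal.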
In conclusion, the application of imputed sequence-level data was successful for GWAS and fine-mapping of equine conformation traits, and of withers height in particular, as examples of complex traits in horses, but was less effective for identifying embryonic lethal variants based on a deficiency of homozygous individuals. However, the effect of genotype imputation and the associated risk of imputation errors are not easily separated from other factors that affect the performance of genomic analyses, such as sample size. Nevertheless, if handled with care, genotype imputation is a cost-effective means of increasing the number of individuals with sequence-level data for genomic applications, especially in a species such as the horse, where the availability of real WGS data is comparatively limited.
Keywords: German Warmblood horses; genotype imputation; equine conformation; withers height; genome-wide association studies; genetic parameters; recessive lethal defects