| dc.description.abstracteng | Modern genomic methods such as genotyping and sequencing, enhanced with biostatistical
approaches like imputation, have provided insight into the genetic architecture of traits of economic
importance, as well as the traits associated with animal health. These traits often have a complex
nature, with many variants of small effect influencing trait expression. To associate variants
with a trait of interest, genome-wide association studies (GWAS) have been
developed. Despite the success of GWAS in the identification of candidate genes and regions,
causal variants in dairy cattle are still mostly unknown. To capture the possible
association of a variant with a trait, and to gain more insight into genetic architecture, large samples are
usually needed. Another complication is the existence of long-range linkage disequilibrium (LD) in
cattle, which makes it challenging to decipher the true association between a variant and a trait.
Post-GWAS analyses that help reveal the true association, such as fine-mapping and functional
annotation using external information about a variant’s function, have recently been
implemented. These will become even more important in the future, as ever-increasing sample sizes
lead to the discovery of more associations and make it necessary to prioritize among the large number of significant
associations obtained through GWAS. In this work, I aimed to identify novel trait-specific and
trait-shared variants and genes associated with 36 complex traits of substantial economic and
health importance in German Holstein, the most important dairy breed worldwide.
The general introduction (Chapter 1) starts with the history of cattle domestication and selection
and describes how these shaped LD and effective population size in cattle. This is followed by
a discussion of breeding goals in modern dairy cattle breeding, with specific attention to the
German Holstein breed. The current state of knowledge of GWAS and the use of genotype and
sequence data in dairy cattle genomics are further described. Finally, the application of GWAS
follow-up methods such as fine-mapping and functional annotation is discussed. This thesis
consists of three manuscripts that are presented in Chapters 2, 3, and 4. Chapter 2 describes the
imputation of 180,217 German Holstein cows to sequence level and the subsequent GWAS for milk,
fat, and protein yield, which are among the most important traits in dairy cattle breeding. I also present
Bayesian-based fine-mapping of genome-wide significant signals and subsequent functional
annotation using external multi-omics information about variant function. The use of
large-scale GWAS in our study led to the identification of tens of thousands of significant variants
across all traits. With fine-mapping, I prioritized the potential causal variants among all significant
signals, and with functional annotation further narrowed down the potential causal signals.
Finally, the genetic variance explained by novel candidate variants was estimated, yielding a
unique list of variants with a high potential for causality. Additionally, I present the computational
challenges faced when dealing with large datasets and assess the performance of different GWAS
software. Recently, more emphasis has been placed on animal health and the traits that describe it. These
traits were often neglected in the past, when breeding goals were oriented exclusively toward production.
Consequently, they deteriorated over the years under intense selection for milk yield. To investigate
the genetic architecture of the most common diseases in cattle, in Chapter 3 I present the GWAS
and Bayesian fine-mapping results for 11 health traits that are part of today’s balanced breeding
goal in Germany. Large numbers of novel candidate variants and genes were discovered, many of
which were shared among the 11 studied traits as well as with traits not studied here,
presenting an opportunity for inclusion in selection programs targeting different health traits. Chapter
4 reports the findings of a large-scale meta-analysis of health and conformation traits in hundreds of
thousands of cows, leading to the discovery of shared genomic regions between different traits and
groups of traits. This is important for breeding schemes with multiple goals, as most
breeding programs today are designed to take different groups of traits,
including animal health, conformation, fertility, and production, into account. Finally, in Chapter
5 I evaluate the findings from the manuscripts presented in Chapters 2, 3, and 4 and propose their
potential application in modern dairy cattle breeding. Since the biggest advantage of our study,
which led to the discovery of new associations, was the size of the datasets, I also discuss the merits
and challenges of working with large data, in terms of both computation and data evaluation. | de |