Gene-Environment Interaction and Extension to Empirical Hierarchical Bayes Models in Genome-Wide Association Studies
by Elena Viktorova
Date of Examination: 2014-06-17
Date of issue: 2014-07-24
Advisor: Prof. Dr. Heike Bickeböller
Referee: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Dieter Kube
Referee: Prof. Dr. Tim Friede
Files in this item
Name: Elena Viktorova_Doctoral Dissertation_2014.pdf
Format: PDF
Description: "Doctoral Dissertation Elena Viktorova"
English
There are over 100,000 human diseases, of which only around 10,000 are known to be monogenic, that is, to result from modification of a single gene. Many multifactorial diseases, such as cancer and lung cancer in particular, arise from the interplay between genetic and environmental factors. It is well known that smoking is the major environmental risk factor for lung cancer. In recent years, great progress in genotyping technology and cost control has enabled researchers to perform large-scale association studies involving thousands of individuals genotyped at millions of markers. To date, genome-wide association studies (GWAS) have identified hundreds of genetic risk factors for complex diseases. However, the detected variants explain only a small part of the total heritability. Unexplained phenotypic variance may be partly attributable to undetected gene-environment (G×E) interactions. Consequently, statistical tools for discovering biologically credible G×E interactions in a genome-wide context have developed rapidly. The analysis of G×E interactions remains one of the greatest challenges in the post-GWAS era.
Population stratification that is not accounted for in large association and interaction studies may lead to false positive results or mask true signals through under- or overestimation of the true effects. In this dissertation, we began by evaluating the robustness of case-control studies of G×E interaction to population stratification and the magnitude of the resulting bias. We derived a simple equation that measures the population stratification bias of the case-control estimator of the G×E interaction effect. Another great challenge in G×E interaction research is maintaining adequate power while accounting for gene-environment (G-E) correlation in the source population.
G-E correlation occurs when exposure to the environmental condition depends on the individual's genotype or vice versa, irrespective of the disease status of that individual. The empirical hierarchical Bayes approach to G×E interaction (EHB-GECHI) has greater power than the classical case-control test while accounting for population-based G-E correlation. We developed extensions of EHB-GECHI with respect to covariate adjustment, general exposure and genotype, and performance under an additive mode of inheritance. We then introduce an alternative to EHB-GECHI that is computationally more efficient and uses a more stable model to obtain the posterior estimates of G-E correlation in controls. Using a parametric Bayes inference framework with a normal distribution in a hierarchical model, we developed an approach that corrects for G-E correlation by gathering information across all markers simultaneously, as EHB-GECHI does. We named this empirical hierarchical Bayes approach for G×E interaction EHB-GENN. Our simulation study demonstrates that EHB-GENN controls the type I error better than EHB-GECHI while remaining powerful.
The last objective of this thesis is to consider joint tests for genetic marginal and G×E interaction effects. Previous studies suggest that G×E interactions might help to detect genetic variants missed by a test for association with main effects. Specifically, some SNPs may have both a moderate genetic effect and a moderate G×E interaction effect, and joint tests for marginal association and G×E interaction were therefore developed to gain power over tests of main effects alone. Here we show how EHB-GENN can be adapted for joint testing, resulting in the EHB-GENNJ test.
The application of EHB-GENN and the joint tests to four lung cancer GWASs from the ILCCO/TRICL consortia is presented and the results are discussed. Applying joint tests that use either the case-control, case-only, MUK-EB or EHB-GENN approach for the G×E interaction component, we detected known markers for lung cancer, e.g. rs1051730 in CHRNA3 and rs8034191 in AGPHD1, as well as suggestive signals, e.g. rs7982922 in ENOX1 and rs2736100 in TERT.
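As an illustration of why G-E correlation matters for the interaction component, the following minimal Python sketch contrasts the classical case-control and case-only estimators of the multiplicative G×E interaction odds ratio for a binary genotype and a binary exposure. It is not the dissertation's EHB-GECHI or EHB-GENN method; the function names and all counts are hypothetical and chosen only for illustration. The sketch relies on the standard result that the case-control interaction odds ratio equals the G-E odds ratio in cases divided by the G-E odds ratio in controls, so the case-only estimator agrees with it only when G and E are uncorrelated in controls.

```python
def odds_ratio(n11, n10, n01, n00):
    """Odds ratio of a 2x2 table of genotype (G) by exposure (E) counts.

    n11: G=1, E=1   n10: G=1, E=0
    n01: G=0, E=1   n00: G=0, E=0
    """
    return (n11 * n00) / (n10 * n01)


def interaction_estimates(cases, controls):
    """Return case-only and case-control estimates of the G×E interaction OR.

    `cases` and `controls` are dicts keyed by (g, e) with g, e in {0, 1}.
    (Hypothetical helper for illustration, not an API from the dissertation.)
    """
    or_cases = odds_ratio(cases[1, 1], cases[1, 0], cases[0, 1], cases[0, 0])
    or_controls = odds_ratio(
        controls[1, 1], controls[1, 0], controls[0, 1], controls[0, 0]
    )
    return {
        # Valid only if G and E are independent in the source population.
        "case_only": or_cases,
        # Robust to G-E correlation, but typically less powerful.
        "case_control": or_cases / or_controls,
        # G-E association among controls; values away from 1 bias case-only.
        "ge_or_controls": or_controls,
    }


# Hypothetical counts, invented for this example only.
cases = {(1, 1): 60, (1, 0): 30, (0, 1): 40, (0, 0): 40}
controls = {(1, 1): 30, (1, 0): 20, (0, 1): 40, (0, 0): 40}

estimates = interaction_estimates(cases, controls)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the G-E odds ratio in controls is 1.5, so the case-only estimate (2.0) overstates the case-control estimate (about 1.33). Empirical Bayes approaches of the kind named in the abstract aim to trade off between the two estimators according to the evidence for G-E correlation.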
Keywords: gene-environment interaction; genome-wide association studies; empirical Bayes; lung cancer