
Kernel-Based Pathway Approaches for Testing and Selection

dc.contributor.advisor | Bickeböller, Heike Prof. Dr.
dc.contributor.author | Friedrichs, Stefanie
dc.date.accessioned | 2017-10-12T08:44:21Z
dc.date.available | 2017-10-12T08:44:21Z
dc.date.issued | 2017-10-12
dc.identifier.uri | http://hdl.handle.net/11858/00-1735-0000-0023-3F2D-5
dc.identifier.uri | http://dx.doi.org/10.53846/goediss-6520
dc.language.iso | eng | de
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc | 610 | de
dc.title | Kernel-Based Pathway Approaches for Testing and Selection | de
dc.type | doctoralThesis | de
dc.contributor.referee | Kneib, Thomas Prof. Dr.
dc.date.examination | 2017-09-25
dc.description.abstracteng | As the number of single nucleotide polymorphisms (SNPs) available in genetic data continues to increase, the evaluation of SNP sets has become a successful approach toward elucidating the genetic influence on various complex diseases. The joint investigation of multiple SNPs increases the probability of detecting moderate and weak association signals and bypasses the multiple testing problem inherent to testing procedures on the genome-wide scale. Furthermore, this approach assists in the biological interpretation of analysis results. Interpretation may be further supported by analysing SNP sets that represent a pathway, here denoting a set of genes that jointly fulfil a particular biological function. The association between a pathway-representing SNP set and a phenotype may be analysed appropriately with the kernel machine approach. This evaluates the genotypes of multiple SNPs jointly by transforming them into a kernel matrix that contains a genetic similarity measure for every pair of individuals in the study. The kernel matrix is calculated by a predefined kernel function. Multiple kernel functions have been proposed, some of which are capable of integrating further biological knowledge on a pathway and allow for varying types of effect. The network kernel function enables the direct incorporation of a pathway’s network structure, while at the same time considering additive as well as interaction effects in the investigated SNP set. A multitude of databases now offer an increasing amount of biologically meaningful information on pathways, genes, and genetic markers. The initial work in this thesis investigates the possibilities for, and the impact of, integrating additional biological information into existing approaches to the analysis of genetic data. The impact of marker density, SNP-set aggregation with respect to linkage disequilibrium structures, and knowledge sources was considered.
In this context, the software package kangar00 was developed in R and made freely available. It offers a wide range of functions relating to data download, pre-processing, transformation, and evaluation for single-pathway testing in the logistic kernel machine framework. The identification of specific biological processes influencing disease risk is still very challenging, despite the integration of growing amounts of biological data. Single-pathway methods cannot usually discriminate causal processes influencing disease susceptibility from isolated genetic effects included in a pathway as a result of gene overlaps. Moreover, they usually lack the ability to predict any trait of interest. The main objective of this thesis is the development of a new method for the evaluation of SNP sets, focussing on the analysis of those representing pathways. The resulting analysis approach enables the joint investigation of multiple sets of SNPs through the adaptation of a boosting algorithm. Boosting originates from the field of machine learning, in which it was developed as a classification approach. Its main idea is to combine functions with poor classification performance iteratively into a strong classifying set. If the functions considered depend only on a subset of the available explanatory variables, variable selection may be performed while the model is fitted. We made use of this to perform selection on a set of pathways by employing a kernel function dependent on SNP sets representing pathways. Since all pathways of interest are investigated jointly in the boosting algorithm, correlations between them are also considered. We may therefore discriminate biological processes that influence disease susceptibility from single-effect genes included in a pathway as a result of gene overlap. Our software package kangar00 includes an interface to a boosting algorithm, so that all functionality necessary to apply kernel boosting is available.
Thanks to its inherent properties and the freely available software implementation, kernel boosting has great potential to elucidate key biological functions involved in disease risk, while creating a directly interpretable model to predict disease status. | de
dc.contributor.coReferee | Beißbarth, Tim Prof. Dr.
dc.subject.eng | Kernel Approaches | de
dc.subject.eng | Boosting | de
dc.subject.eng | GWAS Analysis | de
dc.subject.eng | SNP-set analysis | de
dc.subject.eng | Analysis of Case-Control Studies | de
dc.subject.eng | Integration of biological information | de
dc.subject.eng | Pathway analysis | de
dc.subject.eng | Multiple-pathway method | de
dc.subject.eng | Logistic Regression | de
dc.subject.eng | Gene-network overlap | de
dc.identifier.urn | urn:nbn:de:gbv:7-11858/00-1735-0000-0023-3F2D-5-7
dc.affiliation.institute | Medizinische Fakultät | de
dc.subject.gokfull | Medical Statistics / Biometry / Epidemiology - General and Comprehensive Works (PPN619875046) | de
dc.identifier.ppn | 1002330483
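
The abstract describes transforming the genotypes of a SNP set into a kernel matrix that holds a similarity measure for every pair of individuals. The following is a minimal sketch of that step only, assuming a plain linear kernel and a toy genotype table coded as minor-allele counts (0, 1, 2). The function name `linear_kernel` and the data are hypothetical illustrations; this is not the kangar00 API, and the thesis may use different kernel functions.

```python
def linear_kernel(genotypes):
    """Return the n x n matrix K with K[i][j] = sum over SNPs of g_i[s] * g_j[s].

    genotypes: list of n rows (individuals), each a list of allele counts
    for the SNPs in one SNP set. Linear kernel is an assumption made here.
    """
    n = len(genotypes)
    return [
        [sum(a * b for a, b in zip(genotypes[i], genotypes[j])) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): 3 individuals, 3 SNPs, coded 0/1/2.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
]

K = linear_kernel(genotypes)

for row in K:
    print(row)
```

The resulting matrix is symmetric, with one row and column per individual. Kernels that encode interaction effects or a pathway's network structure, as mentioned in the abstract, would replace the inner product with a different similarity function.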

