dc.contributor.advisor Bickeböller, Heike Prof. Dr.
dc.contributor.author Friedrichs, Stefanie
dc.date.accessioned 2017-10-12T08:44:21Z
dc.date.available 2017-10-12T08:44:21Z
dc.date.issued 2017-10-12
dc.identifier.uri http://hdl.handle.net/11858/00-1735-0000-0023-3F2D-5
dc.language.iso eng de
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc 610 de
dc.title Kernel-Based Pathway Approaches for Testing and Selection de
dc.type doctoralThesis de
dc.contributor.referee Kneib, Thomas Prof. Dr.
dc.date.examination 2017-09-25
dc.description.abstracteng With the number of single nucleotide polymorphisms (SNPs) available in genetic data steadily increasing, the evaluation of SNP sets has become a successful approach toward elucidating the genetic influence on various complex diseases. The joint investigation of multiple SNPs increases the probability of detecting moderate and weak association signals and bypasses the multiple testing problem inherent to testing procedures on the genome-wide scale. Furthermore, this approach assists in the biological interpretation of analysis results, which may be supported by analysing SNP sets that represent a pathway, here denoting a set of genes that jointly fulfil a particular biological function. The association between a pathway-representing SNP set and a phenotype may be analysed appropriately with the kernel machine approach. This approach evaluates the genotypes of multiple SNPs jointly by transforming them into a kernel matrix comprising a genetic similarity measure for every pair of individuals in the study. The kernel matrix is calculated by a predefined kernel function. Multiple kernel functions have been proposed, some of which are capable of integrating further biological knowledge on a pathway and allow for varying types of effect. The network kernel function enables the direct incorporation of a pathway’s network structure while considering additive as well as interaction effects in the investigated SNP set. A multitude of databases are now available, offering an increasing amount of biologically meaningful information on pathways, genes, and genetic markers. The initial work in this thesis investigates the possibilities and impact of integrating additional biological information into existing approaches to the analysis of genetic data. The impact of marker density, SNP-set aggregation with respect to linkage disequilibrium structures, and knowledge sources is considered. 
In this context, the software package kangar00 was developed and implemented in R and made freely available. It offers a wide range of functions for data download, pre-processing, transformation, and evaluation for single-pathway testing in the logistic kernel machine framework. The identification of specific biological processes influencing disease risk is still very challenging, despite the integration of growing amounts of biological data. Single-pathway methods usually cannot discriminate causal processes influencing disease susceptibility from isolated genetic effects that are included in a pathway as a result of gene overlaps. Moreover, they usually lack the ability to predict a trait of interest. The main objective of this thesis is the development of a new method for the evaluation of SNP sets, focussing on the analysis of those representing pathways. The resulting analysis approach enables the joint investigation of multiple sets of SNPs through the adaptation of a boosting algorithm. Boosting originates from the field of machine learning, in which it was developed as a classification approach. Its main idea is to iteratively combine functions with poor classification performance into a strong classifier. If the functions considered depend only on a subset of the available explanatory variables, variable selection may be performed while the model is fitted. We made use of this to perform selection on a set of pathways by employing a kernel function dependent on SNP sets representing pathways. Since all pathways of interest are investigated jointly in the boosting algorithm, correlations between them are also considered. We may therefore discriminate biological processes that influence disease susceptibility from single-effect genes that are included in a pathway as a result of gene overlap. Our software package kangar00 includes an interface to a boosting algorithm, so that all functionality necessary to apply kernel boosting is available. 
Thanks to its inherent properties and the freely available software implementation, kernel boosting has great potential to elucidate key biological functions involved in disease risk, while creating a directly interpretable model to predict disease status. de
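The abstract describes transforming SNP genotypes into a kernel matrix that holds a similarity measure for every pair of individuals. As a rough illustration only, the sketch below computes the simplest such matrix, a linear kernel K = G Gᵀ. It assumes genotypes are coded as minor-allele counts (0, 1, 2). This is not the network kernel from the thesis and not the kangar00 implementation (which is written in R); the function name `linear_kernel` and the toy data are hypothetical.

```python
def linear_kernel(genotypes):
    """Return the n x n linear kernel matrix K = G G^T.

    `genotypes` is an n x p list of lists: n individuals, p SNPs,
    each entry assumed to be a minor-allele count (0, 1 or 2).
    """
    n = len(genotypes)
    kernel = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Similarity of individuals i and j: inner product of their genotype vectors.
            similarity = sum(a * b for a, b in zip(genotypes[i], genotypes[j]))
            kernel[i][j] = similarity
            kernel[j][i] = similarity  # the kernel matrix is symmetric
    return kernel


# Hypothetical toy data: 3 individuals, 4 SNPs.
G = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 1],
]
K = linear_kernel(G)
```

Each entry `K[i][j]` grows with the number of minor alleles two individuals share across the SNP set, so the whole set enters the model through a single n x n matrix rather than through one term per SNP.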
dc.contributor.coReferee Beißbarth, Tim Prof. Dr.
dc.subject.eng Kernel Approaches de
dc.subject.eng Boosting de
dc.subject.eng GWAS Analysis de
dc.subject.eng SNP-set analysis de
dc.subject.eng Analysis of Case-Control Studies de
dc.subject.eng Integration of biological information de
dc.subject.eng Pathway analysis de
dc.subject.eng Multiple-pathway method de
dc.subject.eng Logistic Regression de
dc.subject.eng Gene-network overlap de
dc.identifier.urn urn:nbn:de:gbv:7-11858/00-1735-0000-0023-3F2D-5-7
dc.affiliation.institute Medizinische Fakultät de
dc.subject.gokfull Medizinische Statistik / Biometrie / Epidemiologie - Allgemein- und Gesamtdarstellungen (PPN619875046) de
dc.identifier.ppn 1002330483
