Fast inference in flexible high dimensional models of gene co-expression and genotype-gene expression interaction
Doctoral thesis
Date of Examination: 2024-11-28
Date of issue: 2025-03-20
Advisor: Dr. Johannes Söding
Referee: Prof. Dr. Heike Bickeböller
Referee: Prof. Dr. Thomas Kneib
Files in this item
Name: morice_coexpression.pdf
Size: 1.09Mb
Format: PDF
Abstract
English
As more high-throughput sequencing data is produced from ever-growing human cohorts, understanding human biology becomes an increasingly quantitative and statistical task. Starting notably with the data made available in 2020 by the Genotype-Tissue Expression (GTEx) project, we undertake a systematic exploration of models describing gene co-expression patterns, with a new emphasis on probabilistic modeling and out-of-sample predictive performance. In particular, we generalize some Bayesian linear models and their associated optimization methods, using careful linear algebra manipulations to achieve both scalability to thousands of genes and flexibility in the model. We then build on these foundational models and demonstrate their downstream usefulness by integrating them into state-of-the-art expression quantitative trait loci (eQTL) discovery pipelines. In the process, we re-analyze and improve the control and multiple testing procedures that this objective requires. Finally, we demonstrate how better co-expression modeling translates into increased power to discover genotype associations, and more broadly aim to set an example of deep probabilistic integration of omics data in modern bioinformatics pipelines.
Keywords: Gene co-expression; RNA-Seq; eQTL; GTEx; clustered MacKay; CMK