Novel machine learning solutions for precision medicine - from causal inference to cellular perturbations and interactions
Cumulative thesis
Date of Examination: 2024-09-25
Date of issue: 2024-10-04
Advisor: Prof. Dr. Michael Altenbuchinger
Referee: Prof. Dr. Michael Altenbuchinger
Referee: Prof. Dr. Fabian Sinz
Files in this item
Name: Dissertation_Stefan_Schrod.pdf
Size: 10.1Mb
Format: PDF
Abstract
English
Optimal therapy of complex diseases necessitates a shift from one-size-fits-all solutions towards personalized treatment plans. To achieve this, it is crucial to understand the causal effect of medical interventions, such as surgeries or medication, given the unique context of each patient. Weighing the expected benefits against adverse effects makes it possible to give informed, individualized therapy recommendations. In this thesis, we introduce a set of novel methodologies to advance the treatment of complex diseases. Inferring the effect of medical interventions is primarily challenged by the fundamental problem of causal inference, which states that for each individual only a single factual outcome is observable, while the outcomes of all alternative actions, i.e., the counterfactuals, remain unobservable. In the clinical environment, data is commonly recorded in observational trials, leading to biased treatment assignments, and can be subject to right-censoring. To facilitate counterfactual estimates for such complex clinical data, we introduce the Balanced Treatment Effect for Survival Data (BITES). We show that BITES outperforms state-of-the-art approaches in extensive simulation studies and that BITES can optimize the treatment assignment in breast cancer patients. An important aspect of treatment-biased observational data is that the biased information can be highly predictive of the outcome, i.e., eliminating distributional differences trades off against factual accuracy. To prevent over-enforcing distributional balance at the cost of factual accuracy, we propose Adversarial Distribution Balancing for Counterfactual Reasoning (ADBCR). We show that ADBCR outperforms state-of-the-art models on three benchmark datasets and propose an extension to simultaneously adapt to new patients without access to observable outcomes. 
The treatment of complex diseases requires considering many different treatments in combination. Unfortunately, there is currently not enough data available to decipher such effects at the patient level. However, advances in omics technologies facilitate the measurement of large-scale drug perturbation experiments for up to two simultaneously applied interventions at single-cell resolution. Building on our counterfactual inference models, we introduce COunterfactual Deep learning for the in-silico EXploration of cancer cell line perturbations (CODEX) to link combinations of causal effects to their downstream consequences. We show that CODEX facilitates the in-silico exploration of drug combinations and complex genetic modifications for both bulk and single-cell data. The performance of deep learning models in clinical settings is heavily constrained by the small amounts of available training data. Data is commonly generated at different institutions, subject to different data-generating processes and to data privacy regulations. To overcome these issues and further enhance the proposed models, we introduce Federated Adversarial Cross Training (FACT). FACT facilitates federated access to many heterogeneous data sources and simultaneously eliminates location-dependent statistical differences. We show that FACT outperforms state-of-the-art methods for multi-source-single-target federated domain adaptation. Advances in experimental techniques not only expand existing data resources but also allow access to novel types of data, such as the spatially resolved profiling of single cells. This provides a window into complex interactions of cells within a tissue and yields invaluable information about the gene regulation driving diseases. We introduce Spatial Cellular Networks from omics data (SpaCeNet) to decipher inter- and intracellular associations using spatial conditional independence relationships. 
We show that SpaCeNet estimates diverse cellular interactions in in-silico generated tissues and real-world cancer tissues. The introduced solutions provide a crucial step towards overcoming challenges in precision medicine and the development of clinical decision support systems.
Keywords: Precision Medicine; Machine Learning; Counterfactual Inference; Treatment Effect Estimation; Representation Learning; Molecular Perturbations; Federated Learning; Spatial Transcriptomics; BITES; ADBCR; CODEX; FACT; SpaCeNet