Explaining decisions of graph convolutional neural networks for analyses of molecular subnetworks in cancer

Chereda, Hryhorii

dc.contributor.advisor	Beißbarth, Tim Prof. Dr.
dc.contributor.author	Chereda, Hryhorii
dc.date.accessioned	2022-05-03T13:51:48Z
dc.date.available	2022-05-10T00:50:13Z
dc.date.issued	2022-05-03
dc.identifier.uri	http://resolver.sub.uni-goettingen.de/purl?ediss-11858/14025
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-9215
dc.language.iso	eng	de
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject.ddc	570	de
dc.title	Explaining decisions of graph convolutional neural networks for analyses of molecular subnetworks in cancer	de
dc.type	doctoralThesis	de
dc.contributor.referee	Beißbarth, Tim Prof. Dr.
dc.date.examination	2022-02-24	de
dc.description.abstracteng	Contemporary deep learning approaches exhibit state-of-the-art performance in various areas. In healthcare, the application of deep learning remains limited because deep learning methods are often considered non-interpretable black-box models. However, the Machine Learning (ML) community has recently made efforts to develop methods of eXplainable Artificial Intelligence (XAI). These explanation methods explain single decisions of an ML model when a single data point is fed into the model’s input. In a clinical setting, a data point can represent a single patient. Data point-specific explanations could support personalized precision medicine decisions by explaining patient-specific predictions. Convolutional Neural Networks (CNNs), as deep learning methods, have already been applied to classify gene expression profiles of patients that were transformed into images. Gene expression data can be structured by a prior-knowledge molecular network (encoded as a graph) representing connections between genes. Each vertex of a molecular network is assigned a gene expression value as an attribute. The set of these attributes forms a graph signal representing a patient. The emerging field of geometric deep learning deals with methods applicable to graph-structured data and extends CNNs to Graph Convolutional Neural Networks (GCNNs) that classify graph signals. Layer-wise Relevance Propagation (LRP) is a method to explain decisions of CNNs classifying image data. I extended the LRP method to make it available for GCNNs. Graph Layer-wise Relevance Propagation (GLRP) is presented as a new method to explain single decisions made by a GCNN model. In this thesis, I present a novel methodology that generates patient-specific molecular subnetworks as explanations for classification decisions of an ML approach utilizing prior knowledge of molecular networks. A GCNN serves as the ML approach, and its decisions are explained by the developed GLRP. 
A sanity check of the developed GLRP method was performed on a handwritten digits dataset. The biological validation was performed by applying the developed methodology to gene expression data from Human Umbilical Vein Endothelial Cells (HUVEC) treated or not treated with tumor necrosis factor alpha. To show the utility of the introduced methodology within the scope of precision medicine, it was applied to a large breast cancer dataset. The generated patient-specific subnetworks largely agree with clinical knowledge and could assist precision medicine approaches by identifying common as well as novel, and potentially druggable, drivers of tumor progression. Apart from generating patient-specific subnetworks, the developed methodology can be used as a general feature selection approach. The outcome of a feature selection approach is a subset of genes that are important for classification across a whole dataset. It is essential that the selected feature subsets remain stable across different datasets with the same clinical endpoint, since the selected genes are possible candidates for prognostic biomarkers. I analysed the stability of feature selection performed by GCNN+LRP. I implemented the graph convolutional layer of the GCNN as a Keras layer so that the SHapley Additive exPlanations (SHAP) method could also be applied to a Keras version of a GCNN model. The stability of feature selection performed by GCNN+LRP was compared to the stability of GCNN+SHAP and other ML-based feature selection approaches. The GCNN+LRP approach shows the highest stability. GCNN+LRP subnetworks were compared to GCNN+SHAP subnetworks in terms of connectivity and permutation feature importance. 
While GCNN+SHAP subnetworks demonstrate higher permutation importance than GCNN+LRP subnetworks, a GCNN+LRP subnetwork of an individual patient is on average substantially more connected and therefore more interpretable in the context of prior knowledge than a GCNN+SHAP subnetwork, which consists mainly of single vertices.	de
dc.contributor.coReferee	Söding, Johannes Dr.
dc.subject.eng	Gene expression, Explainable artificial intelligence, Precision medicine, Molecular networks	de
dc.identifier.urn	urn:nbn:de:gbv:7-ediss-14025-6
dc.affiliation.institute	Biologische Fakultät für Biologie und Psychologie	de
dc.subject.gokfull	Biologie (PPN619462639)	de
dc.description.embargoed	2022-05-10	de
dc.identifier.ppn	1800989954

Files

Name: Chereda_Dissertation_Publication.pdf

Size: 4.110 MB

Format: PDF


This document appears in:

Fakultät für Biologie und Psychologie (inkl. GAUSS) [1621]
