Graph based fusion of high-dimensional gene- and microRNA expression data
by Stephan Gade
Date of oral examination: 2012-12-10
Supervisor: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Tim Beißbarth
Referee: Prof. Dr. Stephan Waack
One of the main goals in cancer studies that include high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Expression changes of both mRNAs and miRNAs in cancer have been described as reflecting clinical characteristics such as staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are usually trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA data from the same patient cohort, there is a need for statistical methods that integrate, or fuse, both kinds of data into one prediction model in order to find a combined signature that improves the prediction. Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Since miRNAs are known regulators of mRNAs, correlations between miRNA and mRNA expression data as well as target prediction information were used to build a bipartite graph representing the relations between miRNAs and mRNAs. Feature selection is a critical step when fitting prediction models to high-dimensional data. Most methods treat features, in this case genes or miRNAs, as independent, an assumption that does not hold for combined gene and miRNA expression data. To improve prediction accuracy, a description of the correlation structure in the data is needed. In this work, the bipartite graph was used to guide the feature selection and thereby improve prediction results and find a stable prognostic signature of miRNAs and genes. The method is evaluated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data. Biochemical relapse, an important event in prostate cancer treatment, was used as the clinical endpoint. 
Biochemical relapse denotes a renewed rise in the blood level of a prostate marker, the prostate-specific antigen (PSA), after surgical removal of the prostate. The relapse is an indication of metastases and is usually the point in clinical practice at which further treatment is decided. A boosting approach was used to predict the biochemical relapse. The bipartite graph, in combination with miRNA and mRNA expression data, was shown to improve prediction performance. Furthermore, the approach improved the stability of the feature selection and thereby yielded more consistent marker sets. As intended, the marker sets produced by this new method contain mRNAs as well as miRNAs. The new approach was compared to two state-of-the-art methods suited for high-dimensional data and showed better prediction performance in both cases.
Keywords: cancer; gene expression; miRNA; boosting; data fusion
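
Illustrative sketch: the abstract states that the bipartite miRNA-mRNA graph is built from expression correlations together with target prediction information, but it does not give the exact rule. The following minimal Python sketch shows one plausible way to combine the two sources. The function name `build_bipartite_graph`, the parameter `corr_threshold`, and the rule itself (an edge exists only where a target prediction is present and the absolute Pearson correlation reaches the threshold) are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def build_bipartite_graph(mirna_expr, mrna_expr, target_mask, corr_threshold=0.3):
    """Return a boolean (n_mirna x n_mrna) adjacency matrix.

    mirna_expr  : array of shape (n_samples, n_mirna)
    mrna_expr   : array of shape (n_samples, n_mrna)
    target_mask : boolean array of shape (n_mirna, n_mrna), True where a
                  target prediction links the miRNA to the mRNA
    The combination rule below is a hypothetical choice, not the thesis method.
    """
    mirna_expr = np.asarray(mirna_expr, dtype=float)
    mrna_expr = np.asarray(mrna_expr, dtype=float)
    n_samples = mirna_expr.shape[0]

    # Standardise each feature (population standard deviation), so that the
    # scaled cross-product equals the Pearson correlation.
    # Constant features would yield a zero standard deviation and must be
    # filtered out beforehand.
    z_mirna = (mirna_expr - mirna_expr.mean(axis=0)) / mirna_expr.std(axis=0)
    z_mrna = (mrna_expr - mrna_expr.mean(axis=0)) / mrna_expr.std(axis=0)
    corr = z_mirna.T @ z_mrna / n_samples

    # Assumed rule: keep an edge only if it is a predicted target pair AND
    # the two expression profiles are sufficiently correlated.
    return np.asarray(target_mask, dtype=bool) & (np.abs(corr) >= corr_threshold)
```

The resulting adjacency matrix could then be handed to a graph-guided feature selection step; how the thesis uses the graph inside the boosting procedure is not specified in the abstract and is therefore not sketched here.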