
Analysis of expression profile and gene variation via development of methods for Next Generation Sequencing data

dc.contributor.advisor: Beißbarth, Tim Prof. Dr.
dc.contributor.author: Wolff, Alexander
dc.date.accessioned: 2018-11-27T10:35:01Z
dc.date.available: 2018-11-27T10:35:01Z
dc.date.issued: 2018-11-27
dc.identifier.uri: http://hdl.handle.net/11858/00-1735-0000-002E-E517-9
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-7165
dc.language.iso: eng [de]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc: 510 [de]
dc.title: Analysis of expression profile and gene variation via development of methods for Next Generation Sequencing data [de]
dc.type: cumulativeThesis [de]
dc.contributor.referee: Waack, Stephan Prof. Dr.
dc.date.examination: 2018-11-19
dc.description.abstracteng: Over the last ten to twenty years, the cost of sequencing the human genome has decreased continuously. As a result, interest in RNA sequencing (RNA-Seq) has risen, as it can be used to uncover the molecular mechanisms behind the gene expression profiles of cells in different healthy or disease states (Wang et al., 2009). The aim of this dissertation is two-fold. First, to identify the best-performing bioinformatic methods for RNA-Seq analysis and, based on this knowledge, to build a standardised workflow that could then be used within the MetastaSys consortium. Second, to answer the question: is it possible to detect somatic mutations in cancer reliably from RNA-Seq data? This was of particular interest because the RNA-Seq data had already been generated for differential gene expression analysis. Obtaining the mutation status without generating additional Exome-Seq data would, on the one hand, save the high cost of Exome-Seq and, on the other hand, conserve biological material from patients with cancer metastases, which is precious to physicians. To identify the RNA-Seq workflow, data were generated on the microarray and Illumina RNA-Seq platforms. Two data sets were created: human patient data from rectal cancer metastases in the liver, and human Burkitt's lymphoma cell lines stimulated with the B-cell activating factor BAFF. The advantages of RNA-Seq over microarrays became clear during the comparative analysis in the first publication (see 3.1). The primary focus was the performance evaluation of bioinformatic methods on these data sets. Workflow performance was evaluated at the alignment, transcript quantification, differential gene expression analysis, and functional profiling steps.
Results showed that, apart from the workflow with TopHat2 and Cufflinks, all workflows achieved nearly equally good results, with a slight preference for STAR and RSEM: STAR achieved the highest overall mapping rate, and RSEM incorporated multi-mapped reads into quantification and could quantify transcript isoforms as well as genes. The best-performing workflow was subsequently applied to mice in another study (see 3.3), in which the mice developed liver metastases from colorectal cancer. The bioinformatic approach, streamlined via the workflow, considerably aided the interpretation of the biology behind the expression of metastasis-enhancing genes. It was possible to show links between metastasis-related genes and their stimulation by the liver environment. These genes were associated with tissue remodelling, cell proliferation, adhesion, Wnt activity, transcription/regulation, and inhibition of apoptosis. The question of whether somatic mutations can be identified reliably from RNA-Seq data was tackled by implementing Wileup, a program written in Perl. Wileup's performance was evaluated against the state-of-the-art somatic variant caller Mutect2 from the GATK tool suite on matched RNA-Seq and Exome-Seq samples from 14 patients with either brain (seven patients) or liver (seven patients) metastases (see 3.2). Results showed that Wileup, applied to RNA-Seq, found all somatic mutations that Mutect2 identified in Exome-Seq. In contrast, Mutect2 and Wileup each identified germline mutations that the other method did not find. These could be explained by a lack of expression in the RNA-Seq data or by an excessively high duplication level in the Exome-Seq data. Furthermore, the somatic mutations could be validated independently using pathological annotation data. All germline mutations found uniquely by either method could be verified, as they were re-identified in the exome-sequenced blood samples of the corresponding patients.
In conclusion, the studies presented in this thesis contribute towards establishing pipeline standards in transcriptomics, with a focus on differential expression analysis (DEA), and towards exploring the capabilities of mutation calling in RNA-Seq. [de]
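The abstract does not describe how Wileup works internally, so the following is not Wileup's algorithm. It is a minimal Python (3.9+) sketch, assuming each call is reduced to a (chromosome, position, reference allele, alternative allele) key, of the set comparisons behind the evaluation described above: what fraction of Mutect2's exome somatic calls an RNA-Seq call set recovers, which germline calls are unique to either method, and which of those reappear in the matched blood exome. The names `Variant`, `compare_calls`, `recall`, and `confirmed_by_normal`, and the toy call sets, are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A single-nucleotide variant, identified by position and alleles."""
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def compare_calls(a: set[Variant], b: set[Variant]) -> dict[str, set[Variant]]:
    """Split two call sets into shared and method-specific variants."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def recall(reference: set[Variant], query: set[Variant]) -> float:
    """Fraction of reference calls that the query call set also contains."""
    if not reference:
        raise ValueError("reference call set is empty")
    return len(reference & query) / len(reference)


def confirmed_by_normal(unique: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Method-specific calls that reappear in the matched normal (e.g. blood) sample."""
    return unique & normal


# Toy, made-up call sets (hypothetical; not data from the thesis).
exome_somatic = {Variant("chr1", 1000, "C", "T")}   # e.g. Mutect2 on Exome-Seq
rna_somatic = {Variant("chr1", 1000, "C", "T")}     # e.g. an RNA-Seq caller
exome_germline = {Variant("chr2", 500, "G", "A"), Variant("chr3", 42, "T", "C")}
rna_germline = {Variant("chr2", 500, "G", "A"), Variant("chr4", 7, "A", "G")}
blood_exome = {Variant("chr2", 500, "G", "A"), Variant("chr3", 42, "T", "C"),
               Variant("chr4", 7, "A", "G")}

print(recall(exome_somatic, rna_somatic))  # 1.0
parts = compare_calls(rna_germline, exome_germline)
unique = parts["only_a"] | parts["only_b"]
print(len(confirmed_by_normal(unique, blood_exome)), "of", len(unique))  # 2 of 2
```

Keying variants by position and alleles makes every comparison a plain set operation; a real comparison would also need to normalise indels and multi-allelic records before building such keys.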
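The preference for RSEM stated above rests partly on its use of multi-mapped reads. RSEM estimates abundances with an expectation-maximization (EM) algorithm. The following is a deliberately simplified Python sketch of that idea, not RSEM's implementation: it assumes each read is reduced to the set of transcripts it aligns to, and it ignores transcript length, fragment-length distribution, and the other factors RSEM's full model accounts for. The function name `em_abundance` and the toy reads are hypothetical.

```python
def em_abundance(read_compat: list[set[str]], n_iter: int = 100) -> dict[str, float]:
    """Toy EM estimate of the fraction of reads coming from each transcript.

    read_compat[i] is the set of transcripts that read i aligns to;
    a read with more than one entry is multi-mapped.
    """
    reads = [c for c in read_compat if c]  # drop unaligned reads
    if not reads:
        raise ValueError("no aligned reads")
    transcripts = sorted(set().union(*reads))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = dict.fromkeys(transcripts, 0.0)
        for compat in reads:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new abundance = expected share of all reads.
        theta = {t: counts[t] / len(reads) for t in transcripts}
    return theta


# Three reads unique to A, one unique to B, two multi-mapped to A and B.
reads = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 2
theta = em_abundance(reads)
print({t: round(f, 3) for t, f in theta.items()})  # {'A': 0.75, 'B': 0.25}
```

Splitting the two ambiguous reads equally would give A and B shares of 4/6 and 2/6; the EM instead allocates them in proportion to the evidence from the uniquely aligned reads and converges to 0.75 and 0.25.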
dc.contributor.coReferee: Morgenstern, Burkhard Prof. Dr.
dc.contributor.thirdReferee: May, Wolfgang Prof. Dr.
dc.contributor.thirdReferee: Wingender, Edgar Prof. Dr.
dc.contributor.thirdReferee: Kurth, Winfried Prof. Dr.
dc.subject.eng: RNA-Seq [de]
dc.subject.eng: Mutation [de]
dc.subject.eng: SNV [de]
dc.identifier.urn: urn:nbn:de:gbv:7-11858/00-1735-0000-002E-E517-9-6
dc.affiliation.institute: Fakultät für Mathematik und Informatik [de]
dc.subject.gokfull: Informatik (PPN619939052) [de]
dc.identifier.ppn: 1041147309

