Analysis of expression profile and gene variation via development of methods for Next Generation Sequencing data

Wolff, Alexander

dc.contributor.advisor	Beißbarth, Tim Prof. Dr.
dc.contributor.author	Wolff, Alexander
dc.date.accessioned	2018-11-27T10:35:01Z
dc.date.available	2018-11-27T10:35:01Z
dc.date.issued	2018-11-27
dc.identifier.uri	http://hdl.handle.net/11858/00-1735-0000-002E-E517-9
dc.identifier.uri	http://dx.doi.org/10.53846/goediss-7165
dc.language.iso	eng	de
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc	510	de
dc.title	Analysis of expression profile and gene variation via development of methods for Next Generation Sequencing data	de
dc.type	cumulativeThesis	de
dc.contributor.referee	Waack, Stephan Prof. Dr.
dc.date.examination	2018-11-19
dc.description.abstracteng	Over the last ten to twenty years, the cost of sequencing the human genome has decreased continuously. As a result, interest in RNA sequencing (RNA-Seq) has risen, as it can be used to discover the molecular mechanisms behind the gene expression profiles of cells in different healthy or disease states (Wang et al., 2009). The aim of this dissertation is two-fold. First, to identify the best-performing bioinformatic methods for RNA-Seq analysis and, based on this knowledge, to build a standardised workflow that could then be used within the MetastaSys consortium. Second, to answer the question: is it possible to detect somatic mutations in cancer reliably from RNA-Seq data? This was of particular interest because the RNA-Seq data had already been generated for differential gene expression analysis. Obtaining additional information on mutation status without having to generate Exome-Seq data would, on the one hand, save the high cost of Exome-Seq and, on the other hand, conserve biological material from patients with cancer metastases, which is precious to physicians. To identify the RNA-Seq workflow, data were generated on both the microarray and the Illumina RNA-Seq platforms. Two data sets were created: human patient data from rectal cancer metastases in the liver, and human Burkitt's lymphoma cell lines stimulated with the B-cell activating factor BAFF. The advantages of RNA-Seq over microarrays became clear during the comparative analysis in the first publication (see 3.1). The primary focus was the performance evaluation of bioinformatic methods on these data sets. Workflow performance was evaluated at the alignment, transcript quantification, differential gene expression analysis, and functional profiling steps. 
Results showed that, except for the workflow with TopHat2 and Cufflinks, all workflows achieved nearly equally good results, with a slight preference for STAR and RSEM: STAR achieved the highest overall mapping rate, and RSEM incorporated multi-mapped reads into quantification and could quantify transcript isoforms in addition to genes. The best-performing workflow pipeline was then applied to mice in another study (see 3.3), in which the mice developed liver metastases from colorectal cancer. The bioinformatic approach, streamlined via the workflow, helped considerably in interpreting the biology behind the expression of metastasis-enhancing genes. It was possible to show links between metastasis-related genes and their stimulation by the liver environment. These genes were associated with tissue remodelling, cell proliferation, adhesion, Wnt activity, transcription/regulation, and inhibition of apoptosis. The question of whether somatic mutations can be identified reliably from RNA-Seq is tackled by implementing Wileup, a program written in Perl. Wileup's performance was evaluated against the state-of-the-art somatic variant caller Mutect2 from the GATK tool suite on matched RNA-Seq and Exome-Seq samples of 14 patients with either brain (seven patients) or liver (seven patients) metastases (see 3.2). Results showed that Wileup found in the RNA-Seq data all somatic mutations that Mutect2 identified in the Exome-Seq data. In contrast, Mutect2 and Wileup each identified germline mutations that were found by only one of the two methods. These could be explained by a lack of expression in the RNA-Seq data or by too high a duplication level in the Exome-Seq data. Furthermore, the somatic mutations could be independently validated by pathological annotation data. All germline mutations found uniquely by either method could be verified, as they were re-identified in the exome-sequenced blood samples of the corresponding patients. 
In conclusion, the studies presented in this thesis contribute towards establishing pipeline standards in transcriptomics, with a focus on differential expression analysis (DEA), and towards exploring the capabilities of mutation calling in RNA-Seq.	de
dc.contributor.coReferee	Morgenstern, Burkhard Prof. Dr.
dc.contributor.thirdReferee	May, Wolfgang Prof. Dr.
dc.contributor.thirdReferee	Wingender, Edgar Prof. Dr.
dc.contributor.thirdReferee	Kurth, Winfried Prof. Dr.
dc.subject.eng	RNA-Seq	de
dc.subject.eng	Mutation	de
dc.subject.eng	SNV	de
dc.identifier.urn	urn:nbn:de:gbv:7-11858/00-1735-0000-002E-E517-9-6
dc.affiliation.institute	Fakultät für Mathematik und Informatik	de
dc.subject.gokfull	Informatik (PPN619939052)	de
dc.identifier.ppn	1041147309

Files

Name: Thesis_20180928_opt.pdf

Size: 7.208Mb

Format: PDF

Description: PhD Thesis Alexander Wolff

The document appears in:

Fakultät für Mathematik und Informatik (incl. GAUSS)