dc.description.abstracteng | Over the last ten to twenty years, the cost of sequencing the human genome has decreased
continuously. Interest in RNA sequencing (RNA-Seq) has risen accordingly, as it can be used
to discover the molecular mechanisms behind gene expression profiles of cells in different
healthy or disease states (Wang et al., 2009). The aim of this dissertation is two-fold.
The first aim is to identify the best-performing bioinformatics methods for RNA-Seq analysis
and, based on this knowledge, to generate a standardised workflow that can be used
within the MetastaSys consortium. The second is to answer the question: Is it possible to reliably
detect somatic mutations in cancer based on RNA-Seq data? This was of particular interest
as the RNA-Seq data had already been generated for differential gene expression analysis. Obtaining
further information on mutation status without having to generate additional Exome-Seq data
would, on the one hand, save the cost of Exome-Seq and would, on the other
hand, conserve biological material from patients with cancer metastases, which is precious to
the physicians.
To identify the RNA-Seq workflow, data were generated on the microarray and Illumina RNA-Seq
platforms. Two data sets were created: human patient data from
rectal cancer metastases in the liver and human Burkitt's lymphoma cell lines, which
were stimulated with the B-cell activating factor BAFF. The advantages of RNA-Seq over
microarrays became clear during the comparative analysis in the first publication (see 3.1).
The primary focus was the performance evaluation of bioinformatics methods based on the
given data sets. Workflow performance was evaluated during the alignment, transcript
quantification, differential gene expression analysis, and functional profiling steps. Results
showed that, except for the workflow with TopHat2 and Cufflinks, all workflows achieved nearly
equally good results, with a slight preference for STAR and RSEM, as STAR achieved the
overall highest mapping rate, and RSEM incorporated multi-mapped reads for quantification
and was also capable of quantifying transcript isoforms in addition to genes. Afterwards, the
best-performing workflow was applied to mice in another study (see 3.3). The mice
had developed metastases in the liver from colorectal cancer. The bioinformatics approach,
streamlined via the workflow, helped considerably in interpreting the biology behind the expression
of metastasis-enhancing genes. It was possible to show links between metastasis-related genes
and their stimulation by the liver environment. These genes were associated with tissue remodelling,
cell proliferation, adhesion, Wnt activity, transcription/regulation, and inhibition
of apoptosis.
The question of whether somatic mutations can be reliably identified in RNA-Seq data was tackled
by implementing Wileup, a program written in Perl. Wileup's performance was evaluated
against the state-of-the-art somatic variant caller Mutect2 from the GATK tool suite for
matched RNA-Seq and Exome-Seq samples of 14 patients with either brain (seven patients)
or liver (seven patients) metastases (see 3.2). Results showed that Wileup was capable of
finding all somatic mutations in RNA-Seq identified by Mutect2 in Exome-Seq. In contrast,
Mutect2 and Wileup each identified unique germline mutations that the other method did not find.
These could be explained by a lack of expression in the RNA-Seq data or by an excessively
high duplication level in the Exome-Seq data. Furthermore, the somatic mutations could be
independently validated by pathological annotation data. All germline mutations found
uniquely by either method could be verified, as they were re-identified in
the Exome-sequenced blood samples of the corresponding patients.
In conclusion, the studies presented in this thesis contribute towards establishing pipeline
standards in transcriptomics, with a focus on differential expression analysis (DEA), and
towards exploring the capabilities of mutation calling in RNA-Seq. | de |