Integrating Omics Data into Genomic Prediction

dc.contributor.advisor: Simianer, Henner Prof. Dr.
dc.contributor.author: Li, Zhengcao
dc.date.accessioned: 2019-07-26T08:39:07Z
dc.date.available: 2019-07-26T08:39:07Z
dc.date.issued: 2019-07-26
dc.identifier.uri: http://hdl.handle.net/21.11130/00-1735-0000-0003-C178-C
dc.identifier.uri: http://dx.doi.org/10.53846/goediss-7581
dc.language.iso: eng [de]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc: 630 [de]
dc.title: Integrating Omics Data into Genomic Prediction [de]
dc.type: doctoralThesis [de]
dc.contributor.referee: Simianer, Henner Prof. Dr.
dc.date.examination: 2019-07-01
dc.description.abstracteng: Prediction of genetic values plays a central role in quantitative genetics and breeding. Genomic prediction using genome-wide single nucleotide polymorphisms (SNPs) has been widely adopted to predict breeding values in animal and plant breeding, and to quantify individual disease risk early and accurately in human genetics. In the multi-omics era, as omics data (genome, transcriptome, proteome, metabolome, epigenome, etc.) have become increasingly available in recent years, using multi-layer omics data as predictors has become an accessible way to improve predictive ability in phenotype prediction. Gene expression profiles potentially hold valuable information for the prediction of breeding values and phenotypes. The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource for the analysis of population genomics and quantitative traits. It consists of more than 200 fully sequenced inbred lines (including 185 lines with whole-genome gene expression data) derived from the Raleigh population, USA. In Chapter 2, the utility of transcriptome data for phenotype prediction was tested with 185 inbred lines of Drosophila melanogaster for 9 traits in both sexes. In total, 2,863,909 SNPs and 18,140 genome-wide annotated genes and novel transcribed regions (NTRs) were used for all the analyses. We incorporated the transcriptome data into genomic prediction via two kernel methods, GTBLUP and GRBLUP, both combining SNPs and transcriptome data. The genotypic data were used to construct the common additive genomic relationship matrix, which was used alone in genomic best linear unbiased prediction (GBLUP), or jointly in a linear mixed model with a transcriptome-based linear kernel (GTBLUP) or a transcriptome-based Gaussian kernel (GRBLUP). We studied the predictive ability of the models and discussed a concept of “omics-augmented broad sense heritability” for the multi-omics era.
For one trait (olfactory perception of ethyl butyrate in females), the predictive ability of GRBLUP (0.23) was significantly higher than that of GBLUP (0.21). For most traits, however, GRBLUP and GBLUP provided similar predictive abilities, while GRBLUP explained more of the phenotypic variance. In general, the better goodness of fit of GRBLUP did not translate into better predictive ability. A possible explanation is that the sample size was small and that gene expression was not measured at a single time point in a specific tissue that is functionally linked to the trait of interest. It is well known that gene expression and regulation may vary extensively among tissues, yet the transcript abundance used for Drosophila melanogaster was quantified from whole flies. To test whether tissue-specific transcriptome data can substantially improve predictive abilities, in Chapter 3 we used tissue-specific transcriptome data from three mouse brain tissues, hippocampus (HIP), prefrontal cortex (PFC) and striatum (STR), for phenotype prediction of four novel behavioral traits and four muscle weight traits with low to medium heritability. The data comprised 1,063 mice with pedigree information from a multigenerational outbred population, genotyped with the reduced-representation method genotyping-by-sequencing (GBS). After quality control, 523,028 SNPs were used in the analyses. All analyses were conducted in three groups of mice with pedigree, genotype, gene expression and phenotype data, which contained 208 (HIP), 185 (PFC) and 169 (STR) individuals, respectively. The abundances of RNA products from the three tissues encompassed 16,533 genes in HIP, 16,249 genes in PFC and 16,860 genes in STR.
For the muscle weight traits, the tissue-specific transcriptome-based prediction (TBLUP) showed high predictive abilities, which were overall markedly higher than those of the pedigree-based prediction (BLUP) and the SNP-based prediction (GBLUP). For the four behavioral traits, the increase in predictive ability with TBLUP was lower than for the muscle weight traits. Combining transcriptome data with SNPs or pedigree information as predictors did not improve predictive abilities overall. To study whether the number of genes has an impact on transcriptome-based prediction, we randomly chose different numbers of genes for prediction with TBLUP. The differences among the resulting predictive abilities were negligible. Our results suggest that transcriptome data have the potential to improve phenotype prediction if they can be sampled from a specific tissue.
In contrast to phenotype prediction, multi-omics data are not ideal candidates for prediction of genetic value and estimation of heritability, since they are not causal variants but intermediate products between causal variants and phenotypes. During the transfer of genetic information from DNA to phenotype, multi-omics data are inevitably affected by genetic and environmental effects and by the interaction between the two. The ‘pan-genome’ denotes the set of all genes or open reading frames (ORFs) present in the genomes of a group of organisms. Pan-genomic ORFs potentially carry genome-wide protein-coding gene or causal variant information in a population. The 1002 Yeast Genome project comprised 1,011 S. cerevisiae isolates that maximized the breadth of their ecological and geographical origins. In Chapter 4, we used 787 diploid S. cerevisiae isolates with 1,625,809 high-quality reference-based SNPs, 7,796 ORFs, the copy number of ORFs (CNO) and 35 traits, applying linear models for genomic prediction and estimation of heritability. Our results showed that, compared to SNP-based genomic prediction (GBLUP), pan-genomic ORF-based genomic prediction (OBLUP) was distinctly more accurate for all traits, with predictive abilities improved by 132% on average across traits. In addition, ORF-based heritability captured more additive effects than SNP-based heritability for all traits. When we combined two subsets of the total SNP data (MAF ≥ 0.01 and MAF ≥ 0.05, containing 311,447 SNPs and 102,253 SNPs, respectively) with pan-genomic ORFs in GOBLUP, the predictive abilities remained the same as with OBLUP using only pan-genomic ORF data. For the second combined method, GCBLUP, the predictive abilities remained the same as with CBLUP for all traits, suggesting that ORF data or CNO data covered all causal variant information carried by the SNP data. When three different numbers of isolates were used in the training set for ORF-based prediction, the predictive abilities for all traits increased with the number of isolates in the training set, showing that a larger training set allowed ORF effects to be estimated more accurately. We demonstrated that pan-genomic ORFs have the potential to be a substitute for single nucleotide polymorphisms in estimation of heritability and genomic prediction under certain conditions. However, in our study there was still a large gap between heritability estimates and prediction accuracy for all traits. We provide evidence that prediction accuracy will improve further if larger training sets can be used. [de]
dc.contributor.coReferee: Schmitt, Armin Prof. Dr.
dc.contributor.thirdReferee: Kneib, Thomas Prof. Dr.
dc.subject.eng: phenotype prediction [de]
dc.subject.eng: open reading frames [de]
dc.subject.eng: transcriptome data [de]
dc.subject.eng: heritability [de]
dc.identifier.urn: urn:nbn:de:gbv:7-21.11130/00-1735-0000-0003-C178-C-5
dc.affiliation.institute: Fakultät für Agrarwissenschaften [de]
dc.subject.gokfull: Land- und Forstwirtschaft (PPN621302791) [de]
dc.identifier.ppn: 1672306922
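
The abstract describes GBLUP and two kernel extensions that add a transcriptome-based linear kernel (GTBLUP) or Gaussian kernel (GRBLUP) to the additive genomic relationship matrix. The following is a minimal sketch of that idea, not the thesis's implementation, which this record does not show. The function names, the VanRaden-style scaling of the genomic relationship matrix, the bandwidth scaling of the Gaussian kernel, and the fixed kernel weights and residual variance are all assumptions made for illustration; in a real mixed-model analysis the variance components would be estimated from the data rather than supplied by hand.

```python
import numpy as np


def vanraden_grm(M):
    """Additive genomic relationship matrix from an (n x m) 0/1/2 genotype matrix.

    Assumes VanRaden-style centring and scaling (an illustrative choice).
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # allele frequency per SNP
    Z = M - 2.0 * p                           # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def linear_kernel(X):
    """Linear kernel from column-standardised features (e.g. expression levels)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                         # guard against constant columns
    Xs = (X - X.mean(axis=0)) / sd
    return Xs @ Xs.T / X.shape[1]


def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel on squared Euclidean distances.

    Distances are divided by their mean off-diagonal value (hypothetical scaling).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, 0.0)
    scale = np.mean(D[np.triu_indices_from(D, k=1)])
    return np.exp(-bandwidth * D / scale)


def kernel_blup(kernels, weights, y, train, test, resid_var=1.0):
    """Predict test phenotypes from a weighted sum of kernels.

    weights and resid_var stand in for variance components that are
    assumed known here; they are not estimated by this sketch.
    """
    y = np.asarray(y, dtype=float)
    K = sum(w * Kk for w, Kk in zip(weights, kernels))
    mu = y[train].mean()                      # intercept as the only fixed effect
    A = K[np.ix_(train, train)] + resid_var * np.eye(len(train))
    alpha = np.linalg.solve(A, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 60
    M = rng.integers(0, 3, size=(n, 200))     # simulated genotypes
    X = rng.normal(size=(n, 80))              # simulated expression levels
    y = rng.normal(size=n)                    # simulated phenotype
    train, test = np.arange(45), np.arange(45, n)

    G = vanraden_grm(M)
    models = {
        "GBLUP": ([G], [1.0]),
        "GTBLUP": ([G, linear_kernel(X)], [0.5, 0.5]),
        "GRBLUP": ([G, gaussian_kernel(X)], [0.5, 0.5]),
    }
    for name, (ks, ws) in models.items():
        pred = kernel_blup(ks, ws, y, train, test)
        # Predictive ability as the correlation of predicted and observed values
        print(name, np.corrcoef(pred, y[test])[0, 1])
```

With simulated noise as input, the printed correlations carry no meaning; the point is only that the three models differ in which kernels enter the weighted sum, while the prediction step stays the same.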

