Gene duplications allow for protein functional diversification and accelerate genome evolution. Occasionally, the transposon amplification machinery reverse-transcribes the mRNA of a gene and integrates it into the genome, forming an RNA-duplicated gene copy, the retrogene. Although retrogenes have been found in plants, their biology, evolution and epigenetic regulation are poorly understood. We developed a novel bioinformatic retrogene annotation tool (RAT) to screen Arabidopsis genomes for retrogenes. We identified 251 (216 novel) and 168 retrogenes in Arabidopsis thaliana and Arabidopsis lyrata, corresponding to 1% and 0.5% of protein-coding genes, respectively. Based on our findings, we calculated an emergence rate of five to ten retrogenes per million years, which is at least ten times faster than previously estimated. Most retrogenes were randomly integrated away from their parental gene loci; however, some showed targeted integration, replacing their parental genes. Therefore, we developed a bioinformatic targeted retrogene annotation tool (TRAT) to screen Arabidopsis genomes for these rare cases. To our knowledge, we report the first natural in planta retrogene targeting events.
Arabidopsis retrogenes are derived from ubiquitously transcribed parents and reside in gene-rich chromosomal regions depleted of transposons. In contrast to transposons, retrogenes and their parents are targets of gene-specific regulatory 21 nt sRNAs rather than transposon-specific 24 nt sRNAs. Retrogene expression levels are relatively low, but significantly higher than those of transposable elements. Approximately 25% of retrogenes are co-transcribed with their parents, and 3% with head-to-head oriented neighbors. This suggests transcription by novel or modified promoters for at least 72% of A. thaliana retrogenes. Many retrogenes reach their transcription maximum in pollen, the tissue analogous to animal spermatocytes, where up-regulation of retrogenes has previously been found. This implies an evolutionarily conserved mechanism leading to this transcription pattern of RNA-duplicated genes. During transcriptional repression, retrogenes are depleted of permissive chromatin marks without an obvious enrichment for repressive modifications. However, this pattern is common to many other pollen-transcribed genes, independent of their evolutionary origin. Hence, retroposition plays a role in plant genome evolution, and the developmental transcription pattern of retrogenes suggests analogous control of RNA-duplicated genes in plants and animals.
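The "at least 72%" figure for novel or modified promoters can be reproduced from the two co-transcription percentages reported above. The following is a minimal sketch of that arithmetic, not the authors' analysis: the function name is hypothetical, and it assumes the two co-transcribed categories are simply added. If some retrogenes fall into both categories, the remaining share would be larger, which is why the result is a lower bound.

```python
def min_share_with_new_promoter(co_with_parent_pct, co_with_neighbor_pct):
    """Lower bound (in percent) on retrogenes whose transcription is not
    explained by co-transcription with a parent or a head-to-head neighbor.

    Hypothetical helper: assumes the two categories are added without
    correcting for overlap, so the explained share is an upper bound.
    """
    explained_at_most = min(100.0, co_with_parent_pct + co_with_neighbor_pct)
    return 100.0 - explained_at_most


# Percentages reported in the abstract for A. thaliana retrogenes.
print(min_share_with_new_promoter(25, 3))
```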