Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

Citation metadata

From: Nature (Vol. 527, Issue 7579)
Publisher: Nature Publishing Group
Document Type: Report
Length: 6,640 words
Lexile Measure: 1470L

Article Preview:

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly (1). The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE) (2). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
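The N50 figure quoted above is a standard contiguity metric: the contig length L such that contigs of length L or longer together cover at least half of the total assembled length. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `n50` and the toy contig lengths are illustrative assumptions, not data from the Oropetium assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches at
    least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig to the shortest.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

For the toy list, the total length is 32, and the two longest contigs (8 + 8 = 16) already cover half of it, so the sketch reports 8. Applied to real contig lengths, a higher N50 for a similar total assembly size indicates that more of the sequence sits in long, unbroken contigs.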

The genomes of Arabidopsis (3), rice (4), poplar, grape and Sorghum (5) were first sequenced using high-quality and reiterative Sanger-based approaches producing a series of 'gold standard' reference genomes. The advent of next-generation sequencing (NGS) technologies reduced costs of sequencing substantially, which has enabled sequencing of over 100 plant genomes (1). The quality of plant genome assemblies depends on genome size, ploidy, heterozygosity and sequence coverage, but most NGS-based genomes have on the order of tens of thousands of short contigs distributed in thousands of scaffolds. The short read lengths of NGS, inherent biases and non-random sequencing errors have resulted in highly fragmented draft genome assemblies that are not complete, which means they are missing biologically meaningful sequences including entire genes, regulatory regions, transposable elements, centromeres, telomeres and haplotype-specific structural variations. It is becoming clear from ENCODE projects that complete genomes are needed to better understand the importance of the non-coding regions of genomes (2).

More than 40% of calories consumed by humans are derived from grasses, and the grass family (Poaceae) is arguably the most important plant family with regard to global food security (6). The size and complexity of most grass genomes have challenged progress in gene discovery and comparative genomics, although draft genomes are now available for most agriculturally important grasses (1). The largest genome assemblies, such as maize (2,300 megabases (Mb)) (7), barley (5,100 Mb) (8) and wheat (hexaploid, 17,000 Mb) (9), are highly fragmented as a result of the inability of current sequencing technologies to span complex repeat regions. Near-finished reference genomes are available for rice (4), Sorghum (5) and Brachypodium (10), but more high-quality grass genomes are needed for comparative genomics and gene discovery. Here we present the 'near-complete' draft genome of the grass Oropetium thomaeum, the first high-quality reference genome from...

Gale Document Number: GALE|A436231743