Nature. 2019 Oct;574(7780):679-685.
doi: 10.1038/s41586-019-1693-2. Epub 2019 Oct 23.

One thousand plant transcriptomes and the phylogenomics of green plants

One Thousand Plant Transcriptomes Initiative. Nature. 2019 Oct.

Abstract

Green plants (Viridiplantae) include around 450,000-500,000 species [1,2] of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Diversity within the Viridiplantae.
a–e, Green algae. a, Acetabularia sp. (Ulvophyceae). b, Stephanosphaera pluvialis (Chlorophyceae). c, Botryococcus sp. (Trebouxiophyceae). d, Chara sp. (Charophyceae). e, ‘Spirotaenia’ sp. (taxonomy under review) (Zygnematophyceae). f–p, Land plants. f, Notothylas orbicularis (Anthocerotophyta (hornwort)). g, Conocephalum conicum (Marchantiophyta (thalloid liverwort)). h, Sphagnum sp. (Bryophyta (moss)). i, Dendrolycopodium obscurum (Lycopodiophyta (club moss)). j, Equisetum telmateia (Polypodiopsida, Equisetidae (horsetail)). k, Parablechnum schiedeanum (Polypodiopsida, Polypodiidae (leptosporangiate fern)). l, Ginkgo biloba (Ginkgophyta). m, Pseudotsuga menziesii (Pinophyta (conifer)). n, Welwitschia mirabilis (Gnetophyta). o, Bulnesia arborea (Angiospermae, eudicot, rosid). p, Paphiopedilum lowii (Angiospermae, monocot, orchid). a, Photograph reproduced with permission of Thieme Verlag, Stuttgart. b–e, Photographs courtesy of M. Melkonian. f–j, l–n, p, Photographs courtesy of D.W.S. k, Photograph courtesy of R. Moran. o, Photograph courtesy of W. Judd.
Fig. 2. Phylogenetic inferences of major clades.
Phylogenetic inferences were based on ASTRAL analysis of 410 single-copy nuclear gene families extracted from genome and transcriptome data from 1,153 species, including 1,090 green plant (Viridiplantae) species (Supplementary Table 1). a, Phylogram showing internal branch lengths proportional to coalescent units (2Ne generations) between branching events, as estimated by ASTRAL-II v.5.0.3. b, Relationships among major clades with red box outlining flowering plant clade. Species numbers are shown for each lineage. Most inferred relationships were robust across data types and analyses (Supplementary Figs. 1–3) with some exceptions (Supplementary Fig. 6). Data and analysis scripts are available at 10.5281/zenodo.3255100.
Fig. 3. Alternative branching orders for contentious relationships.
Local posterior probabilities (shown only when below 1.0) and gene-tree quartet frequencies (bar graphs) for alternative branching orders for contentious relationships in the plant phylogeny (see text). a, Early Archaeplastida diversification. b, Early embryophyte diversification. c, Gymnosperms. d, Early angiosperm diversification. e, Early Viridiplantae diversification. f, Early fern diversification. g, The sister lineage to land plants. h, Trebouxiophyceae, Ulvophyceae and Chlorophyceae. i, Eudicot diversification. Red bars represent the ASTRAL topology; blue and yellow trees and bars represent the frequencies of alternative branching orders in ASTRAL. The topologies recovered in the concatenated supermatrix analysis and plastid gene analyses are also indicated. Dashed horizontal lines mark expectation for a hard polytomy (purple). In g–i, panels include more than 4 tips, so nodes are delineated with Roman numerals, bar graphs are shown for each node, and asterisks above branches indicate failure to reject the hypothesis that the node is a polytomy. Data and analysis scripts are available at 10.5281/zenodo.3255100.
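The hard-polytomy expectation in these bar graphs follows from the fact that four tips around an internal branch can be resolved into three unrooted topologies, so a true polytomy predicts each topology in about one third of gene trees. The following is a minimal Python sketch of that comparison, not the authors' analysis code: the function names and example counts are hypothetical, and a plain Pearson chi-square test against equal frequencies is assumed here as a stand-in for the polytomy test reported in the figure.

```python
import math

def quartet_frequencies(counts):
    """Convert counts of the three quartet topologies into frequencies."""
    total = sum(counts)
    if len(counts) != 3 or total == 0:
        raise ValueError("expected three counts with a non-zero total")
    return [c / total for c in counts]

def polytomy_chi_square(counts):
    """Pearson chi-square test of the three counts against equal frequencies.

    Under a hard polytomy each topology is expected in 1/3 of gene trees.
    With three categories there are 2 degrees of freedom, and the
    chi-square survival function for 2 d.f. is exactly exp(-x / 2).
    """
    total = sum(counts)
    if len(counts) != 3 or total == 0:
        raise ValueError("expected three counts with a non-zero total")
    expected = total / 3
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    p_value = math.exp(-statistic / 2)
    return statistic, p_value

# Hypothetical counts: main topology, first alternative, second alternative.
example_counts = (200, 50, 50)
freqs = quartet_frequencies(example_counts)
stat, p = polytomy_chi_square(example_counts)
```

A large p-value means the counts are compatible with equal frequencies, which corresponds to failing to reject a polytomy at that node.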
Fig. 4. The distribution of inferred ancient WGDs across lineages of green plants.
a, The locations of estimated WGDs are labelled red in the phylogeny of all 1000 Plants (1KP) samples. b, The number of inferred ancient polyploidization events within each lineage is shown in the violin plots. The white dot indicates the median, the thick black bars represent the interquartile range, the thin black lines define the 95% confidence interval and the grey shading represents the density of data points. The sample sizes for each lineage are shown within parentheses along with taxon names on the phylogeny. The phylogenetic placement of inferred WGDs is illustrated in Supplementary Fig. 8 and data supporting each WGD inference are provided in Supplementary Table 2.
Fig. 5. Assessment of significant expansions and contractions of largest plant gene families.
a, Weighted average gene-family size for species groups (normalized to account for differences in gene-family sizes, weight = 1/(maximum observed gene-family size)). The ANA grade comprises Amborellales, Nymphaeales and Austrobaileyales, successive sister lineages to a clade with the remaining extant angiosperms; the ‘CRPT+B’ grade includes Ceratophyllales, Ranunculales, Proteales lineages and a Trochodendrales + Buxales clade in the ASTRAL tree (Fig. 2). Sample sizes are proportional to bar widths (from left to right, n = 23 (Chromista), 18 (Rhodophyta), 2 (Glaucophyta), 94 (Chlorophyta), 42 (streptophyte algae), 7 (hornworts), 18 (liverworts), 38 (mosses), 16 (lycophytes), 59 (ferns; monilophytes), 76 (gymnosperms), 6 (ANA grade), 96 (monocots), 1 (*representing Chloranthales), 22 (magnoliids), 29 (CRPT+B grade), 205 (asterids), 48 (caryophyllids), 176 (rosids), 23 (Saxifragales) and 6 (Santalales)). b, Gene families exhibiting significant copy number changes (two-sided Kolmogorov–Smirnov test; P < 1 × 10^−6; gene-family expansions represent a gain of more than 50% and contractions represent a loss of more than 33%) with colour codes showing the magnitude of the observed fold changes. Data and analysis scripts are available at https://github.com/GrosseLab/OneKP-gene-family-evo.
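One possible reading of the normalization and the magnitude thresholds in this caption can be sketched in Python. This is an illustration under stated assumptions, not the code in the linked repository: the function names, the input layout and the example numbers are hypothetical, and the weight is assumed to be applied per gene family before averaging across families. The sketch covers only the fold-change thresholds, not the Kolmogorov–Smirnov test.

```python
def normalized_group_means(sizes):
    """Weighted average gene-family size per species group.

    sizes maps family -> {group: mean copy number in that group}.
    Each family is scaled by weight = 1 / (maximum observed size for that
    family), so large families do not dominate the average (assumption:
    the maximum is taken across the groups supplied here).
    """
    totals, counts = {}, {}
    for family, by_group in sizes.items():
        max_size = max(by_group.values())
        if max_size == 0:
            continue  # family absent everywhere; nothing to normalize
        weight = 1.0 / max_size
        for group, size in by_group.items():
            totals[group] = totals.get(group, 0.0) + weight * size
            counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

def classify_change(before, after):
    """Label a copy-number change using the caption's thresholds.

    Expansion: gain of more than 50% (ratio > 1.5).
    Contraction: loss of more than 33% (ratio < 0.67).
    """
    if before <= 0:
        raise ValueError("reference size must be positive")
    ratio = after / before
    if ratio > 1.5:
        return "expansion"
    if ratio < 0.67:
        return "contraction"
    return "no call"

# Hypothetical mean copy numbers for two families in two groups.
example = {
    "famA": {"algae": 2, "ferns": 10},
    "famB": {"algae": 50, "ferns": 100},
}
means = normalized_group_means(example)
```

The two thresholds are nearly symmetric on a fold-change scale, since a 1.5-fold gain and a 1.5-fold reduction (to about 67%) are reciprocal.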
Extended Data Fig. 1. Mean number of MADS-box genes in the transcriptomes of different plant clades.
Type I genes are shown in green; type II genes are shown in purple and orange. Transcripts in which only a K-box was identified (which are probably partial transcripts of type II genes) are shown in orange. Data are mean ± s.d. Dots indicate the numbers of MADS-box genes in individual transcriptomes. Sample sizes (n) are as follows: liverworts, n = 26; hornworts, n = 7; mosses, n = 37; lycophytes, n = 22; eusporangiate ferns, n = 10; leptosporangiate ferns, n = 62; gymnosperms, n = 84; and angiosperms, n = 820. A total of 1,068 transcriptomes were analysed for this figure.
Extended Data Fig. 2. RAxML phylogeny of classic type II MIKCc MADS-box genes of liverworts, mosses, lycophytes, monilophytes (ferns) and spermatophytes (seed plants).
CgMADS1 from Chara globularis was used as a representative of the outgroup. Branches leading to genes from the different phyla are coloured according to the simplified phylogeny of land plants that is shown in the top left corner. The phylogenetic position of some known type II MIKCc MADS-box genes representative of previously described clades of MADS-box genes are indicated on the right together with the species and phylum in which these genes have been identified. The four clades of MIKCc MADS-box genes that trace back to the most recent common ancestor of Euphyllophytes are shaded in grey.
Extended Data Fig. 3. Assessments of transcriptome assembly gene-family representation relative to gene-family members identified in annotated genomes.
a, BUSCO versus CEGMA (CEG) gene occupancy for each sample. BUSCO transcriptome completeness is given as ‘complete plus fragmented’ BUSCO percentage using the eukaryota_odb9 database. CEGMA transcriptome completeness is given as conditional reciprocal best BLAST hits (see Supplementary Methods). Dotted line represents 57.5% (BUSCO) and 70% (CEGMA) gene occupancy threshold. Black dots represent 1KP samples (n = 1,020) and blue dots annotated plant genomes (n = 30). b, BUSCO gene occupancy for each major clade. Boxes represent lower and upper quartiles; the black bold line represents the median and whiskers extend to the most-extreme data points. Sample sizes: Chromista, n = 23; Rhodophyta, n = 18; Glaucophyta, n = 2; Chlorophyta, n = 94; streptophyte algae, n = 42; hornworts, n = 7; liverworts, n = 18; mosses, n = 38; lycophytes, n = 16; monilophytes, n = 59; gymnosperms, n = 76; ANA grade, n = 6; monocots, n = 96; Chloranthales, n = 1; magnoliids, n = 22; CRPT grade, n = 29; asterids, n = 205; Caryophyllales, n = 48; rosids, n = 176; Saxifragales, n = 23; Santalales, n = 6. Dotted line represents 57.5% (BUSCO) gene occupancy threshold. c, Scatterplot of gene-family sizes in transcriptomes versus genomes on a logarithmic scale. The grey line indicates x = y, the black line indicates a linear regression fitted to the data (n = 299; 23 gene families in 13 species groups). Pearson and Spearman correlation coefficients (n = 299) are indicated. d, Box plot of transcriptome:genome ratios of gene-family sizes for each species group. Boxes indicate upper and lower quartiles with median; whiskers extend to data points no more than 1.5× the interquartile range (n = 23) with outliers plotted as individual data points. e, f, Number of remaining sequences after filtering with cd-hit and a threshold of 100%, 99.9%, 99%, 95% or 90% in transcriptome sequences and reference genomes (Supplementary Table 8). 
Boxes indicate upper and lower quartiles with median; whiskers extend to data points no more than 1.5× the interquartile range (e, n = 1,451; f, n = 32) with outliers plotted as individual data points.
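The box-plot convention described for panels d to f (quartile box, whiskers reaching the most extreme points within 1.5× the interquartile range, everything beyond plotted as outliers) can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and made-up data; the quartile interpolation method is an assumption, since plotting packages differ on it and the caption does not specify one.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    # "inclusive" treats the data as the full population; this choice is
    # an assumption and other quartile definitions give slightly
    # different fences.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Made-up ratios with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

The whiskers stop at actual data points inside the fences rather than at the fences themselves, which is why an extreme value is drawn separately.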

References

    1. Corlett RT. Plant diversity in a changing world: status, trends, and conservation needs. Plant Divers. 2016;38:10–16.
    2. Lughadha EN, et al. Counting counts: revised estimates of numbers of accepted species of flowering plants, seed plants, vascular plants and land plants with a review of other recent estimates. Phytotaxa. 2016;272:82–88.
    3. Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 2017;34:1812–1819.
    4. Schery RW. Plants for Man. 2nd edn. Prentice-Hall; 1972.
    5. Philippe H, Delsuc F, Brinkmann H, Lartillot N. Phylogenomics. Annu. Rev. Ecol. Evol. Syst. 2005;36:541–562.
