    • Combination of Sanger and target-enrichment markers supports revised generic delimitation in the problematic ‘Urera clade’ of the nettle family (Urticaceae)

      Wells, Tom; Maurin, Olivier; Dodsworth, Steven; Friis, Ib; Cowan, Robyn S.; Epitawalage, Niroshini; Brewer, Grace E.; Forest, Felix; Baker, William; Monro, Alexandre; et al. (Elsevier, 2020-11-05)
      Urera Gaudich, s.l. is a pantropical genus comprising c. 35 species of trees, shrubs, and vines. It has a long history of taxonomic uncertainty, and is repeatedly recovered as polyphyletic within a poorly resolved complex of genera in the Urticeae tribe of the nettle family (Urticaceae). To provide generic delimitations concordant with evolutionary history, we use increased taxonomic and genomic sampling to investigate phylogenetic relationships among Urera and associated genera. A cost-effective two-tier genome-sampling approach provides good phylogenetic resolution by using (i) a taxon-dense sample of Sanger sequence data from two barcoding regions to recover clades of putative generic rank, and (ii) a genome-dense sample of target-enrichment data for a subset of representative species from each well-supported clade to resolve relationships among them. The results confirm the polyphyly of Urera s.l. with respect to the morphologically distinct genera Obetia, Poikilospermum and Touchardia. Afrotropic members of Urera s.l. are recovered in a clade sister to the xerophytic African shrubs Obetia; and Hawaiian ones with Touchardia, also from Hawaii. Combined with distinctive morphological differences between Neotropical and African members of Urera s.l., these results lead us to resurrect the previously synonymised name Scepocarpus Wedd. for the latter. The new species epithet Touchardia oahuensis T.Wells & A.K. Monro is offered as a replacement name for Touchardia glabra non H.St.John, and subgenera are created within Urera s.s. to account for the two morphologically distinct Neotropical clades. This new classification minimises taxonomic and nomenclatural disruption, while more accurately reflecting evolutionary relationships within the group.
    • A comprehensive phylogenomic platform for exploring the angiosperm tree of life

      Baker, William J.; Bailey, Paul; Barber, Vanessa; Barker, Abigail; Bellot, Sidonie; Bishop, David; Botigué, Laura R.; Brewer, Grace E.; Carruthers, Tom; Clarkson, James J.; et al. (Oxford University Press, 2021-05-13)
      The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this paper are to (i) document our methods, (ii) describe our first data release and (iii) present a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic dataset for angiosperms to date, comprising 3,099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2,333 genera (17%). A "first pass" angiosperm tree of life was inferred from the data, which totalled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
      The validated dataset, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone towards a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardised nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world's natural history collections.
    • Factors affecting targeted sequencing of 353 nuclear genes from herbarium specimens spanning the diversity of angiosperms

      Brewer, Grace E.; Clarkson, James J.; Maurin, Olivier; Zuntini, Alexandre R.; Barber, Vanessa; Bellot, Sidonie; Biggs, Nicola; Cowan, Robyn S.; Davies, Nina M.; Dodsworth, Steven; et al. (Frontiers, 2019-09-12)
      The world’s herbaria collectively house millions of diverse plant specimens, including endangered or extinct species and type specimens. Unlocking genetic data from the typically highly degraded DNA obtained from herbarium specimens was difficult until the arrival of high-throughput sequencing approaches, which can be applied to low quantities of severely fragmented DNA. Target enrichment involves using short molecular probes that hybridise and capture genomic regions of interest for high-throughput sequencing. In this study on herbariomics, we used this targeted sequencing approach and the Angiosperms353 universal probe set to recover up to 351 nuclear genes from 435 herbarium specimens that are up to 204 years old and span the breadth of angiosperm diversity. We show that on average 207 genes were successfully retrieved from herbarium specimens, although the mean number of genes retrieved and target enrichment efficiency is significantly higher for silica gel-dried specimens. Forty-seven target nuclear genes were recovered from a herbarium specimen of the critically endangered St Helena boxwood, Mellissia begoniifolia, collected in 1815. Herbarium specimens yield significantly less high molecular weight DNA than silica gel-dried specimens, and genomic DNA quality declines with sample age which is negatively correlated with target enrichment efficiency. Climate, taxon-specific traits, and collection strategies additionally impact target sequence recovery. We also detected taxonomic bias in targeted sequencing outcomes for the 10 most numerous angiosperm families that were investigated in depth. 
      We recommend that 1) for species distributed in wet tropical climates, silica gel-dried specimens should be used preferentially, 2) for species distributed in seasonally dry tropical climates, herbarium and silica gel-dried specimens yield similar results, and either collection can be used, 3) taxon-specific traits should be explored and established for effective optimisation of taxon-specific studies using herbarium specimens, 4) all herbarium sheets should, in future, be annotated with details of the preservation method used, 5) long-term storage of herbarium specimens should be in stable low humidity and low temperature environments, and 6) targeted sequencing with universal probes, such as Angiosperms353, should be investigated closely as a new approach for DNA barcoding that will ensure better exploitation of herbarium specimens than traditional Sanger sequencing approaches.
    • Genome size diversity in angiosperms and its influence on gene space

      Dodsworth, Steven; Leitch, Andrew R.; Leitch, Ilia J.; Queen Mary University of London; Royal Botanic Gardens, Kew (Elsevier Ltd, 2015-11-21)
      Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C = 5.7 Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as ‘junk’ DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression.
    • A nuclear phylogenomic study of the angiosperm order Myrtales, exploring the potential and limitations of the universal Angiosperms353 probe set

      Maurin, Olivier; Anest, Artemis; Bellot, Sidonie; Biffin, Edward; Brewer, Grace E.; Charles-Dominique, Tristan; Cowan, Robyn S.; Dodsworth, Steven; Epitawalage, Niroshini; Gallego, Berta; et al. (Wiley, 2021-07-31)
      To further advance the understanding of the species-rich, economically and ecologically important angiosperm order Myrtales in the rosid clade, comprising nine families, approximately 400 genera and almost 14,000 species occurring on all continents (except Antarctica), we tested the Angiosperms353 probe kit. We combined high-throughput sequencing and target enrichment with the Angiosperms353 probe kit to evaluate a sample of 485 species across 305 genera (76% of all genera in the order). Results provide the most comprehensive phylogenetic hypothesis for the order to date. Relationships at all ranks, such as the relationship of the early-diverging families, often reflect previous studies, but gene conflict is evident, and relationships previously found to be uncertain often remain so. Technical considerations for processing HTS data are also discussed. High-throughput sequencing and the Angiosperms353 probe kit are powerful tools for phylogenomic analysis, but better understanding of the genetic data available is required to identify genes and gene trees that account for likely incomplete lineage sorting and/or hybridization events.
    • Potential of herbariomics for studying repetitive DNA in angiosperms

      Dodsworth, Steven; Guignard, Maite S.; Christenhusz, Maarten J.M.; Cowan, Robyn S.; Knapp, Sandra; Maurin, Olivier; Struebig, Monika; Leitch, Andrew R.; Chase, Mark W.; Forest, Felix; et al. (Frontiers Media, 2018-10-29)
      Repetitive DNA has an important role in angiosperm genomes and is relevant to our understanding of genome size variation, polyploidisation and genome dynamics more broadly. Much recent work has harnessed the power of high-throughput sequencing (HTS) technologies to advance the study of repetitive DNA in flowering plants. Herbarium collections provide a useful historical perspective on genome diversity through time, but their value for the study of repetitive DNA has not yet been explored. We propose that herbarium DNA may prove as useful for studies of repetitive DNA content as it has for reconstructed organellar genomes and low-copy nuclear sequence data. Here we present a case study in the tobacco genus (Nicotiana; Solanaceae), showing that herbarium specimens can provide accurate estimates of the repetitive content of angiosperm genomes by direct comparison with recently-collected material. We show a strong correlation between the abundance of repeat clusters, e.g., different types of transposable elements and satellite DNA, in herbarium collections versus recent material for four sets of Nicotiana taxa. These results suggest that herbarium specimen genome sequencing (herbariomics) holds promise for both repeat discovery and analyses that aim to investigate the role of repetitive DNAs in genomic evolution, particularly genome size evolution and/or contributions of repeats to the regulation of gene space.
    • Resolving species boundaries in a recent radiation with the Angiosperms353 probe set: the Lomatium packardiae/L. anomalum clade of the L. triternatum (Apiaceae) complex

      Ottenlips, Michael V.; Mansfield, Donald H.; Buerki, Sven; Feist, Mary Ann E.; Downie, Stephen R.; Dodsworth, Steven; Forest, Felix; Plunkett, Gregory M.; Smith, James F.; Boise State University; et al. (Wiley, 2021-06-08)
      Speciation not associated with morphological shifts is challenging to detect unless molecular data are employed. Using Sanger-sequencing approaches, the Lomatium packardiae/L. anomalum subcomplex within the larger Lomatium triternatum complex could not be resolved. Therefore, we attempt to resolve these boundaries here. The Angiosperms353 probe set was employed to resolve the ambiguity within Lomatium triternatum species complex using 48 accessions assigned to L. packardiae, L. anomalum, or L. triternatum. In addition to exon data, 54 nuclear introns were extracted and were complete for all samples. Three approaches were used to estimate evolutionary relationships and define species boundaries: STACEY, a Bayesian coalescent-based species tree analysis that takes incomplete lineage sorting into account; ASTRAL-III, another coalescent-based species tree analysis; and a concatenated approach using MrBayes. Climatic factors, morphological characters, and soil variables were measured and analyzed to provide additional support for recovered groups. The STACEY analysis recovered three major clades and seven subclades, all of which are geographically structured, and some correspond to previously named taxa. No other analysis had full agreement between recovered clades and other parameters. Climatic niche and leaflet width and length provide some predictive ability for the major clades. The results suggest that these groups are in the process of incipient speciation and incomplete lineage sorting has been a major barrier to resolving boundaries within this lineage previously. These results are hypothesized through sequencing of multiple loci and analyzing data using coalescent-based processes.