A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering

Hdl Handle:
http://hdl.handle.net/10547/623144
Title:
A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering
Authors:
Johnson, Matthew G. (0000-0002-1958-6334); Pokorny, Lisa (0000-0002-2478-8555); Dodsworth, Steven (0000-0001-6531-3540); Botigue, Laura R. (0000-0001-7114-5168); Cowan, Robyn S.; Devault, Alison; Eiserhardt, Wolf L. (0000-0002-8136-5233); Epitawalage, Niroshini; Forest, Felix (0000-0002-2004-433X); Kim, Jan T.; Leebens-Mack, James H. (0000-0003-4811-2231); Leitch, Ilia J. (0000-0002-3837-8186); Maurin, Olivier; Soltis, Douglas E.; Soltis, Pamela S.; Wong, Gane Ka-Shu (0000-0001-6108-5560); Wickett, Norman J. (0000-0003-0944-1956); Baker, William J. (0000-0001-6727-1831)
Abstract:
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5–15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set.
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
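The idea of choosing a minimal set of representative sequences with k-medoids clustering can be illustrated with a small sketch. This is not the authors' pipeline: the distance measure (proportion of differing aligned sites), the farthest-first initialisation, the 0.3 divergence threshold, the toy alignment, and all function names are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical toy alignment: two groups of three closely related sequences.
TOY_ALIGNMENT = [
    "ACGTACGTAC", "ACGTACGTAA", "ACGTACGTCC",
    "TTGTAAGGAC", "TTGTAAGGAA", "TTGTAAGGCC",
]


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0  # no comparable sites: treat the pair as maximally distant
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(seqs[i], seqs[j])
    return d


def assign(d, medoids):
    """Map each medoid to the indices of the sequences nearest to it."""
    clusters = {m: [] for m in medoids}
    for i in range(len(d)):
        # A medoid always stays in its own cluster, so no cluster is ever empty.
        nearest = i if i in clusters else min(medoids, key=lambda m: d[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(d, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix; returns sorted medoid indices."""
    n = len(d)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of sequences")
    # Assumed initialisation: most central sequence first, then farthest-first.
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        clusters = assign(d, medoids)
        # New medoid of a cluster: the member with the smallest total distance to the others.
        updated = [
            min(members, key=lambda c: sum(d[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return sorted(medoids)


def min_representatives(d, max_dist):
    """Smallest medoid set such that every sequence lies within max_dist of a medoid."""
    n = len(d)
    for k in range(1, n + 1):
        medoids = k_medoids(d, k)
        worst = max(min(d[i][m] for m in medoids) for i in range(n))
        if worst <= max_dist:
            return medoids
    return list(range(n))  # only reached if max_dist is negative


if __name__ == "__main__":
    dist = distance_matrix(TOY_ALIGNMENT)
    print(min_representatives(dist, max_dist=0.3))
```

Because a medoid is always one of the input sequences, each selected representative is a real sequence that could serve directly as a probe template, which is the property that distinguishes k-medoids from k-means in this setting.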
Citation:
Johnson MG, Pokorny L, Dodsworth S, Botigué LR, Cowan RS, Devault A, Eiserhardt WL, Epitawalage N, Forest F, Kim JT, Leebens-Mack JH, Leitch IJ, Maurin O, Soltis DE, Soltis PS, Ka-Shu Wong G, Baker WJ, Wickett NJ (2019) 'A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering', Systematic Biology, (), pp.-.
Publisher:
Oxford University Press (OUP)
Journal:
Systematic Biology
Issue Date:
11-Feb-2019
URI:
http://hdl.handle.net/10547/623144
DOI:
10.1093/sysbio/syy086
PubMed ID:
30535394
Additional Links:
https://academic.oup.com/sysbio/advance-article/doi/10.1093/sysbio/syy086/5237557
Type:
Article
Language:
en
ISSN:
1063-5157
Sponsors:
This work was supported by the Texas Tech University College of Arts and Sciences [to M.G.J.], the National Science Foundation [DEB-1239992, DEB-1342873 to N.J.W.], and by the Calleva Foundation, the Garfield Weston Foundation, and the Sackler Trust to the Royal Botanic Gardens, Kew
Appears in Collections:
Biomedical and biological science

Full metadata record

DC Field | Value | Language
dc.contributor.author | Johnson, Matthew G. | en
dc.contributor.author | Pokorny, Lisa | en
dc.contributor.author | Dodsworth, Steven | en
dc.contributor.author | Botigue, Laura R. | en
dc.contributor.author | Cowan, Robyn S. | en
dc.contributor.author | Devault, Alison | en
dc.contributor.author | Eiserhardt, Wolf L. | en
dc.contributor.author | Epitawalage, Niroshini | en
dc.contributor.author | Forest, Felix | en
dc.contributor.author | Kim, Jan T. | en
dc.contributor.author | Leebens-Mack, James H. | en
dc.contributor.author | Leitch, Ilia J. | en
dc.contributor.author | Maurin, Olivier | en
dc.contributor.author | Soltis, Douglas E. | en
dc.contributor.author | Soltis, Pamela S. | en
dc.contributor.author | Wong, Gane Ka-Shu | en
dc.contributor.author | Wickett, Norman J. | en
dc.contributor.author | Baker, William J. | en
dc.date.accessioned | 2019-02-11T14:30:37Z | -
dc.date.available | 2019-02-11T14:30:37Z | -
dc.date.issued | 2019-02-11 | -
dc.identifier.citation | Johnson MG, Pokorny L, Dodsworth S, Botigué LR, Cowan RS, Devault A, Eiserhardt WL, Epitawalage N, Forest F, Kim JT, Leebens-Mack JH, Leitch IJ, Maurin O, Soltis DE, Soltis PS, Ka-Shu Wong G, Baker WJ, Wickett NJ (2019) 'A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering', Systematic Biology, (), pp.-. | en
dc.identifier.issn | 1063-5157 | -
dc.identifier.pmid | 30535394 | -
dc.identifier.doi | 10.1093/sysbio/syy086 | -
dc.identifier.uri | http://hdl.handle.net/10547/623144 | -
dc.description.abstract | Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5–15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself. | en
dc.description.sponsorship | This work was supported by the Texas Tech University College of Arts and Sciences [to M.G.J.], the National Science Foundation [DEB-1239992, DEB-1342873 to N.J.W.], and by the Calleva Foundation, the Garfield Weston Foundation, and the Sackler Trust to the Royal Botanic Gardens, Kew | en
dc.language.iso | en | en
dc.publisher | Oxford University Press (OUP) | en
dc.relation.url | https://academic.oup.com/sysbio/advance-article/doi/10.1093/sysbio/syy086/5237557 | en
dc.rights | Yellow - can archive pre-print (ie pre-refereeing) | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.subject | genomics | en
dc.subject | phylogenetics | en
dc.subject | C400 Genetics | en
dc.title | A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering | en
dc.type | Article | en
dc.identifier.journal | Systematic Biology | en
dc.date.updated | 2019-02-11T12:04:52Z | -
dc.description.note | open access article with Creative Commons CC BY | -

This item is licensed under a Creative Commons License
All Items in UOBREP are protected by copyright, with all rights reserved, unless otherwise indicated.