PGDD is a public database for identifying and cataloging plant genes in terms of intragenome or cross-genome syntenic relationships. Current efforts focus on flowering plants with available whole-genome sequences (preferably assembled into pseudomolecules with ordered gene models).
Data sources
Plant genomes in this database (27 genomes)

| Species name | Common name | Release version | Gene number | Access | Reference |
|---|---|---|---|---|---|
| Arabidopsis lyrata | Lyrate rockcress | Version 1.0 (Apr 2011) | 32,670 | JGI | Nature Genetics |
| Arabidopsis thaliana | Arabidopsis | TAIR 9.0 (Jun 2009) | 27,379 | TAIR | Nature |
| Brachypodium distachyon | Purple false brome | Phytozome v6.0 | 32,255 | JGI | Nature |
| Brassica rapa | Chinese cabbage | Version 1.1 | 22,285 | BRAD | Nature Genetics |
| Cicer arietinum | Chickpea | Jan 2013 | 28,269 | LIS | Nature Biotechnology |
| Cajanus cajan | Pigeonpea | Nov 2011 | 48,680 | IIPG | Nature Biotechnology |
| Carica papaya | Papaya | Dec 2007 | 25,536 | Hawaii | Nature |
| Chlamydomonas reinhardtii | Green algae | Version 4.2 | 16,036 | JGI | Science |
| Cucumis sativus | Cucumber | Phytozome v6.0 | 21,491 | JGI | Nature Genetics |
| Fragaria vesca | Strawberry | Dec 2010 | 34,809 | PFR | Nature Genetics |
| Glycine max | Soybean | Release 1 (Dec 2008) | 66,153 | JGI | Nature |
| Lotus japonicus | Lotus | Release 2.5 | 42,399 | Kazusa | DNA Research |
| Musa acuminata | Banana | Jul 2012 | 36,542 | CIRAD | Nature |
| Malus x domestica | Apple | Aug 2010 | 57,386 | IASMA | Nature Genetics |
| Medicago truncatula | Barrel medic | Mt3.5 v3 (Jun 2011) | 45,108 | JCVI | Nature |
| Oryza sativa | Rice | RAP 2.0 (Nov 2007) | 30,192 | RAP | Nature |
| Physcomitrella patens | Moss | Version 1.6 (Jan 2008) | 32,272 | JGI | Science |
| Prunus persica* | Peach | Version 1.0 | 27,864 | JGI | - |
| Populus trichocarpa | Western poplar | JGI 2.0 (Feb 2010) | 45,778 | JGI | Science |
| Ricinus communis | Castor bean | Release 0.1 (May 2008) | 38,613 | JCVI | Nature Biotechnology |
| Sorghum bicolor | Sorghum | Sbi 1.4 (Dec 2007) | 34,496 | JGI | Nature |
| Solanum lycopersicum | Tomato | Version 2.3 | 34,727 | SGN | Nature |
| Selaginella moellendorffii | Selaginella | Version 1.0 (Dec 2007) | 22,273 | JGI | Science |
| Solanum tuberosum | Potato | Version 3.4 | 39,031 | PGSC | Nature |
| Theobroma cacao | Cacao | Release 0.9 (Sep 2010) | 28,798 | CIRAD | Nature Genetics |
| Vitis vinifera | Grape vine | Genoscope (Aug 2007) | 26,346 | Genoscope | Nature |
| Zea mays | Maize | Release 5a (Nov 2010) | 32,540 | AGI | Science |
- *Unpublished genome data, temporarily restricted from download in accordance with the understandings reached at the Fort Lauderdale meeting and the NHGRI policy statement.
- Processed sequences and coordinates for the gene models are available here.
- The Phytozome info page describes the current repository of plant genome data.
The duplication history of plants in PGDD
*Branch lengths do not represent time or the relative amount of character change.
References: the NCBI Common Tree taxonomy tool, the CoGe document on plant paleopolyploidy, and the Phytozome phylogenetic tree of species.
Methods
Identify syntenic blocks
We used BLASTP to search for potential anchors (E < 1e-5, top 5 matches) between every possible pair of chromosomes across the genomes. The resulting homologous pairs were used as input for MCscan, a synteny search program that combines the merits of two existing algorithms. MCscan's built-in scoring scheme, similar to that of DAGchainer, scores each matching gene pair as min(-log10 E, 50) and subtracts 1 for each 10 kb of distance between anchors; blocks with scores > 300 were kept. The resulting syntenic chains were then evaluated using the procedure from ColinearScan, with E < 1e-10 as the significance cutoff.
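To make the scoring rule concrete, here is a minimal Python sketch of the anchor filter and the chain score described above. It is not MCscan's code: the function names (`filter_hits`, `anchor_score`, `chain_score`, `keep_block`) and the input tuples are hypothetical, and it assumes the gap penalty is 1 point per full 10 kb of the larger of the two inter-anchor distances. Building the chains themselves (dynamic programming over anchors) and the ColinearScan significance test are not shown.

```python
import math
from collections import defaultdict


def filter_hits(hits, max_evalue=1e-5, top_n=5):
    """Keep BLASTP hits with E < max_evalue, at most top_n per query gene.

    hits: iterable of (query_gene, subject_gene, evalue) tuples (hypothetical input shape).
    """
    by_query = defaultdict(list)
    for query, subject, evalue in hits:
        if evalue < max_evalue:
            by_query[query].append((evalue, subject))
    anchors = []
    for query, matches in by_query.items():
        matches.sort()  # smallest (best) E-value first
        anchors.extend((query, subject, evalue) for evalue, subject in matches[:top_n])
    return anchors


def anchor_score(evalue, cap=50.0):
    """Per-anchor score min(-log10 E, cap); BLAST may report E = 0."""
    if evalue <= 0:
        return cap
    return min(-math.log10(evalue), cap)


def chain_score(chain, gap_unit=10_000):
    """Score a chain of (pos_a, pos_b, evalue) anchors listed in chain order.

    Assumption: the gap between consecutive anchors is the larger of the two
    genomic distances (bp), and each full gap_unit (10 kb) costs 1 point.
    """
    total = sum(anchor_score(evalue) for _, _, evalue in chain)
    for (a1, b1, _), (a2, b2, _) in zip(chain, chain[1:]):
        distance = max(abs(a2 - a1), abs(b2 - b1))
        total -= distance // gap_unit
    return total


def keep_block(chain, min_score=300):
    """Retain a candidate block only if its score exceeds min_score."""
    return chain_score(chain) > min_score


if __name__ == "__main__":
    # Seven strong anchors spaced 25 kb apart on both chromosomes.
    chain = [(i * 25_000, 1_000_000 + i * 25_000, 1e-60) for i in range(7)]
    print(chain_score(chain), keep_block(chain))
```

In this example each anchor is capped at 50 points and each 25 kb gap costs 2 points, so the chain scores 7 * 50 - 6 * 2 = 338 and passes the > 300 cutoff. The cap keeps a few extremely strong matches from carrying an otherwise sparse block.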
Calculate synonymous substitution rates (Ks)
For homologs inferred from syntenic alignments, we aligned the protein sequences of each gene pair with CLUSTALW and used the protein alignments to guide the CDS alignments with PAL2NAL. We then calculated Ks using the Nei-Gojobori method implemented in the PAML package. An in-house Python script pipelines all of these calculations. Log-Gaussian mixture models were fitted to the Ks distributions using GMM with Bayes factors.
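The Nei-Gojobori step can be sketched in Python as follows. This is an illustrative re-implementation, not the PAML code, and the function names are hypothetical. It assumes a codon-aligned CDS pair such as PAL2NAL produces, skips gapped or stop codons, counts mutations to stop codons as nonsynonymous sites (one common convention), averages codons that differ at several positions over all mutational pathways that avoid intermediate stop codons, and applies the Jukes-Cantor correction.

```python
import itertools
import math

# Standard genetic code; codons enumerated in TCAG order, "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites of one codon: each synonymous single-base change adds 1/3.

    Changes to a stop codon count as nonsynonymous (N = 3 - S).
    """
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two codons, averaged
    over every order of single-base steps; pathways through an intermediate
    stop codon are dropped unless no other pathway exists."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    all_paths, clean_paths = [], []
    for order in itertools.permutations(positions):
        current, sd, nd, through_stop = c1, 0, 0, False
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if step != c2 and CODE[step] == "*":
                through_stop = True
            if CODE[step] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = step
        all_paths.append((sd, nd))
        if not through_stop:
            clean_paths.append((sd, nd))
    paths = clean_paths or all_paths
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences p for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for Jukes-Cantor correction (p >= 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86(cds1, cds2):
    """Return (Ka, Ks) for two codon-aligned CDS strings."""
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(cds1), 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        # Skip gapped or ambiguous codons and codons that are stops.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s_avg = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s_avg
        N += 3 - s_avg
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no comparable synonymous or nonsynonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

For example, four glycine codons differing by one synonymous change (GGG to GGA) give S = 4 and Sd = 1, so Ks = -3/4 ln(1 - 4/3 * 0.25), about 0.304, while Ka is 0.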
How can I calculate the significance of segmental duplication?
The significance of a segmental duplication is calculated from four inputs:
- Number of collinear genes
- Total number of genes in the genome
- Spread in location A
- Spread in location B
How to cite
- Lee, T.H., Tang, H., Wang, X. and Paterson, A.H. (2012) PGDD: a database of gene and genome duplication in plants, Nucleic Acids Research, doi: 10.1093/nar/gks1104.
- Tang, H., Bowers, J.E., Wang, X., Ming, R., Alam, M. and Paterson, A.H. (2008) Synteny and Collinearity in Plant Genomes, Science, 320, 486-488.
- Tang, H., Wang, X., Bowers, J.E., Ming, R., Alam, M. and Paterson, A.H. (2008) Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps, Genome Research, 18, 1944-1954.


