While PGDD is no longer funded and therefore is not being regularly updated, we intend to keep it online for as long as resources permit and interest warrants. Periodic updates may occur but cannot be assured. Note that the last update of PGDD was in 2014.

PGDD is a public database to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships. Current efforts focus on flowering plants with available whole genome sequences (preferably assembled pseudomolecules with ordered gene models).

Data sources

Plant genomes in this database (47 genomes)
Species name | Common name | Version | Gene number | Access | Reference
Actinidia chinensis | Kiwifruit | May 2013 | 32,670 | KGD | Nature Communications
Amborella trichopoda | Amborella | Version 1.0 | 26,846 | AGD | Science
Arabidopsis lyrata | Lyrate rockcress | Version 1.0 | 32,670 | JGI | Nature Genetics
Arabidopsis thaliana | Arabidopsis | TAIR10 | 27,416 | TAIR | Nature
Brachypodium distachyon | Purple false brome | Version 2.1 | 31,694 | JGI | Nature
Brassica oleracea | Kale | Version 2.1 | 59,225 | BRAD | Genome Biology
Brassica rapa | Chinese cabbage | Version 1.3 | 40,492 | BRAD | Nature Genetics
Beta vulgaris | Sugar beet | RefBeet-1.1 | 27,421 | BVR | Nature
Cajanus cajan | Pigeonpea | Nov 2011 | 48,680 | IIPG | Nature Biotechnology
Capsella rubella | Capsella | Version 1.0 | 26,521 | JGI | Nature Genetics
Capsicum annuum | Hot pepper | Version 1.55 | 34,899 | PepperGenomeDB | Nature Genetics
Carica papaya | Papaya | ASGPBv0.4 | 24,782 | Hawaii | Nature
Chlamydomonas reinhardtii | Green algae | Version 5.5 |  | JGI | Science
Cicer arietinum | Chickpea | Version 1.0 | 28,269 | LIS | Nature Biotechnology
Citrullus lanatus | Watermelon | Version 1.0 | 23,440 | ICUGI | Nature Genetics
Citrus sinensis | Sweet orange | Version 1.1 | 25,379 | CSAP | Nature Genetics
Cucumis sativus | Cucumber | Version 1.0 |  |  | Nature Genetics
Eucalyptus grandis | Eucalyptus | Version 1.1 |  |  |
Elaeis guineensis | Oil palm | Version 2.0 |  |  |
Fragaria vesca | Strawberry | Version 1.1 |  | PFR | Nature Genetics
Glycine max | Soybean | Wm82.a2.v1 | 56,044 | JGI | Nature
Gossypium raimondii | Cotton | Version 2.1 | 37,505 | JGI | Nature
Hordeum vulgare | Barley | Version 1.0 | 16,598 | Helmholtz Center | Nature
Lotus japonicus | Lotus | Version 2.5 | 42,399 | Kazusa | DNA Research
Malus x domestica | Apple | Version 1.0 |  | IASMA | Nature Genetics
Medicago truncatula | Barrel medic | Mt4.0v1 |  | JCVI | Nature
Nelumbo nucifera | Sacred lotus | Version 1.0 |  |  | Genome Biology
Musa acuminata | Banana | Jul 2012 | 36,542 | CIRAD | Nature
Oryza sativa | Rice | Version 7.0 | 39,049 | RAP | Nature
Picea abies | Norway spruce | Version 1.0 | 66,632 | ConGenIE | Nature
Pyrus bretschneideri | Pear | Version 1.0 | 42,812 | PGP | Genome Research
Phalaenopsis equestris | Orchid | Version 5.0 | 42,293 | Orchidbase | Nature Genetics
Phaseolus vulgaris | Common bean | Version 1.0 | 27,082 | JGI | Nature Genetics
Physcomitrella patens | Moss | Version 3.0 | 26,610 | JGI | Science
Populus trichocarpa | Western poplar | Version 3.0 | 41,335 | JGI | Science
Prunus mume | Mei | Version 1.0 | 31,390 | PMDB | Nature Communications
Prunus persica | Peach | Version 1.0 | 28,689 | JGI | Nature Genetics
Ricinus communis | Castor bean | Version 0.1 | 38,613 | JCVI | Nature Biotechnology
Selaginella moellendorffii | Selaginella | Version 1.0 | 22,273 | JGI | Science
Solanum lycopersicum | Tomato | Version 2.4 | 34,727 | SGN | Nature
Solanum tuberosum | Potato | Version 3.4 | 39,031 | PGSC | Nature
Sorghum bicolor | Sorghum | Version 2.1 | 33,032 | JGI | Nature
Theobroma cacao | Cacao | Version 1.1 | 29,452 | CIRAD | Nature Genetics
Triticum urartu | Wheat A-genome | Version 1.0 | 34,879 | CLIMB | Nature
Utricularia gibba | Humped bladderwort | CoGe (Jun 2013) | 28,494 | CoGe | Nature
Vitis vinifera | Grape vine | Genoscope (Aug 2007) | 26,346 | Genoscope | Nature
Zea mays | Maize | Version 6a | 63,480 | AGI | Science


  • 01-14-2015  MCSCAN-toolkit-beta-1.0 is available for download from the "MCSCAN" page.
  • 12-16-2014  Orchid (Phalaenopsis equestris) was added to PGDD.
  • 09-12-2014  Added the following genomes: Brassica oleracea, Beta vulgaris, Capsella rubella, Citrus sinensis, Citrullus lanatus, Capsicum annuum, Elaeis guineensis, Hordeum vulgare, Nelumbo nucifera, Picea abies, Pyrus bretschneideri, Prunus mume, Triticum urartu.
  • 09-11-2014  Updated the following genomes: Arabidopsis thaliana, Brachypodium distachyon, Brassica rapa, Carica papaya, Chlamydomonas reinhardtii, Fragaria vesca, Glycine max, Malus x domestica, Medicago truncatula, Oryza sativa, Physcomitrella patens, Populus trichocarpa, Sorghum bicolor, Solanum lycopersicum, Theobroma cacao, Zea mays.
  • 06-25-2014  Common bean (Phaseolus vulgaris) was added to PGDD.
  • 06-18-2014  Eucalyptus (Eucalyptus grandis) was added to PGDD.

The duplication history of plants in PGDD

*Branch lengths do not represent time or relative amount of character change.

References: the Common Tree taxonomy tool at NCBI, the document about plant paleopolyploidy at CoGe, and the phylogenetic tree of species in Phytozome.


Identify syntenic blocks

We used BLASTP to search for potential anchors (E < 1e-5, top 5 matches) between every possible pair of chromosomes in multiple genomes. The homologous pairs were used as the input for MCscan, a synteny search program that combines the merits of two existing algorithms. The built-in scoring scheme for MCscan, similar to that of DAGchainer, is min{-log10 E, 40} for every matching gene pair and -1 for each 10 kb of distance between anchors; blocks with scores > 200 were kept. The resulting syntenic chains were evaluated using a procedure from ColinearScan, with E-value < 1e-10 as the significance cutoff.
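The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the MCscan implementation: it only scores a chain of anchors that has already been chosen and does not perform the chaining search itself. The `Anchor` type, the function names, and the choice to measure the gap as the larger of the two per-chromosome distances are all assumptions made for illustration.

```python
import math
from typing import NamedTuple, Sequence


class Anchor(NamedTuple):
    """One homologous gene pair (hypothetical structure for this sketch)."""
    pos_a: int      # gene position (bp) on chromosome A
    pos_b: int      # gene position (bp) on chromosome B
    evalue: float   # BLASTP E-value of the pair


MAX_MATCH_SCORE = 40.0    # cap from the text: min{-log10 E, 40}
GAP_UNIT_BP = 10_000      # -1 for each 10 kb between anchors
MIN_BLOCK_SCORE = 200.0   # blocks with scores > 200 are kept


def match_score(evalue: float) -> float:
    """Score of a single anchor: min(-log10 E, 40)."""
    if evalue <= 0.0:
        # BLAST reports 0.0 for very strong hits; treat as the capped score.
        return MAX_MATCH_SCORE
    return min(-math.log10(evalue), MAX_MATCH_SCORE)


def gap_penalty(prev: Anchor, cur: Anchor) -> float:
    """Penalty of 1 per 10 kb between consecutive anchors.

    Assumption: the distance is the larger of the two per-chromosome
    distances; the source text does not specify this detail.
    """
    dist = max(abs(cur.pos_a - prev.pos_a), abs(cur.pos_b - prev.pos_b))
    return dist / GAP_UNIT_BP


def chain_score(chain: Sequence[Anchor]) -> float:
    """Sum of anchor scores minus gap penalties along an ordered chain."""
    score = sum(match_score(a.evalue) for a in chain)
    score -= sum(gap_penalty(p, c) for p, c in zip(chain, chain[1:]))
    return score


def keep_block(chain: Sequence[Anchor]) -> bool:
    """Apply the score > 200 filter from the text."""
    return chain_score(chain) > MIN_BLOCK_SCORE
```

Under these assumptions, a chain needs at least six strong anchors (each capped at 40) placed close together to pass the score cutoff, which is why isolated homologous pairs do not form blocks.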

Calculate synonymous substitutions (Ks)

For homologs inferred from syntenic alignments, we aligned the protein sequences of each gene pair using CLUSTALW and used the protein alignments to guide the CDS alignments with PAL2NAL. Finally, we calculated Ks using the Nei-Gojobori method as implemented in the PAML package. An in-house Python script pipelines all the calculations. Log-Gaussian mixture models were fitted to the Ks distributions using GMM with Bayes factors.
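Two steps of this pipeline are simple enough to sketch in Python. The first function shows the idea behind protein-guided CDS alignment (threading codons onto an aligned protein row); it is not PAL2NAL itself, and it assumes the CDS is in frame and translates exactly to the protein. The second applies the Jukes-Cantor correction that the Nei-Gojobori method uses to turn the proportion of synonymous differences (pS) into Ks; counting synonymous sites and differences, which PAML handles, is omitted here. Both function names are hypothetical and are not taken from the in-house script.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Build a codon alignment row from an aligned protein row.

    Each residue consumes the next codon of the CDS; each protein gap
    ("-") becomes a three-nucleotide gap ("---").
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Drop a trailing stop codon, which has no residue in the protein.
    if codons and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    n_residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(codons) != n_residues:
        raise ValueError("CDS and protein lengths do not match")
    remaining = iter(codons)
    return "".join("---" if ch == "-" else next(remaining)
                   for ch in aligned_protein)


def jukes_cantor_ks(p_s: float) -> float:
    """Jukes-Cantor correction: Ks = -3/4 * ln(1 - 4/3 * pS)."""
    if not 0.0 <= p_s < 0.75:
        # At pS >= 0.75 the sites are saturated and Ks is undefined.
        raise ValueError("pS must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_s / 3.0)
```

For example, `thread_codons("M-K", "ATGAAA")` yields `ATG---AAA`, so the gap placement of the protein alignment is preserved at the codon level.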

How can I calculate the significance of segmental duplication?

The calculation takes four inputs: the number of collinear genes, the total number of genes in the genome, the spread in location A, and the spread in location B.

How to cite