PGDD is a public database that identifies and catalogs plant genes in terms of intragenome or cross-genome syntenic relationships. Current efforts focus on flowering plants with available whole-genome sequences (preferably assembled pseudomolecules with ordered gene models).

Data sources

Plant genomes in this database (24 genomes)
Species name | Common name | Release version | Gene number | Access | Reference
Arabidopsis lyrata | Lyrate rockcress | Version 1.0 (Apr 2011) | 32,670 | JGI | Nature Genetics
Arabidopsis thaliana | Arabidopsis | TAIR 9.0 (Jun 2009) | 27,379 | TAIR | Nature
Brachypodium distachyon | Purple false brome | Phytozome v6.0 | 32,255 | JGI | Nature
Brassica rapa | Chinese cabbage | Version 1.1 | 22,285 | BRAD | Nature Genetics
Cajanus cajan | Pigeonpea | Nov 2011 | 48,680 | IIPG | Nature Biotechnology
Carica papaya | Papaya | Dec 2007 | 25,536 | Hawaii | Nature
Chlamydomonas reinhardtii | Green algae | Version 4.2 | 16,036 | JGI | Science
Cucumis sativus | Cucumber | Phytozome v6.0 | 21,491 | JGI | Nature Genetics
Fragaria vesca | Strawberry | Dec 2010 | 34,809 | PFR | Nature Genetics
Glycine max | Soybean | Release 1 (Dec 2008) | 66,153 | JGI | Nature
Lotus japonicus | Lotus | Release 2.5 | 42,399 | Kazusa | DNA Research
Malus x domestica | Apple | Aug 2010 | 57,386 | IASMA | Nature Genetics
Medicago truncatula | Barrel medic | Mt3.5 v3 (Jun 2011) | 45,108 | JCVI | Nature
Oryza sativa | Rice | RAP 2.0 (Nov 2007) | 30,192 | RAP | Nature
Physcomitrella patens | Moss | Version 1.6 (Jan 2008) | 32,272 | JGI | Science
Prunus persica * | Peach | Version 1.0 | 27,864 | JGI | --
Populus trichocarpa | Western poplar | JGI 2.0 (Feb 2010) | 45,778 | JGI | Science
Ricinus communis | Castor bean | Release 0.1 (May 2008) | 38,613 | JCVI | Nature Biotechnology
Sorghum bicolor | Sorghum | Sbi 1.4 (Dec 2007) | 34,496 | JGI | Nature
Solanum tuberosum | Potato | Version 3.4 | 39,031 | PGSC | Nature
Selaginella moellendorffii | Selaginella | Version 1.0 (Dec 2007) | 22,273 | JGI | Science
Theobroma cacao | Cacao | Release 0.9 (Sep 2010) | 28,798 | CIRAD | Nature Genetics
Vitis vinifera | Grape vine | Genoscope (Aug 2007) | 26,346 | Genoscope | Nature
Zea mays | Maize | Release 5a (Nov 2010) | 32,540 | AGI | Science

In queue (5 genomes)
Species name | Common name | Release version | Gene number | Access | Reference
Aquilegia coerulea * | Colorado blue columbine | Phytozome v6.0 | 25,784 | JGI | --
Mimulus guttatus * | Monkey flower | Version 1.1 | 26,718 | JGI | --
Setaria italica * | Foxtail millet | Version 1.1 | 32,095 | JGI | --
Volvox carteri * | Volvox | Phytozome v6.0 | 14,491 | JGI | --
Theobroma cacao * | Cacao | Release 0.9 (Sep 2010) | 34,997 | MARS-USDA | --

The duplication history of major angiosperm taxa
Lavender circles represent inferred polyploidy events, drawn roughly to scale.


Methods

Identify syntenic blocks

We used BLASTP to search for potential anchors (E < 1e-5, top 5 matches) between every possible pair of chromosomes in multiple genomes. The homologous pairs serve as the input for MCscan, a synteny search program that combines the merits of two existing algorithms. The built-in scoring scheme for MCscan, similar to that of DAGchainer, is min(-log10(E), 50) for each matching gene pair and -1 for each 10 kb of distance between anchors; blocks with scores > 300 were kept. The resulting syntenic chains were evaluated using a procedure from ColinearScan, with an E-value < 1e-10 as the significance cutoff.
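
The anchor filter and scoring scheme described above can be illustrated with a minimal Python sketch. This is not the MCscan source code. The function names, the input tuples, the choice to measure the distance between anchors as the larger of the two genomic gaps, and the rounding of the penalty down to whole 10 kb units are all assumptions made for illustration; only the thresholds (E < 1e-5, top 5 matches, cap of 50, -1 per 10 kb, score > 300) come from the text.

```python
import math
from collections import defaultdict

E_CUTOFF = 1e-5         # anchor E-value threshold (from the text)
TOP_HITS = 5            # best matches kept per query gene (from the text)
MAX_ANCHOR_SCORE = 50   # cap on -log10(E) per anchor (from the text)
GAP_UNIT_BP = 10_000    # one penalty point per 10 kb between anchors
MIN_BLOCK_SCORE = 300   # blocks must score above this to be kept


def filter_anchors(hits):
    """Keep hits with E < cutoff, at most TOP_HITS per query gene.

    hits: iterable of (query, subject, evalue) tuples (hypothetical format).
    """
    by_query = defaultdict(list)
    for query, subject, evalue in hits:
        if evalue < E_CUTOFF:
            by_query[query].append((evalue, subject))
    kept = []
    for query, matches in by_query.items():
        # Sorting by E-value puts the strongest matches first.
        for evalue, subject in sorted(matches)[:TOP_HITS]:
            kept.append((query, subject, evalue))
    return kept


def anchor_score(evalue):
    """Score one matching gene pair as min(-log10(E), 50)."""
    if evalue <= 0.0:  # BLAST reports 0.0 for very strong hits
        return MAX_ANCHOR_SCORE
    return min(-math.log10(evalue), MAX_ANCHOR_SCORE)


def chain_score(anchors):
    """Score an ordered chain of anchors.

    anchors: list of (pos_a, pos_b, evalue), positions in bp.
    Assumption: the gap is the larger of the two genomic distances,
    penalized by one point per full 10 kb.
    """
    score = 0.0
    prev = None
    for pos_a, pos_b, evalue in anchors:
        score += anchor_score(evalue)
        if prev is not None:
            gap_bp = max(abs(pos_a - prev[0]), abs(pos_b - prev[1]))
            score -= gap_bp // GAP_UNIT_BP
        prev = (pos_a, pos_b)
    return score


def keep_block(anchors):
    """Return True when the chain passes the score > 300 filter."""
    return chain_score(anchors) > MIN_BLOCK_SCORE
```

Under these assumptions, a chain needs at least seven strong, tightly spaced anchors to pass the filter, because each anchor contributes at most 50 points.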

Calculate synonymous substitutions (Ks)

For homologs inferred from syntenic alignments, we aligned the protein sequences of each gene pair using CLUSTALW and used the protein alignments to guide the CDS alignments with PAL2NAL. We then calculated Ks using the Nei-Gojobori method as implemented in the PAML package. An in-house Python script pipelines all of the calculations. Log-Gaussian mixture models are fitted to the Ks distributions using GMM with Bayes factors.
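
The protein-guided CDS alignment step can be illustrated with a short Python sketch. It is neither PAL2NAL itself nor the in-house pipeline script; it only shows the underlying idea of threading codons onto an aligned protein sequence. The function name and the handling of a trailing stop codon are assumptions made for illustration.

```python
def thread_cds_onto_protein(aligned_protein, cds):
    """Build a codon alignment row from an aligned protein row.

    Each residue in the aligned protein takes the next codon from the
    unaligned CDS; each gap character '-' becomes a three-base gap '---'.
    Assumption: the CDS may carry one trailing stop codon that has no
    counterpart in the protein sequence, and it is dropped.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n_residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(codons) not in (n_residues, n_residues + 1):
        raise ValueError(
            "CDS has %d codons but protein has %d residues"
            % (len(codons), n_residues)
        )
    out = []
    next_codon = 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


# Usage: both rows of a pairwise protein alignment are threaded
# separately, which yields two codon rows of equal length.
row_a = thread_cds_onto_protein("MK-L", "ATGAAACTG")
row_b = thread_cds_onto_protein("MKRL", "ATGAAGCGTTTG")
```

The resulting codon alignment preserves reading frame, which is what a codon-based estimator such as Nei-Gojobori needs in order to count synonymous and nonsynonymous sites.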

How can I calculate the significance of segmental duplication?

Number of collinear genes:
Total number of genes in genome:
Spread in location A:
Spread in location B:


How to cite