PGDD is a public database for identifying and cataloging plant genes in terms of intragenome or cross-genome syntenic relationships. Current work focuses on plants with available whole-genome sequences (preferably assembled pseudomolecules with ordered gene models). A detailed description of this database can be found here.

Data source

Plant genomes used in this database
Species name | Common name | Release version | Gene number | Access
A. thaliana | thale cress | TAIR 7.0 (Aug. 2007) | 26784 | TAIR FTP
C. papaya | papaya | EVM (Jul. 2007) | 25536 | restricted
P. trichocarpa | poplar | JGI 1.1 (Dec. 2004) | 45554 | JGI HTTP
V. vinifera | grape | Genoscope (Aug. 2007) | 37829 | Genoscope HTTP
O. sativa (ssp. japonica) | rice | RAP 2.0 (Nov. 2007) | 30192 | RAP HTTP
S. bicolor * | sorghum | JGI 1.4 (Dec. 2007) | 34496 | JGI HTTP

* Unpublished genome data, therefore temporarily restricted

Methods

Syntenic blocks

We used BLASTP to search for potential anchors (E < 1e-5, top 5 matches) between every possible pair of chromosomes in multiple genomes. The homologous pairs are used as the input for MCscan, a novel synteny search program that combines the merits of two existing algorithms. The built-in scoring scheme for MCscan, similar to that of DAGchainer, is min{-log10 E, 50} for every matching gene pair and -1 for each 10 kb of distance between anchors; blocks with scores > 300 were kept. The resulting syntenic chains were evaluated using a procedure from ColinearScan, with E-value < 1e-10 as the significance cutoff.
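
The scoring scheme described above can be illustrated with a minimal Python sketch. This is not the MCscan source code. The function names, the (pos_a, pos_b, evalue) anchor layout, and the choice to measure the distance between consecutive anchors as the larger of the two chromosomal gaps are all assumptions made for illustration. Only the constants (cap of 50, 10 kb unit, cutoff of 300) come from the text.

```python
import math

MAX_ANCHOR_SCORE = 50.0   # cap from the text: min{-log10 E, 50}
GAP_UNIT_BP = 10_000      # -1 per 10 kb between anchors
MIN_BLOCK_SCORE = 300.0   # blocks must score above this to be kept


def anchor_score(evalue):
    """Score one matching gene pair from its BLASTP E-value."""
    if evalue <= 0.0:
        # BLAST reports 0.0 for extremely small E-values; use the cap.
        return MAX_ANCHOR_SCORE
    return min(-math.log10(evalue), MAX_ANCHOR_SCORE)


def chain_score(anchors):
    """Score a chain of anchors given in chain order.

    anchors: list of (pos_a, pos_b, evalue), positions in bp.
    Assumption: the gap between consecutive anchors is the larger of
    the distances on the two chromosomes.
    """
    total = 0.0
    prev = None
    for pos_a, pos_b, evalue in anchors:
        total += anchor_score(evalue)
        if prev is not None:
            gap_bp = max(abs(pos_a - prev[0]), abs(pos_b - prev[1]))
            total -= gap_bp / GAP_UNIT_BP
        prev = (pos_a, pos_b)
    return total


def keep_block(anchors):
    """Apply the score > 300 filter from the text."""
    return chain_score(anchors) > MIN_BLOCK_SCORE
```

Under these assumptions, a chain of eight strongly matching anchors spaced 10 kb apart scores 8 x 50 - 7 = 393 and is kept, while a chain of six such anchors scores 295 and is dropped.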

Ks calculations

For homologs inferred from syntenic alignments, we aligned the protein sequences of the gene pairs using CLUSTALW and used the protein alignments to guide CDS alignments with PAL2NAL. Finally, we used the Nei-Gojobori method implemented in the PAML package to calculate Ks.
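
To make the two steps above concrete, here is a minimal Python sketch of the underlying ideas. It does not reproduce PAL2NAL or PAML. The function names are hypothetical, and the code assumes an ungapped, in-frame CDS whose codons correspond one-to-one to the protein residues. The first function threads a CDS onto a gapped protein alignment. The second applies the Jukes-Cantor correction that the Nei-Gojobori method uses to convert the proportion of synonymous differences (pS) into Ks.

```python
import math


def thread_codons(aligned_protein, cds):
    """Build a codon alignment row from a gapped protein row and its CDS.

    Each residue consumes one codon; each protein gap '-' becomes '---'.
    Trailing codons (for example a stop codon) are ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if residues > len(codons):
        raise ValueError("CDS is too short for the protein sequence")
    out = []
    idx = 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")
        else:
            out.append(codons[idx])
            idx += 1
    return "".join(out)


def jukes_cantor(p):
    """Jukes-Cantor correction: d = -3/4 * ln(1 - 4p/3).

    With p = pS (synonymous differences per synonymous site),
    the result is the Nei-Gojobori estimate of Ks.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, thread_codons("M-K", "ATGAAA") yields "ATG---AAA", so gaps in the protein alignment stay in frame in the CDS alignment.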



Page last updated: Mar. 01, 2008