Dating Papaya duplication

Three events relevant to the papaya lineage have a relative chronological order that remains unclear: the Arabidopsis α duplication, the Arabidopsis-papaya divergence, and a recent papaya duplication (if there was one). This study addresses part of that question, specifically the order of the α duplication and the Arabidopsis-papaya divergence, using Brad's dating method from the Nature paper. The order of the papaya duplication (if it exists) and the Arabidopsis-papaya divergence, however, cannot be reliably determined without collinearity data.

Dataset

The starting dataset consists of the Arabidopsis α duplicates and β duplicates, which are used as BLAST queries. Unigene datasets from several related angiosperm species were downloaded from the TIGR plantta database and formatted as BLAST databases. The taxa and release versions used are listed below:

  1. Oryza sativa 2006-06-05 Release 2 Assemblies: 49870 Singletons: 197646 (NCBI Taxon ID: 4530) (used as the outgroup)
  2. Populus trichocarpa 2006-06-05 Release 2 Assemblies: 12687 Singletons: 18395 (NCBI Taxon ID: 3694)
  3. Glycine max 2006-09-28 Release 2 Assemblies: 36399 Singletons: 80566 (NCBI Taxon ID: 3847)
  4. Brassica napus 2006-06-05 Release 2 Assemblies: 10709 Singletons: 24751 (NCBI Taxon ID: 3708)
  5. Solanum tuberosum 2006-06-05 Release 2 Assemblies: 26280 Singletons: 54792 (NCBI Taxon ID: 4113)
  6. Gossypium hirsutum 2006-09-28 Release 3 Assemblies: 24797 Singletons: 45870 (NCBI Taxon ID: 3635)
  7. Carica papaya 8571-papaya-unigenes.fasta 2006-10-11 Release Assemblies: 4267 Singletons: 4304 (PGML Lab Data)

To download the arabidopsis query.

Procedure

  1. BLAST the Arabidopsis duplicates against the unigene set of each organism.
  2. For each duplicate pair, identify the best homologue to either duplicate in each organism.
  3. Identify the best homologue in the outgroup species.
  4. Build a multiple alignment of the four genes (the two duplicates, the homologue, and the outgroup homologue).
  5. Produce a phylogenetic tree from the four genes.
  6. Across all duplicate pairs, determine the proportion of internal trees.
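
Steps 1 to 3 could be sketched roughly as below. This is only an illustration, not the script actually used: it assumes the BLAST results were saved in the standard 12-column tabular format (bit score in the last column), and the gene and unigene IDs (`At_dup1`, `unigene_1`, ...) and the function names are made up for the example.

```python
import csv
import io

def best_hits(blast_tabular):
    """Return {query_id: (subject_id, bitscore)}, keeping the top-scoring hit per query.

    Assumes 12-column tabular BLAST output with the bit score in column 12.
    """
    best = {}
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def best_homologue_for_pair(best, pair):
    """Pick the single best hit to either member of a duplicate pair (step 2)."""
    candidates = [best[gene] for gene in pair if gene in best]
    return max(candidates, key=lambda hit: hit[1]) if candidates else None

# Hypothetical BLAST output: two hits for At_dup1, one for At_dup2.
example = io.StringIO(
    "At_dup1\tunigene_1\t82.5\t200\t35\t0\t1\t200\t1\t200\t1e-50\t180.0\n"
    "At_dup1\tunigene_2\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-20\t95.0\n"
    "At_dup2\tunigene_3\t90.0\t210\t21\t0\t1\t210\t1\t210\t1e-70\t250.0\n"
)
hits = best_hits(example)
pair_hit = best_homologue_for_pair(hits, ("At_dup1", "At_dup2"))
print(pair_hit)
```

The same two functions would be run once per ingroup species and once against the outgroup database to cover step 3.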

A detailed description, intermediate files etc. can be found in Methods of dating papaya duplication.
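
For steps 4 to 6, a minimal sketch of how a four-gene tree might be labelled and the proportion tallied. Two assumptions are made here that this page does not state: first, that an "internal" tree is one where the species homologue groups with one of the two Arabidopsis duplicates (so the duplication predates the divergence), while an "external" tree keeps the two duplicates together; second, that the topology is picked from pairwise distances with the four-point condition, which is a stand-in for whatever tree-building method was actually used. The taxon labels `A1`, `A2`, `S`, `O` and the distances are invented.

```python
def classify_quartet(d):
    """Label a quartet as 'internal' or 'external' from pairwise distances.

    d maps frozenset({x, y}) -> distance for the taxa
    'A1', 'A2' (Arabidopsis duplicates), 'S' (species homologue), 'O' (outgroup).
    The split with the smallest summed distance is taken as the topology
    (four-point condition); ties fall back to 'external'.
    """
    def dist(x, y):
        return d[frozenset((x, y))]

    sums = {
        "external": dist("A1", "A2") + dist("S", "O"),      # (A1,A2) | (S,O)
        "internal_A1": dist("A1", "S") + dist("A2", "O"),   # (A1,S) | (A2,O)
        "internal_A2": dist("A2", "S") + dist("A1", "O"),   # (A2,S) | (A1,O)
    }
    best = min(sums, key=sums.get)
    return "external" if best == "external" else "internal"

def proportion_internal(labels):
    """Fraction of trees labelled 'internal' (step 6)."""
    return sum(1 for label in labels if label == "internal") / len(labels)

def make_distances(a1_a2, a1_s, a2_s, to_outgroup):
    """Helper that builds a distance table for one hypothetical quartet."""
    return {
        frozenset(("A1", "A2")): a1_a2,
        frozenset(("A1", "S")): a1_s,
        frozenset(("A2", "S")): a2_s,
        frozenset(("A1", "O")): to_outgroup,
        frozenset(("A2", "O")): to_outgroup,
        frozenset(("S", "O")): to_outgroup,
    }

# Hypothetical quartets: S close to A1, then A1 close to A2.
labels = [
    classify_quartet(make_distances(0.4, 0.1, 0.4, 0.9)),
    classify_quartet(make_distances(0.1, 0.4, 0.4, 0.9)),
]
print(labels, proportion_internal(labels))
```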

Results

+-----------+------------------------------+------------------+
| species   | proportion of internal trees | total # of trees |
+-----------+------------------------------+------------------+
| Brassica  |                       0.7826 |              391 | 
| Glycine   |                       0.2515 |              330 | 
| Gossypium |                       0.2991 |              341 | 
| Papaya    |                       0.2755 |               98 | 
| Populus   |                       0.3139 |              274 | 
| Solanum   |                       0.2166 |              314 | 
+-----------+------------------------------+------------------+

Formal statistics (e.g. ANOVA across multiple segments) could still be applied here, but even by eye the trend for papaya is very similar to that of the other pre-α species: all have a much lower proportion of internal trees than Brassica. Compared with the previous result, the proportion of internal trees is about 20% higher. This may be due to a step I added to remove false orthologs; in other words, I removed some external trees by imposing a threshold. Intuitively, failing to identify the correct ortholog biases the result towards an external tree.

Although the numbers differ, the pattern still holds. Assuming that the papaya molecular clock is not extremely slow, papaya should have diverged from Arabidopsis before the α duplication.
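
As a quick check that goes one step beyond eyeballing, the papaya and Brassica proportions can be compared with a pooled two-proportion z-test. This is a simpler substitute for the ANOVA mentioned above, not an analysis that was run for this page. The counts of internal trees are back-calculated by rounding proportion × total from the Results table, which is itself an assumption.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts reconstructed from the table (assumed: proportion * total, rounded).
brassica_total, papaya_total = 391, 98
brassica_internal = round(0.7826 * brassica_total)
papaya_internal = round(0.2755 * papaya_total)

z, p = two_proportion_z(brassica_internal, brassica_total,
                        papaya_internal, papaya_total)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

The same function can be pointed at papaya versus any of the other species in the table to see whether papaya is distinguishable from them.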




--Bao 16:26, 10 April 2007 (EDT)
