Protein-encoding gene families for a new version of TAED are now available. The full dataset contains 6672 families derived from the January 2002 release of chordate genes in GenBank; 6657 of these are fully processed and currently available. Families were assembled by single-linkage clustering: two genes are linked when they lie within 100 PAM units of each other and, for complete genes, the shorter gene is at least 90% of the length of the longer one. To be included, a family must contain at least two complete genes and genes from at least two chordate species.
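The clustering and filtering rules above can be illustrated with a minimal Python sketch. This is not the TAED pipeline itself: the function name build_families, the input layout (a dictionary of genes and a dictionary of pairwise PAM distances), and the decision to apply the length test only when both genes are complete are assumptions made for illustration. Only the thresholds (100 PAM, 90% length ratio, at least two complete genes, at least two species) come from the text.

```python
from itertools import combinations

PAM_MAX = 100.0          # maximum pairwise distance, from the text
LENGTH_RATIO_MIN = 0.90  # shorter/longer length ratio, from the text


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def build_families(genes, pam_distance):
    """Single-linkage clustering followed by the family inclusion filter.

    genes: dict gene_id -> (species, length, is_complete)   (hypothetical layout)
    pam_distance: dict frozenset({a, b}) -> PAM distance     (hypothetical layout)
    """
    parent = {g: g for g in genes}
    for a, b in combinations(genes, 2):
        d = pam_distance.get(frozenset((a, b)))
        if d is None or d > PAM_MAX:
            continue
        (_, len_a, complete_a), (_, len_b, complete_b) = genes[a], genes[b]
        # Assumption: the 90% length test applies only to pairs of complete genes.
        if complete_a and complete_b:
            if min(len_a, len_b) / max(len_a, len_b) < LENGTH_RATIO_MIN:
                continue
        # Single linkage: one qualifying pair is enough to merge two clusters.
        parent[find(parent, a)] = find(parent, b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(parent, g), set()).add(g)

    families = []
    for members in clusters.values():
        n_complete = sum(1 for g in members if genes[g][2])
        n_species = len({genes[g][0] for g in members})
        if n_complete >= 2 and n_species >= 2:
            families.append(members)
    return families


# Toy data, invented for this example.
genes = {
    "h1": ("human", 300, True),
    "m1": ("mouse", 290, True),
    "c1": ("chicken", 295, True),
    "r1": ("rat", 150, True),
    "z1": ("zebrafish", 310, True),
}
pam_distance = {
    frozenset(("h1", "m1")): 20.0,
    frozenset(("m1", "c1")): 80.0,
    frozenset(("h1", "c1")): 120.0,  # too distant on its own
    frozenset(("m1", "r1")): 30.0,   # close, but fails the length test
    frozenset(("h1", "z1")): 150.0,  # too distant
}
families = build_families(genes, pam_distance)
```

In the toy data, c1 is more than 100 PAM units from h1 but still joins the same family through m1, which is the defining behaviour of single-linkage clustering. The genes r1 and z1 remain singletons and are dropped by the inclusion filter.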
For each family, multiple sequence alignments were calculated using the probabilistic model in Darwin, ClustalW, and POA. Phylogenetic trees were built using DNAdist (100 bootstrap replicates), DNAml (families of up to 50 sequences), and Protdist (100 bootstrap replicates) from PHYLIP. Additional analysis was done using MrBayes. Consensus trees were built with a soft parsimony consensus method and rooted using the NCBI phylogeny so as to minimize duplication and loss events.
Redundant sequences have been eliminated by removing all inparalogues. This also removes all recent lineage-specific gene duplication events, which can be added back once confirming information is obtained through genome sequencing.
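One way to picture inparalogue removal is as collapsing every clade of a gene tree whose leaves all come from a single species down to one representative. The Python sketch below does that under stated assumptions: the tree is a nested tuple, leaves are strings of the hypothetical form "species|gene", and the first leaf of a collapsed clade is kept. The text does not say how TAED identifies inparalogues or chooses the retained sequence, so this is an illustration rather than the method used.

```python
def species_of(leaf):
    """Extract the species from a leaf label of the assumed form 'species|gene'."""
    return leaf.split("|", 1)[0]


def collapse_inparalogues(node):
    """Return (pruned_subtree, species_set) for a nested-tuple gene tree.

    Any clade containing genes from only one species is replaced by a single
    representative leaf (here, arbitrarily, the first one encountered).
    """
    if isinstance(node, str):
        return node, {species_of(node)}

    pruned_children = []
    species = set()
    for child in node:
        subtree, child_species = collapse_inparalogues(child)
        pruned_children.append(subtree)
        species |= child_species

    if len(species) == 1:
        # Every child is already a single leaf here, because single-species
        # subclades were collapsed on the way back up the recursion.
        return pruned_children[0], species
    return tuple(pruned_children), species


# Toy tree, invented for this example: human A1/A2 form a lineage-specific pair.
tree = ((("human|A1", "human|A2"), "mouse|A"), ("human|B", "mouse|B"))
pruned, _ = collapse_inparalogues(tree)
```

In the toy tree, the human-only clade (A1, A2) collapses to A1, while human B is kept because its closest relative in the tree is a mouse gene, which places that duplication before the human and mouse lineages split.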
This release is available at http://www.cbu.uib.no/fuge/services/TAED/taed2003.zip. Please note that no evolutionary calculations have been done on this release, and that it has now been superseded by the latest version.
The latest version of TAED is now available through http://www.bioinfo.no/tools/TAED.
The previous calculations for this database, derived from The Master Catalog, can be viewed through the links below. Feedback and questions are welcome.
April, 2003 (and the Norwegian version link added August, 2004)