We hope to show that the new method developed here performs better than current methods, both threading and sequence-based. Starting with sequence-based methods, the best control experiment is to apply the Smith-Waterman[Smith & Waterman, 1981] local sequence alignment search program, SSEARCH, which is part of the FASTA package[Pearson, 1990]. Unlike FASTA, which initially screens sequences using a fast but approximate method for scoring diagonals in the alignment matrix, SSEARCH performs a complete local alignment for every comparison. Whilst SSEARCH is only a single-sequence method, we improved its chances of success by creating a sequence library from the multiple sequence homologues of the fold library domains (1421 sequences in total). We also used multiple query sequences: for each query domain, each sequence homologue was scanned against all 1421 sequences (except those derived from the query), the resulting local alignment scores were ranked, and the top-scoring alignment was selected. Gap penalties of 12 (opening) and 2 (extension) were used with the BLOSUM50 matrix, according to the recommendations in the literature[Henikoff, 1996; Pearson, 1995].
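The scanning protocol described above can be illustrated with a minimal Python sketch. It is not the SSEARCH implementation: the function names `sw_score` and `top_hit`, the toy sequences and the domain labels are all hypothetical, a flat match/mismatch score stands in for BLOSUM50, and we assume that a gap of length k costs 12 + 2(k - 1). Only the overall logic (score every homologue of the query against every library sequence not derived from the query, rank, keep the top alignment) follows the text.

```python
from typing import List, Tuple

NEG = -10**9  # stands in for minus infinity in the gap matrices


def sw_score(a: str, b: str, gap_open: int = 12, gap_extend: int = 2,
             match: int = 5, mismatch: int = -4) -> int:
    """Best local alignment score (Smith-Waterman with affine gaps).

    Assumptions: a gap of length k costs gap_open + (k - 1) * gap_extend,
    and a flat match/mismatch score replaces the BLOSUM50 matrix.
    """
    best = 0
    prev_h = [0] * (len(b) + 1)    # best score ending at (i-1, j)
    prev_f = [NEG] * (len(b) + 1)  # best score ending in a vertical gap
    for i in range(1, len(a) + 1):
        cur_h = [0] * (len(b) + 1)
        cur_f = [NEG] * (len(b) + 1)
        e = NEG                    # best score ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are never allowed to fall below zero.
            cur_h[j] = max(0, prev_h[j - 1] + sub, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def top_hit(query_domain: str,
            library: List[Tuple[str, str]]) -> Tuple[str, int]:
    """Return (domain, score) of the top-ranked non-self alignment.

    `library` holds (domain_id, sequence) pairs, several per domain.
    Every homologue of the query domain is scanned against every
    sequence that is not derived from the query domain.
    """
    queries = [seq for dom, seq in library if dom == query_domain]
    targets = [(dom, seq) for dom, seq in library if dom != query_domain]
    scored = [(sw_score(q, t_seq), t_dom)
              for q in queries for t_dom, t_seq in targets]
    best_score, best_dom = max(scored)  # ties broken by domain label
    return best_dom, best_score


# Toy library with hypothetical domain labels and sequences.
toy_library = [
    ("d1", "ACDEFGHIKL"),
    ("d1", "ACDEFGHIKM"),
    ("d2", "ACDEFXGHIKL"),
    ("d3", "WWWWYYYY"),
]
hit = top_hit("d1", toy_library)
```

In this toy case the top hit for `d1` is `d2`, found through the first homologue, whose alignment contains ten matches and a single one-residue gap. A hit of this kind would count as correct only if the hit domain shares the fold of the query.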
Using this protocol, the number of correct non-self top-ranking folds was 7. Details of these hits are presented in Table 4.2. Mean adjusted rank and alignment shifts were not calculated. It is clear from these results that the dataset is not strictly non-homologous. Whilst none of the pairs of domains are more than 20% identical by global alignment methods, some are clearly detectable using standard local alignment methods with a small library of similarly sized domain sequences. Six of the seven hits form three pairs of domains which recognise each other. Each of these pairs shares a similar function, by either E.C. number or SCOP classification. 2ohx and 1gdh are both oxidoreductases acting on the CH-OH group of donors with NAD or NADP as the acceptor. 1tpf and 1pii are both isomerases which interconvert aldoses and ketoses. 5p21 and 1hur are both in the G-protein family of SCOP. These two pairs of domains are placed in the same homologous superfamilies in the latest release of CATH. The recognition of 1exg00 by 1cgt04 is not so easily explained, since in SCOP they have different fold classifications; yet 1exg is in the ``carbohydrate-binding domain'' superfamily, and the equivalent SCOP domain for 1cgt04 is in the ``Starch-binding domain'' superfamily. In CATH, both domains belong to the immunoglobulin-like topology. These domains may be more closely related than SCOP suggests.