The detection of weak similarities has been improved to some extent by the use of multiple sequence information. Multiple sequence alignments show which types of amino acid are preferred at each sequence position, and this position-by-position summary is known as a profile. A single query sequence can be searched against databases of aligned homologues such as BLOCKS [Henikoff & Henikoff, 1994], PRINTS [Attwood et al., 1997], PRODOM [Sonnhammer & Kahn, 1994] and PROFILES [Gribskov et al., 1987], or against PROSITE patterns [Bairoch et al., 1997] derived from conserved functional residues in multiple alignments. Similarly, a multiple alignment of homologous sequences of unknown function may itself be used to search a sequence database [Barton, 1990, Krogh et al., 1994, for example].
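The following minimal sketch illustrates what a profile records. It tallies the residue composition of each column of a toy alignment. Several details are illustrative assumptions and are not the conventions of any particular database: the function name `column_profiles`, the use of `-` as the gap character, and the choice to exclude gaps when computing frequencies.

```python
from collections import Counter


def column_profiles(alignment):
    """Return one dict per alignment column mapping each residue to its
    frequency among the non-gap characters in that column.

    Assumes '-' marks a gap and that gaps are simply excluded from the counts.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profiles = []
    for col in range(length):
        # Count residues in this column, skipping gap characters.
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values())
        # An all-gap column yields an empty profile.
        profiles.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profiles


# Toy alignment of three hypothetical sequences.
alignment = ["ACDG", "ACEG", "SC-G"]
profile = column_profiles(alignment)
print(profile[2])  # {'D': 0.5, 'E': 0.5}
```

Column 2 shows how a profile can capture a preference for acidic residues (D or E) at a position, even where no single amino acid is fully conserved.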
In general, these methods replace the single amino acid substitution matrix applied at every sequence position with "position-specific scoring matrices" (PSSMs), in which the scores for each position are calculated from the corresponding column of a multiple sequence alignment. They generally perform best where the alignment is most reliable, that is, in the more conserved regions. BLOCKS and PRINTS scan query sequences only against conserved, ungapped blocks of multiple alignments. It has recently been shown that simply embedding consensus sequences from the conserved regions of a multiple sequence alignment into a single representative sequence improves BLAST and FASTA searches, and outperforms PSSM-based methods [Henikoff & Henikoff, 1997]. To align whole sequences rather than ungapped blocks, gap penalties can also be calculated position by position [Gribskov et al., 1990] and incorporated into PSSMs. Hidden Markov models (HMMs) similarly handle position-specific substitutions and gap penalties when aligning multiple sequences [Krogh et al., 1994, Eddy, 1996].
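The sketch below makes the PSSM idea concrete. It builds a log-odds matrix from one ungapped block and scans a query against it without gaps, roughly in the spirit of block-based searching. It is a simplified illustration with the following assumptions:

- The names `build_pssm` and `best_ungapped_hit` are hypothetical.
- The background distribution is uniform over the 20 standard amino acids.
- A flat pseudocount of 1 is added to every residue.

The databases and programs cited above use their own sequence weighting and pseudocount schemes.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0, background=None):
    """Build a log-odds PSSM (in bits) from an ungapped block of aligned sequences.

    Each position's scores are computed only from that column's residue counts,
    smoothed by adding `pseudocount` to every amino acid. A uniform background
    distribution is assumed unless one is supplied (an illustrative assumption).
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(len(block[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in block:
            counts[seq[col]] += 1  # raises KeyError for gaps or non-standard residues
        total = sum(counts.values())
        # Log-odds score: observed column frequency versus background frequency.
        pssm.append({aa: math.log2(counts[aa] / total / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def best_ungapped_hit(pssm, query):
    """Slide the PSSM along the query without gaps and return
    (best_score, offset) for the first highest-scoring window."""
    width = len(pssm)
    best_score, best_offset = float("-inf"), -1
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_offset = score, start
    return best_score, best_offset


block = ["WCKG", "WCRG", "WCKA"]  # hypothetical conserved block
pssm = build_pssm(block)
score, offset = best_ungapped_hit(pssm, "MSTWCKGLE")
print(offset, round(score, 2))  # 3 6.36
```

Each column is scored independently. As a result, the matrix strongly rewards the fully conserved W and C positions while still giving partial credit to the observed variants R and A. Because of the pseudocount, a residue never seen in a column receives a modest negative score rather than minus infinity.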