Hidden Markov Models

In a standard substitution matrix, such as BLOSUM62, the substitution of one amino acid with another is associated with a single score. Database searches can be customized to find specific protein families or domains using substitution scores that reflect the substitution frequencies of each individual amino acid position in a domain. Position-specific scoring matrix or HMMs profiles, in its simplest form, may consist of a set of 20 substitution scores at each position along the motif - one for each of the amino acids. Amino acids that are commonly found at a particular position receive higher scores and amino acids unlikely to appear at that position receive lower scores.

The HMM is used to statistically describe a protein family's consensus sequence. This statistical description can be used for sensitive and selective database searching.
The model consists of a linear sequence of nodes with a begin state (B) and an end state (E). Each node in an HMM has a match state (M), insert state (I) and delete state (D) with position-specific probabilities for transitioning into each of these states from the previous node.
In addition to a transition probability, the match state also has position-specific probabilities for seeing a particular residue. Similarly, the insert state has probabilities for inserting a residue at the position given by the node. There is also a chance that no residue is associated with a node. That probability is indicated by the probability of transitioning to the delete state.
Both transition and emission probabilities can be generated from a multiple alignment of a family of sequences.

An HMM can be compared (that is, aligned) with a new sequence to determine the probability that the sequence belongs to the modelled family. The most probable path through the HMM (i.e., which transitions were taken and which residues were emitted at match and insert states) taken to generate a sequence similar to the new sequence determines the similarity score.

Unlike traditional pair-wise alignment algorithms such as BLAST, FastA, and Smith-Waterman, HMMs use position-specific scoring for the matching or substitution of a residue and for the opening or extension of a gap. Position-specific scoring allows HMMs to characterize entire families of sequences by modelling the extent to which the regions should be conserved in a multiple alignment.
As a result, distantly related proteins can be found even with low sequence identity, if the similarities and differences are common to the family members. This type of analysis is quite powerful because the function of divergent proteins is conserved through evolution even though sequence elements are free to change in some areas.


References:
Andreas D. Baxevanis, B. F. Francis Ouellette. " Bioinformatics - A practical Guide to the Analysis of Genes and Proteins, Wiley-Interscience
Paracel GeneMatcher documentation

Classic papers and books:
Baldi, P., Y. Chauvin, T. Hunkapillar, and M.A. McClure. "Hidden Markov models of biological primary sequence information," Proc. Natl. Acad. Sci., Vol. 91, pp. 1059-1063, 1994.
Durbin, R., S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis. Cambridge U. Press, 1998.
Krogh, A., M. Brown, I.S. Mian, K. Sjolander, and D. Haussler. "Hidden Markov Models in Computational Biology: Applications to Protein Modeling," J. Mol. Biol., Vol. 235, pp.1501-1531, 1994.