As the technology of DNA sequence determination continues to advance, the amount of available protein sequence information grows rapidly. Novel sequences result both from directed research into particular systems and from wholesale genome and chromosome sequencing projects. In the former case, the function of a novel protein can often be deduced from the experimental context and/or from careful sequence database searches that identify similar sequences of known function. Large-scale sequencing projects produce too much data for this kind of manual approach. Hence automated methods are required to perform database searches and sequence annotation in the absence of background knowledge and common-sense reasoning. Accuracy and coverage are the critical performance issues for such methods.