The CATH database from this laboratory [Orengo et al., 1997] is a more ambitious project. Using semi-automated methods, a set of domains from the single- and multi-domain structures of the PDB is hierarchically classified at four levels: Class, Architecture, Topology and Homologous superfamily. These levels are described both by names (for ease of human understanding) and by numbers (for easy computer manipulation). In this latter form CATH is similar to the EC enzyme classification scheme [NC-IUBMB, 1992].
Class has already been described and can be one of: mainly-α, mainly-β, mixed-αβ and irregular. The unique feature of CATH is the architectural description at the next level in the hierarchy. Many of the folds (in the mainly-β and mixed-αβ classes in particular) appear to be constructed according to the same basic principles. For example a sizeable subset of mixed-αβ folds can be thought of as having three layers: two of helix surrounding a single central layer of β-sheet; this is the 3-layer sandwich architecture. Architectures are defined manually for the whole of fold space. Folds within the same architectural subdivision may have different numbers and ordered connections of secondary structures, and are discriminated by the topology descriptor. The final level groups all domains belonging to the same homologous superfamily. These are structures which are clearly or weakly related by sequence but have the same function and are most likely evolutionarily related.
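The numerical form of the hierarchy lends itself to simple computer manipulation. The following is a minimal sketch only, assuming that a classification is written as four dot-separated integers (one per level, by analogy with EC numbers). The names `CathCode` and `deepest_shared_level` and the example codes are hypothetical illustrations, not part of CATH itself.

```python
from dataclasses import dataclass

# Level names in hierarchical order, most general first.
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


@dataclass(frozen=True)
class CathCode:
    """A four-level classification code (assumed format: 'C.A.T.H')."""
    class_: int
    architecture: int
    topology: int
    superfamily: int

    @classmethod
    def parse(cls, text):
        # Assumption: exactly four dot-separated integer fields.
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four fields, got {len(parts)}: {text!r}")
        return cls(*(int(p) for p in parts))

    def as_tuple(self):
        return (self.class_, self.architecture, self.topology, self.superfamily)


def deepest_shared_level(a, b):
    """Return the name of the deepest level at which two codes agree.

    Returns None if the two domains differ already at the class level.
    """
    shared = None
    for name, x, y in zip(LEVELS, a.as_tuple(), b.as_tuple()):
        if x != y:
            break
        shared = name
    return shared


# Illustrative (made-up) codes: same class and architecture, different topology.
d1 = CathCode.parse("3.40.50.1")
d2 = CathCode.parse("3.40.10.2")
print(deepest_shared_level(d1, d2))
```

Because each level is nested within the one above it, comparing codes from left to right and stopping at the first mismatch is enough to tell how closely two domains are classified together.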
The topology and homology classifications of CATH are performed by the automated application of the SSAP algorithm and sequence comparisons. The numerical cutoffs are described in full on the CATH web site. In certain dense regions of fold space (such as in the 3-layer sandwich architecture) stricter thresholds were required to subdivide topologies into smaller and more informative groupings. Only through consultation with users and a deeper understanding of fold evolution will classifications of protein structures be completely 'error-free'.