The hierarchical route to the prediction of tertiary structure starts with the prediction of secondary structure elements (see below) and/or secondary structural class. Secondary structural content can be estimated experimentally via UV circular dichroism spectroscopy [Woody, 1995], or computationally from secondary structure predictions, from which the class can then be derived [Rost & Sander, 1994, Eisenhaber et al., 1996b]. Methods which predict secondary structural class directly are of particular interest because their output might enhance the quality of linear predictions, provided that the information used by the two methods is independent [Rost & Sander, 1994]. Nishikawa and coworkers [Nishikawa & Ooi, 1982, Nishikawa et al., 1983, Nakashima et al., 1986] noted some time ago that the amino acid composition of a protein is correlated with its structural class. By representing proteins as points in 20-dimensional amino acid composition space, predictions of structural class (usually 3, 4 or 5 classes) have been made using neural networks [Muskal & Kim, 1992, Metfessel et al., 1993, Rost & Sander, 1994] and a myriad of variations on distance measures and multivariate analysis [Nakashima et al., 1986, Chou, 1989, Metfessel et al., 1993, Klein & Delisi, 1986a, Chou & Zhang, 1995, Boberg et al., 1995, Eisenhaber et al., 1996b].
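The composition-space representation described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a plain Euclidean distance to per-class mean composition vectors (centroids); it is not a reimplementation of any of the cited methods. The function names, the class labels and the toy sequences are all invented for the example.

```python
from collections import Counter
import math

# The 20 standard amino acids, one axis of composition space each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-component amino acid frequency vector of a sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def euclidean(u, v):
    """Euclidean distance between two vectors (an assumed choice of measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_class(sequence, class_centroids):
    """Assign the class whose centroid is nearest in composition space."""
    x = composition(sequence)
    return min(class_centroids, key=lambda c: euclidean(x, class_centroids[c]))


# Toy, made-up sequences purely for illustration (not real proteins).
training = {
    "alpha": ["AELLKKLAEEALKA", "MEELLKKAAELLEA"],
    "beta": ["VTVYVTGSVTVKVY", "TVSVYTVGTVKVTY"],
}
centroids = {
    label: centroid([composition(s) for s in seqs])
    for label, seqs in training.items()
}

print(predict_class("AEELLKALEEAKKL", centroids))
```

Swapping `euclidean` for a weighted or covariance-aware distance would give the flavour of the more parameter-rich variants, at the cost of more parameters to estimate from the same data.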
In general, the reported accuracy of these methods is around 70-80%, although the datasets were constructed differently and cross-validated to varying degrees. Using weighted vector components or methods that take into account correlations between amino acid frequencies (component coupled), Zhang and coworkers [Zhou et al., 1992, Chou & Zhang, 1994, Chou & Zhang, 1995] have reported accuracies at or very close to 100%; however, the authors did not appreciate the memorisation effects of these parameter-rich methods and neglected to perform adequate cross-validation. Eisenhaber et al. [Eisenhaber et al., 1996a, Eisenhaber et al., 1996b] have used a vector decomposition method to predict secondary structural content and class. Their studies showed that dataset size, class definition and cross-validation were of critical importance to the true accuracy of class prediction from amino acid composition, which they estimated at 60%. They also showed that component coupling techniques give only a small improvement in cross-validated prediction accuracy.
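The memorisation effect mentioned above can be illustrated with a deliberately extreme stand-in. The sketch below assumes a 1-nearest-neighbour classifier, which simply stores its training set, and applies it to synthetic composition vectors with randomly assigned labels, so there is nothing real to learn. It is not the component-coupled method, and the data, names and sample size are invented for the example. Accuracy measured on the training data itself (resubstitution) is perfect, while leave-one-out cross-validation exposes the lack of predictive power.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbour_label(x, examples):
    """Label of the stored example closest to x (1-nearest-neighbour)."""
    return min(examples, key=lambda ex: euclidean(x, ex[0]))[1]


def resubstitution_accuracy(examples):
    """Accuracy when every test point is also part of the training set."""
    hits = sum(nearest_neighbour_label(x, examples) == y for x, y in examples)
    return hits / len(examples)


def leave_one_out_accuracy(examples):
    """Accuracy when each point is predicted from all the other points."""
    hits = 0
    for i, (x, y) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        hits += nearest_neighbour_label(x, rest) == y
    return hits / len(examples)


def random_composition(rng, dims=20):
    """Synthetic frequency vector whose components sum to 1."""
    raw = [rng.random() for _ in range(dims)]
    total = sum(raw)
    return [r / total for r in raw]


# Synthetic data: labels are assigned at random, independent of composition.
rng = random.Random(0)
examples = [
    (random_composition(rng), rng.choice(["alpha", "beta"]))
    for _ in range(40)
]

print("resubstitution:", resubstitution_accuracy(examples))
print("leave-one-out: ", leave_one_out_accuracy(examples))
```

In the resubstitution test each point is its own nearest neighbour, so the score says nothing about performance on unseen proteins; only the held-out estimate is informative.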