The scores for the alignments (see below) between the query sequence and each of the 82 library sequences are sorted and then normalised by the mean and standard deviation (of the same 82 scores) to give the so-called Z-score. Ideally, correctly identified folds would rank highest and their alignments would have Z-scores greater than, say, 3.0. The fold recognition trials could then be assessed by the number of correct vs. incorrect predictions above such a threshold. However, using the different methodologies presented in this and the following chapter, we observed wide variations in the distributions of alignment scores, so a single Z-threshold cannot be used to give equivalent numbers of predictions. Furthermore, the numbers of such predictions are small, leading to discretised, noise-prone success rates. Instead, a more continuous and robust measure was employed which does not depend upon a threshold. The important issue of prediction confidence is discussed in Section 4.4.4, following the refinement of the methods.
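The normalisation described above can be sketched as follows (a minimal illustration; the function name and signature are not from the original work):

```python
import statistics

def z_scores(scores):
    """Normalise a set of alignment scores by their mean and
    standard deviation, giving one Z-score per library fold.

    `scores` is the list of raw alignment scores between one query
    and each library sequence (82 in the trials described here).
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Under the threshold-based assessment, library folds whose
# Z-score exceeds some cut-off (e.g. 3.0) would be counted
# as predictions.
```

Note that each query is normalised against its own distribution of 82 scores, which is why score distributions that differ between methodologies prevent a single Z-threshold from yielding comparable numbers of predictions.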
For a query sequence which has $N_r$ recognisable non-self folds in the library, the mean rank (in the list of library folds sorted by alignment score) of the correct folds is simply calculated as $\bar{R} = \frac{1}{N_r}\sum_{i=1}^{N_r} R_i$, where $R_i$ is the rank of recognisable fold $i$. However, values of $\bar{R}$ are dependent on $N_r$: a perfect ranking when $N_r = 1$ yields $\bar{R} = 1$, whilst a perfect ranking when $N_r = 3$ yields $\bar{R} = (1 + 2 + 3)/3 = 2$. Both predictions should ideally score the same. Therefore $\bar{R}$ is divided by the best possible score ($(N_r + 1)/2$, the mean of ranks $1$ to $N_r$) to give the mean adjusted rank, $\bar{R}_{adj}$, which equals 1.0 for a perfect prediction, and is higher for worse predictions.
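The calculation above can be expressed directly (a sketch with an illustrative function name, assuming ranks are supplied as integers with 1 being the best):

```python
def mean_adjusted_rank(ranks):
    """Mean adjusted rank for one query.

    `ranks` holds the ranks of the recognisable (correct, non-self)
    folds in the score-sorted library list.  A perfect prediction,
    with the N_r correct folds occupying ranks 1..N_r, scores 1.0;
    worse predictions score higher.
    """
    n = len(ranks)
    mean_rank = sum(ranks) / n
    best_possible = (n + 1) / 2  # mean of ranks 1..n
    return mean_rank / best_possible

# Perfect predictions score 1.0 regardless of the number of
# recognisable folds:
# mean_adjusted_rank([1]) -> 1.0
# mean_adjusted_rank([1, 2, 3]) -> 1.0
```

Dividing by the best possible mean rank is what removes the dependence on the number of recognisable folds, making scores comparable across queries.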
In order to compare results with other methods [Rost et al., 1997, for example], we also calculate the number of correct top ranking predictions, $N_{top}$ (excluding self predictions). For each query fold there can only be one correct top ranking prediction, hence the maximum possible value of $N_{top}$ is 27, the number of recognisable queries.
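This count can be sketched as follows (names and data layout are illustrative assumptions, not from the original work):

```python
def count_top_hits(top_predictions, correct_folds):
    """Number of queries whose top-ranked non-self library fold
    is a correct (same-fold) match.

    `top_predictions` maps each query to the library fold ranked
    first after excluding the self match; `correct_folds` maps
    each query to the set of library folds sharing its fold.
    """
    return sum(
        1
        for query, fold in top_predictions.items()
        if fold in correct_folds[query]
    )
```

Since each query contributes at most one top-ranked prediction, the count is bounded by the number of recognisable queries (27 here).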