|
Erik Sandelin Extracting Multiple Structural Alignments from Pairwise Alignments: A Comparison of a Rigorous and a Heuristic Approach Abstract: Motivation: Multiple Structural Alignments (MSTAs) provide position-specific information on the sequence variability allowed by protein folds. This information can be exploited to better understand the evolution of proteins and the physical chemistry of polypeptide folding. Most MSTA methods relies on a pre-computed library of pairwise alignments. This library will in general contain conflicting residue equivalences which not all can be realized in the final MSTA. Hence to build a consistent MSTA these methods have to select a conflict-free subset of equivalences. Results: Using a dataset with 327 families from SCOP 1.63 we compare the ability of two different methods to select an optimal conflict-free subset of equivalences. One is an implementation of Reinert et al.'s integer linear programming formulation (ILP) of the maximum weight trace problem (Reinert et al., 1997). This ILP formulation is a rigorous approach but its complexity is difficult to predict. The other method is T-Coffee which uses a heuristic enhancement of the equivalence weights which allow it to use the speed and simplicity of the progressive alignment approach while still incorporating information of all alignments in each step of building the MSTA. We find that although the ILP formulation consistently selects a more optimal set of conflict-free equivalences, the differences are small and the quality of the resulting MSTAs are essentially the same for both methods. Furthermore, when applied to three folds used in a recent systematic comparison of MSTA methods, T-Coffee is found to produce results comparable to that of the top performing methods in the comparison. Given its speed and predictable complexity, our results show that T-Coffee is an attractive alternative for producing high-quality MSTAs. |