2010

Background: Even a single amino acid substitution in a protein sequence may result in significant changes in protein stability, structure, and therefore in protein function as well. In the post-genomic era, computational methods for predicting stability changes from only the sequence of a protein are of importance. While evolutionary relationships of protein mutations can be extracted from large protein databases holding millions of protein sequences, relevant evolutionary features for the prediction of stability changes have not been proposed. Also, the use of predicted structural features in situations when a protein structure is not available has not been explored.

Results: We proposed a number of evolutionary and predicted structural features for the prediction of stability changes and analysed which of them best capture the determinants of protein stability. We trained and evaluated our machine learning method on a non-redundant data set of experimentally measured stability changes. When only the direction of the stability change was predicted, we found that the best performance improvement can be achieved by combining the evolutionary features mutation likelihood and SIFT score with the predicted structural feature secondary structure. The same two evolutionary features in combination with the predicted structural feature accessible surface area achieved the lowest error when the prediction of actual values of stability changes was assessed. Compared to similar studies, our method achieved improvements in prediction performance.

Conclusion: Although the strongest feature for the prediction of stability changes appears to be the vector of amino acid identities in the sequential neighbourhood of the mutation, the most relevant combination of evolutionary and predicted structural features further improves prediction performance. Even the predicted structural features, which did not perform well on their own, turn out to be beneficial when appropriately combined with evolutionary features. We conclude that a high prediction accuracy can be achieved knowing only the sequence of a protein when the right combination of both structural and evolutionary features is used.