2010

Background: Reliable prediction of stability changes in protein variants is an important aspect of computatiol protein design. A number of machine learning methods that allow a classification of stability changes knowing only the sequence of the protein emerged. However, their performance on amino acid substitutions of previously unseen non-homologous proteins is rather limited. Moreover, the performance varies for different types of mutations based on the secondary structure or accessible surface area of the mutation site. Results: We proposed feature-based multiple models with each model designed for a specific type of mutations. The new method is composed of five models trained for mutations in exposed, buried, helical, sheet, and coil residues. The classification of a mutation as stabilising or destabilising is made as a consensus of two models, one selected based on the predicted accessible surface area and the other based on the predicted secondary structure of the mutation site. We refer to our new method as Evolutiory, Amino acid, and Structural Encodings with Multiple Models (EASE-MM). Cross-validation results show that EASE-MM provides a notable improvement to our previous work reaching a Matthews correlation coefficient of 0.44. EASE-MM was able to correctly classify 73% and 75% of stabilising and destabilising protein variants, respectively. Using an independent test set of 238 mutations, we confirmed our results in a comparison with related work. Conclusions: EASE-MM not only outperformed other related methods but achieved more balanced results for different types of mutations based on the accessible surface area, secondary structure, or magnitude of stability changes. This can be attributed to using multiple models with the most relevant features selected for the given type of mutations. Therefore, our results support the presumption that different interactions govern stability changes in the exposed and buried residues or in residues with a different secondary structure.