A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins

The successful prediction of thermophilic proteins is useful for designing stable enzymes that remain functional at high temperature. We used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a two-class K-nearest neighbor (KNN) classifier to distinguish thermophilic from mesophilic proteins, and developed the resulting KNN-ID classifier to predict thermophilic proteins. Instead of extracting features from protein sequences as done previously, our approach is based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences is first calculated to quantify how similar one sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with several recently published methods showed that the KNN-ID classifier proposed in this study outperforms them. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm.

Authors:   Yong-Chun Zuo, Wei Chen, Guo-Liang Fan, Qian-Zhong Li
Journal:   Amino Acids
Year:   2012
DOI:   10.1007/s00726-012-1374-z
Publication date:   01-08-2012
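
The two steps described in the abstract (a pairwise similarity distance computed from amino acid composition, followed by a K-nearest neighbor vote) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the formula, so the sketch assumes the commonly used definition of the diversity measure, D(X) = N log N - sum(n_i log n_i), and of the increment of diversity, ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names, the value of k, and the 20-letter amino acid alphabet are illustrative choices.

```python
import math
from collections import Counter

# Assumption: compositions are counted over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return amino acid counts of a sequence as a fixed-order list."""
    counts = Counter(seq.upper())
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def diversity(counts):
    """Assumed diversity measure: D(X) = N log N - sum(n_i log n_i)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """Assumed ID: D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def knn_predict(query, training, k=3):
    """Label a query sequence by majority vote among its k nearest
    training sequences, using ID as the distance.

    `training` is a list of (sequence, label) pairs.
    """
    q = composition(query)
    scored = sorted(
        (increment_of_diversity(q, composition(seq)), label)
        for seq, label in training
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Under this definition, two sequences with identical compositions have an ID of zero, and the value grows as the compositions diverge, which is why it can serve as the distance in the nearest-neighbor step. An odd k avoids tied votes in the two-class case.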
