Abstract:
How a protein's amino acid sequence determines its structure is a core question in biology. The fold type of a protein reflects the topological pattern of its structural core, and fold recognition is an important method in sequence-structure research. This article focuses on the 36 fold types that are not covered by a unified hidden Markov model (HMM) but that account for 41.8% of the alpha, beta, and alpha/beta proteins in the Astral 1.65 sequence database. The training set contains samples with less than 25% sequence identity to each other. We applied hierarchical clustering based on root mean square deviation (RMSD) to divide each fold into subgroups. A profile-HMM was then built for each subgroup from a structure alignment produced by the multiple structural alignment algorithm (MUSTANG). In a test on 9505 proteins with less than 95% sequence identity from the Astral 1.65 database, the average sensitivity, specificity, and Matthews correlation coefficient (MCC) over the 36 fold types were 90%, 99%, and 0.95, respectively. These results show that classification modeling according to RMSD can achieve precise fold recognition in cases where a unified HMM cannot be built because the training set contains too many elements. The work provides a new method and new ideas for profile-HMM protein fold recognition and lays a foundation for further research.
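The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes each fold type is scored as a one-vs-rest binary problem and that the per-fold values are averaged with equal weight; the abstract does not state either detail. The function names and the confusion counts are hypothetical and are not taken from the paper.

```python
import math


def fold_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for one fold type.

    tp, fp, tn, fn are one-vs-rest confusion counts (assumed setup).
    A ratio with a zero denominator is reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def average_metrics(per_fold_counts):
    """Unweighted mean of each metric over a list of (tp, fp, tn, fn) tuples."""
    rows = [fold_metrics(*counts) for counts in per_fold_counts]
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical counts for two fold types, for illustration only.
    example = [(90, 10, 990, 10), (45, 5, 1040, 5)]
    print(average_metrics(example))
```

Because MCC uses all four cells of the confusion matrix, it stays informative when a fold has far fewer positive than negative test proteins, a situation in which specificity alone tends to look high.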
Source:
ACTA PHYSICO-CHIMICA SINICA
ISSN: 1000-6818
Year: 2009
Issue: 12
Volume: 25
Pages: 2558-2564
Impact factor: 10.900 (JCR@2022)
ESI subject: CHEMISTRY
JCR quartile: 4
Chinese Academy of Sciences (CAS) quartile: 1