Method based on data dividing and integration for predicting signal peptides

Computer Engineering and Applications, 2012, Vol. 48, Issue 36: 238-244.

WANG Yi¹,², GUO Gongde¹,², KONG Xiangzeng¹,²

1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China
2. Key Lab of Network Security and Cryptography, Fujian Normal University, Fuzhou 350007, China

Online: 2012-12-21 Published: 2012-12-21

Abstract

Signal peptide sequences vary in length and are diverse in amino acid composition, so most existing prediction methods process them with sliding windows, which can discard useful information and produce imbalanced data sets. To improve prediction performance on the minority class, the training data are preprocessed before the probabilistic neural network classifiers are built: the majority-class samples are divided into several groups, and each group is combined with the minority-class samples to form a data subset, on which a probabilistic neural network classifier is trained. Classifiers are trained under two different protein sequence encodings, and the ensemble combines their outputs by weighted voting. Experiments on the widely used Nielsen dataset indicate that the proposed method is effective.
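
The data-dividing and voting scheme summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions: the random split of the majority class stands in for the clustering-based division named in the keywords, the Gaussian-kernel probabilistic neural network with a single smoothing parameter `sigma` is the textbook form, the uniform default vote weights are a placeholder because the abstract does not state the weighting scheme, and all function names are hypothetical. The sketch also uses one feature encoding, whereas the paper trains classifiers under two.

```python
import numpy as np


def pnn_predict(train_x, train_y, test_x, sigma=0.5):
    """Textbook PNN: score each class by its mean Gaussian kernel response."""
    classes = np.unique(train_y)
    # Squared Euclidean distance between every test point and training point.
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.stack(
        [kernel[:, train_y == c].mean(axis=1) for c in classes], axis=1
    )
    return classes, scores


def split_majority(majority_idx, n_groups, rng):
    """Assumption: a random split stands in for clustering-based division."""
    return np.array_split(rng.permutation(majority_idx), n_groups)


def ensemble_predict(x, y, test_x, minority_label, n_groups=3,
                     sigma=0.5, weights=None, seed=0):
    """Train one PNN per (majority group + all minority samples), then vote."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    groups = split_majority(majority_idx, n_groups, rng)
    # Assumption: uniform weights unless the caller supplies its own.
    if weights is None:
        weights = np.ones(n_groups)
    weights = np.asarray(weights, dtype=float)

    votes = np.zeros(len(test_x))
    for w, group in zip(weights, groups):
        idx = np.concatenate([group, minority_idx])
        classes, scores = pnn_predict(x[idx], y[idx], test_x, sigma)
        predicted = classes[scores.argmax(axis=1)]
        votes += w * (predicted == minority_label)
    # True where the weighted vote for the minority class exceeds half.
    return votes > weights.sum() / 2.0


# Synthetic imbalanced data (illustrative only): 300 majority, 30 minority.
rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(300, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(30, 2)),
])
y = np.array([0] * 300 + [1] * 30)
test_x = np.array([[0.0, 0.0], [3.0, 3.0]])
pred = ensemble_predict(x, y, test_x, minority_label=1, n_groups=10)
```

With `n_groups=10`, each training subset here holds 30 majority and 30 minority samples, so every member classifier sees balanced data while the ensemble as a whole still uses all majority samples.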

Key words: signal peptide prediction, imbalanced data sets, cluster-based partitioning, probabilistic neural networks, multiple classifier combination

WANG Yi, GUO Gongde, KONG Xiangzeng. Method based on data dividing and integration for predicting signal peptides[J]. Computer Engineering and Applications, 2012, 48(36): 238-244.
