Computer Engineering and Applications, 2010, Vol. 46, Issue 9: 23-25. DOI: 10.3778/j.issn.1002-8331.2010.09.008

• Doctoral Forum •

Hybrid algorithm for classification based on VPRS

HONG Zhi-yong1,2,QIN Ke-yun1,DENG Wei-bin3   

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 610031, China
  2. School of Computer Science, Wuyi University, Jiangmen, Guangdong 529020, China
  3. School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China
  • Received: 2009-12-22 Revised: 2010-02-06 Online: 2010-03-21 Published: 2010-03-21
  • Contact: HONG Zhi-yong

基于VPRS理论的一种混合分类算法

洪智勇1,2,秦克云1,邓维斌3   

  1. 西南交通大学 数学学院,成都 610031
  2. 五邑大学 计算机学院,广东 江门 529020
  3. 西南交通大学 信息与科学技术学院,成都 610031
  • 通讯作者: 洪智勇

Abstract: In text classification, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are both effective classifiers, but each has drawbacks. KNN is computationally expensive when classifying a new document against a large training set. SVM is overly sensitive to noise and cannot reliably classify data points that lie close to the separating hyperplane. To address this, a hybrid classification algorithm based on the Variable Precision Rough Set (VPRS) model is proposed. It combines the strengths of KNN and SVM while overcoming their weaknesses. Experiments compare the efficiency and classification accuracy of the hybrid algorithm with those of other classification algorithms. The results show that the proposed method effectively improves both computational cost and classification accuracy.

Key words: text classification, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Variable Precision Rough Set (VPRS) model

摘要: 在文本分类领域中,KNN与SVM算法都具有较高的分类准确率,但两者都有其内在的缺点,KNN算法会因为大量的训练样本而导致计算量过大;SVM算法对于噪声数据过于敏感,对分布在分类超平面附近的数据点无法进行准确的分类,基于此提出一种基于变精度粗糙集理论的混合分类算法,该算法能够充分利用二者的优势同时又能克服二者的弱点,最后通过实验证明混合算法能够有效改善计算复杂度与分类精度。

关键词: 文本分类, 支持向量机(SVM)算法, K-近邻法(KNN), 变精度粗糙集模型(VPRS)
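
The abstract does not give the details of the hybrid algorithm, so the following is only a minimal sketch of one way the idea could be read, not the authors' method. It assumes a linear SVM that has already been trained (its weights `w` and bias `b` are simply supplied), accepts the SVM decision for points far from the hyperplane, and falls back to KNN only for points close to it, so that the costly neighbor search runs on a small subset of inputs. All names (`svm_score`, `knn_label`, `hybrid_predict`, `threshold`) and the toy data are hypothetical, and the fixed `threshold` is a stand-in for the VPRS-based criterion, which the abstract does not specify.

```python
import math
from collections import Counter


def svm_score(w, b, x):
    """Signed score w.x + b of an (assumed pre-trained) linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def knn_label(train, x, k=3):
    """Majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def hybrid_predict(w, b, train, x, threshold=1.0, k=3):
    """Return (label, which_classifier_decided).

    Assumption: points with |score| >= threshold are far enough from the
    hyperplane to trust the SVM; the rest are treated as a boundary region
    and handed to KNN. The threshold is hypothetical, not from the paper.
    """
    score = svm_score(w, b, x)
    if abs(score) >= threshold:
        return (1 if score > 0 else -1), "svm"
    return knn_label(train, x, k), "knn"


# Toy 2-D data: (feature vector, label). Two positive points sit just on the
# negative side of the hyperplane x0 = 0, mimicking noise near the boundary.
TRAIN = [
    ((2.0, 0.0), 1), ((1.5, 1.0), 1), ((0.3, 0.0), 1),
    ((-2.0, 0.0), -1), ((-1.5, -1.0), -1),
    ((-0.1, 0.5), 1), ((-0.2, 0.4), 1),
]
W, B = (1.0, 0.0), 0.0

far_point = hybrid_predict(W, B, TRAIN, (3.0, 0.0))
near_point = hybrid_predict(W, B, TRAIN, (-0.15, 0.45))
```

In this toy setup the point `(3.0, 0.0)` is decided by the SVM alone, while `(-0.15, 0.45)` lies within the threshold band and is decided by its nearest neighbors, which here disagree with the sign of the SVM score.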
