Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (25): 155-156. DOI: 10.3778/j.issn.1002-8331.2008.25.047

• Database, Signal and Information Processing •

Parallel KNN algorithm for WEB document classification

ZHOU Pu-xiong (周朴雄)

  1. College of E-business, South China University of Technology, Guangzhou 510006, China
  • Received: 2007-11-05 Revised: 2008-01-28 Online: 2008-09-01 Published: 2008-09-01
  • Contact: ZHOU Pu-xiong

Abstract: The KNN algorithm has high computational complexity when applied to Web document classification. Previous work reduces this cost by shrinking the training sample set or by adopting fast algorithms; this paper instead approaches the problem through parallelism and proposes a parallel KNN algorithm on the Hyper-cube SIMD model. The time complexity of the key part of KNN drops from O(n^2) to O(log(n)), and the algorithm classifies markedly faster than the traditional serial algorithm.

Key words: document classification, K Nearest Neighbor (KNN), parallel strategy
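
The abstract does not say which step of KNN is the "key part" or how many processors the Hyper-cube SIMD model is assumed to have, so the following Python sketch is an illustration under stated assumptions, not the paper's algorithm. It shows a serial KNN baseline over term-weight vectors with cosine similarity (an assumed similarity measure) and a simulated hypercube sum reduction, the generic pattern by which p = 2^d nodes combine partial results in d = log2(p) exchange steps. The names `cosine`, `knn_classify`, and `hypercube_sum` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, train_vecs, train_labels, k=3):
    """Serial baseline: score every training document, vote among the top k."""
    scored = sorted(
        ((cosine(query, vec), label)
         for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def hypercube_sum(values):
    """Simulate a sum reduction on a hypercube with len(values) == 2**d nodes.

    In exchange step j, node i adds the value held by its neighbour
    i XOR 2**j. After d steps every node holds the total.
    Returns (total, number_of_steps).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("node count must be a power of two")
    vals = list(values)
    steps = 0
    bit = 1
    while bit < p:
        # All nodes exchange along one hypercube dimension at once (SIMD style).
        vals = [vals[i] + vals[i ^ bit] for i in range(p)]
        bit <<= 1
        steps += 1
    return vals[0], steps
```

For eight nodes holding the values 1 to 8, `hypercube_sum` finishes in three exchange steps, whereas a single processor would need seven sequential additions; a reduction of this kind is one plausible source of a logarithmic term when enough processors are available.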