Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (5): 118-122.

• Database, Signal and Information Processing •

Design and implementation of a Web page clustering system based on Nutch

YANG Xiaolan, QIAN Cheng, ZHAO Haiting

  1. College of Information Engineering, Wuhan University of Science and Technology Zhongnan Branch, Wuhan 430223, China
  • Online: 2011-02-11 Published: 2011-02-11

一种基于Nutch的网页聚类系统的设计与实现

阳小兰,钱 程,赵海廷   

  1. 武汉科技大学中南分校 信息工程学院,武汉 430223

Abstract: A search results clustering system is designed that clusters the results returned by Nutch in both English and Chinese language environments. The system is based on the k-means algorithm and the suffix tree clustering algorithm, and consists of a Nutch search engine module, a text word segmentation module, a TF-IDF weight calculation module and a text clustering module. The k-means algorithm and the suffix tree clustering algorithm are compared experimentally.

Key words: Nutch, clustering, k-means, suffix tree

摘要: 设计了一种在中英文环境下、能够对Nutch的搜索结果进行聚类处理的搜索结果聚类系统,该系统基于k-means算法和后缀树聚类算法,是一个由Nutch搜索引擎、文本分词、TF-IDF权重计算以及文本聚类等模块构成的搜索引擎结果文档聚类系统,并通过实验对k-means算法和后缀树算法进行了对比。

关键词: Nutch, 聚类, k-means, 后缀树
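
The TF-IDF weighting and k-means stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine-similarity variant of k-means, the deterministic farthest-first seeding, and the sample snippets are all assumptions made for illustration. Input documents are assumed to be already tokenized (for Chinese text, by a word segmentation step), and the suffix tree clustering algorithm is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into unit-length {term: weight} dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (cosine when both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(members):
    """Unit-length mean direction of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def pick_seeds(vectors, k):
    """Assumed seeding: start at the first vector, then add the least similar one."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        seeds.append(min(vectors, key=lambda v: max(cosine(v, s) for s in seeds)))
    return seeds


def kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    centroids = pick_seeds(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = centroid(members)
    return labels


# Hypothetical pre-tokenized result snippets.
snippets = [
    ["nutch", "crawler", "index"],
    ["nutch", "index", "search"],
    ["suffix", "tree", "phrase"],
    ["suffix", "tree", "cluster"],
]
print(kmeans(tfidf_vectors(snippets), k=2))
```

In a real pipeline the snippets would come from the search engine's result list, and k would need to be chosen or estimated, which is one practical difference from suffix tree clustering.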