Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (21): 143-146.
YUAN Jinsheng, RONG Yuanyuan
袁津生,荣元媛
Abstract: Clustering search results helps users quickly locate the information they need. This paper focuses on clustering Chinese text while generating high-quality cluster labels. The method takes the webpage titles and abstracts returned by a search engine, segments the text with a word segmentation tool, and removes stop words. It then builds a single suffix tree, inserting the text into the tree nodes word by word, and scores the words at each node under four constraints: word frequency, word length, part of speech, and position. Finally, it merges the base clusters and takes the highest-scoring node words as labels. Experimental results show that the resulting clusters have high purity and that the extracted labels are accurate and strongly discriminative, which makes the results convenient for users.
Key words: search results clustering, suffix tree, cluster label, Chinese search, clustering
摘要: 检索结果聚类能够帮助用户快速定位需要查找的信息。注重进行中文文本聚类的同时生成高质量的标签,获取搜索引擎返回的网页标题和摘要,利用分词工具对文本分词,去除停用词;统一构建一棵后缀树,以词语为单位插入后缀树各节点,通过词频、词长、词性和位置几项约束条件计算各节点词语得分;合并基类取得分高的节点词作标签。实验结果显示该方法的聚类簇纯度较高,提取的标签准确且区分性较强,方便用户使用。
关键词: 检索结果聚类, 后缀树, 聚类标签, 中文检索, 聚类
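The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the documents are already segmented with stop words removed, enumerates word-level phrases up to a fixed length in place of an explicit suffix tree, and uses a hypothetical score (document frequency times phrase length). The part-of-speech and position constraints are omitted because the abstract does not give their weights. The merge rule (mutual overlap above 0.5) is the one from the classic STC algorithm, not necessarily the one used in the paper. All function names and parameters are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=4):
    """Map each word-level phrase to the set of documents containing it.

    Each phrase corresponds to a path from the root of a word-level
    suffix trie; only phrases shared by at least two documents are kept.
    """
    clusters = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for start in range(len(tokens)):  # every suffix of the document
            stop = min(start + max_len, len(tokens))
            for end in range(start + 1, stop + 1):
                clusters[tuple(tokens[start:end])].add(doc_id)
    return {p: ids for p, ids in clusters.items() if len(ids) >= 2}


def score(phrase, doc_ids, max_len=4):
    """Hypothetical score: document frequency times capped phrase length."""
    return len(doc_ids) * min(len(phrase), max_len)


def merge_clusters(clusters, threshold=0.5):
    """Group base clusters whose document sets overlap strongly both ways."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in phrases:
        groups[find(p)].append(p)
    return list(groups.values())


def label_clusters(docs, max_len=4, threshold=0.5):
    """Return (label phrase, member document ids) for each merged cluster."""
    clusters = base_clusters(docs, max_len)
    result = []
    for group in merge_clusters(clusters, threshold):
        members = set().union(*(clusters[p] for p in group))
        # The highest-scoring phrase in the group becomes the label.
        best = max(group, key=lambda p: (score(p, clusters[p], max_len), p))
        result.append((best, sorted(members)))
    return sorted(result, key=lambda r: r[1])


# Made-up, pre-segmented titles used only to exercise the sketch.
example_docs = [
    ["后缀树", "聚类", "算法"],
    ["后缀树", "聚类", "标签"],
    ["搜索引擎", "排序"],
    ["搜索引擎", "排序", "优化"],
]
print(label_clusters(example_docs))
```

On the four made-up titles, documents 0 and 1 form one cluster labeled by the phrase ("后缀树", "聚类"), and documents 2 and 3 form another labeled ("搜索引擎", "排序"). The longer shared phrase wins in each case because the sketch's score grows with phrase length.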
YUAN Jinsheng, RONG Yuanyuan. Chinese search results cluster research based on improved STC[J]. Computer Engineering and Applications, 2014, 50(21): 143-146.
袁津生,荣元媛. 改进后缀树的中文检索结果聚类研究[J]. 计算机工程与应用, 2014, 50(21): 143-146.
URL: http://cea.ceaj.org/EN/
http://cea.ceaj.org/EN/Y2014/V50/I21/143