Computer Engineering and Applications, 2012, Vol. 48, Issue (10): 160-163.

• Database, Signal and Information Processing •

Product feature extraction based on unsupervised learning

XIONG Zhuang (熊壮)

  1. College of Computer, Chongqing University, Chongqing 400044, China
  • Online: 2012-04-01 Published: 2012-04-11

Abstract: Product feature extraction is an important research topic in opinion extraction and sentiment analysis. This paper proposes an unsupervised learning method for extracting product features automatically. Text patterns are extracted from product review sentences and used as features, so that every noun and noun phrase in the reviews (except product names) is represented as a vector. A clustering algorithm then groups these vectors into two clusters. Using product names as external knowledge, text patterns that express the part-of (whole-part) relation identify which of the two clusters is the product feature set. Experimental results show that the method performs well on a corpus of electronic product reviews.

Key words: product review, text pattern, part-of relation
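
The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify how text patterns are defined, how vectors are weighted, or which clustering algorithm is used, so everything below is an assumption made for illustration: patterns are taken to be the immediate left and right neighbour tokens, vectors hold raw counts, clustering is a plain 2-means with a deterministic start, and the part-of templates are "X of the PRODUCT" and "PRODUCT 's X". The function names and the toy reviews are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def context_patterns(tokens, i):
    # Assumed pattern definition: the immediate left and right neighbour tokens.
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [("L", left), ("R", right)]


def build_vectors(sentences, candidates):
    # Represent each candidate noun as a vector of pattern counts.
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in candidates:
                counts[tok].update(context_patterns(tokens, i))
    features = sorted({p for c in counts.values() for p in c})
    return {w: [float(counts[w][f]) for f in features] for w in candidates}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(vectors, iters=20):
    # Plain 2-means; start from the first vector and the one farthest from it.
    words = list(vectors)
    if len(words) < 2:
        return {w: 0 for w in words}
    c0 = vectors[words[0]]
    c1 = vectors[max(words, key=lambda w: sq_dist(vectors[w], c0))]
    centroids = [c0, c1]
    labels = {}
    for _ in range(iters):
        new_labels = {
            w: min((0, 1), key=lambda k: sq_dist(vectors[w], centroids[k]))
            for w in words
        }
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [vectors[w] for w in words if labels[w] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def part_of_hits(sentences, word, product):
    # Assumed part-of templates: "<word> of the <product>" and "<product> 's <word>".
    hits = 0
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            if tokens[i + 1:i + 4] == ["of", "the", product]:
                hits += 1
            if i >= 2 and tokens[i - 2:i] == [product, "'s"]:
                hits += 1
    return hits


def extract_features(sentences, candidates, product):
    # Cluster the candidates, then keep the cluster with more part-of evidence.
    labels = two_means(build_vectors(sentences, candidates))
    score = Counter()
    for w, k in labels.items():
        score[k] += part_of_hits(sentences, w, product)
    best = max((0, 1), key=lambda k: score[k])
    return {w for w, k in labels.items() if k == best}


# Invented toy reviews; the product name "camera" is excluded from the candidates.
REVIEWS = [s.split() for s in [
    "the lens of the camera is sharp",
    "the battery of the camera is weak",
    "the camera 's screen is bright",
    "the lens is great",
    "the screen is great",
    "the battery is weak",
    "i called my friend yesterday",
    "i called my brother yesterday",
]]
CANDIDATES = ["lens", "battery", "screen", "friend", "brother"]

print(sorted(extract_features(REVIEWS, CANDIDATES, "camera")))
# ['battery', 'lens', 'screen']
```

In this toy run the nouns that share contexts such as "the ... is" fall into one cluster, and only that cluster matches the part-of templates with the product name, so it is selected as the feature set. A realistic setup would need POS tagging to obtain the candidate nouns and noun phrases, which this sketch takes as given.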