Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (20): 159-165. DOI: 10.3778/j.issn.1002-8331.1706-0250

• Pattern Recognition and Artificial Intelligence •

Topic learning and representation method of academic papers with association rules

ZHAO Huiru, LIN Min

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China
  • Online: 2018-10-15 Published: 2018-10-19

Abstract: To address the poor semantic interpretability and low accuracy of the topics learned by existing topic models, a semi-supervised topic learning and representation method that combines association rules with academic-paper metadata is proposed. First, the academic papers are preprocessed to obtain table-of-contents metadata. This metadata is then used as prior knowledge to guide topic learning, yielding the probability distribution of terms over topics in the documents. Next, weighted association rule mining produces the frequent three-item sets of each topic, and a criterion for judging topic quality is proposed. Semantically similar topics are then merged using the paper metadata and an improved vector space model algorithm. The result is a topic semantic representation that better matches the actual content and is easier to interpret. Three topic learning and representation methods are compared experimentally on the same academic-paper dataset. The results show that the proposed method outperforms the others in topic extraction accuracy and topic granularity, which supports its effectiveness.

Key words: topic model, weighted association rule mining, academic papers, frequent three-item sets
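The abstract does not give the weighting scheme or the details of the improved vector space model, so the following is only a rough sketch of the two steps it names: mining frequent three-item sets under a weighted support, and merging topics whose term vectors are similar. Everything specific here is an assumption made for illustration: the function names, the definition of weighted support as mean item weight times the fraction of transactions containing the itemset, and the use of plain cosine similarity with a greedy grouping in place of the authors' improved algorithm.

```python
from itertools import combinations
from math import sqrt


def weighted_frequent_triples(transactions, weights, min_wsup):
    """Return {3-itemset: weighted support} for triples reaching min_wsup.

    Assumed definition (not taken from the paper):
    weighted support = mean weight of the three items
                       * fraction of transactions containing all three.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        # Keep only items that have a weight; sort so each triple has one canonical order.
        items = sorted(w for w in set(t) if w in weights)
        for triple in combinations(items, 3):
            counts[triple] = counts.get(triple, 0) + 1
    result = {}
    for triple, c in counts.items():
        mean_w = sum(weights[w] for w in triple) / 3
        wsup = mean_w * c / n
        if wsup >= min_wsup:
            result[triple] = wsup
    return result


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_similar_topics(topics, threshold):
    """Greedily group topic names whose vectors are close to a group's first member.

    topics maps a topic name to a {term: weight} dict. This is a plain
    vector space model stand-in, not the improved algorithm from the paper.
    """
    groups = []
    for name, vec in topics.items():
        for group in groups:
            if cosine(vec, topics[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

In this sketch a transaction could be the set of terms in one document or section, and the weights could be taken from the topic-term probabilities; both choices are guesses about a reasonable setup rather than details stated in the abstract.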