Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (2): 160-164.

• Database, Data Mining, Machine Learning •

Approach for topical sentence extraction based on model LDA

WANG Li1,2, LI Peifeng1,2, ZHU Qiaoming1,2

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2. Jiangsu Provincial Key Lab for Computer Information Processing Technology, Suzhou, Jiangsu 215006, China
  • Online: 2013-01-15 Published: 2013-01-16

Abstract: Starting from candidate topic sentences obtained through Web-based query expansion of topic keywords, this paper proposes a topic sentence extraction approach based on the LDA model. The aim is to extract fine-grained topic information and to increase the confidence in that information. The approach uses several sub-topics (facets) that set off the target topic, models the topic information with LDA, and scores the reliability of each candidate sentence by the smoothness of the topic probability distributions; topic sentences are then selected by that score. In a concrete application to Web-oriented topic sentence extraction, the method achieves good results.

Key words: Latent Dirichlet Allocation (LDA), topic model, topical sentence extraction, information fusion
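
The abstract does not give the reliability formula, so the following is only a sketch of one possible reading of "smoothness of the topic probability distribution". It assumes that each candidate sentence already has a probability for every sub-topic (for example, inferred by an LDA model fitted elsewhere), and it assumes that a sentence supported evenly by several sub-topics is more reliable. Normalized Shannon entropy stands in for smoothness here; that choice, the function names `smoothness` and `rank_candidates`, and the demo numbers are all illustrative assumptions, not the authors' method.

```python
import math


def smoothness(dist):
    """Normalized Shannon entropy of a sub-topic distribution, in [0, 1].

    ASSUMPTION: entropy is used as a stand-in for the paper's smoothness
    measure, which the abstract does not define. 1.0 means the probability
    mass is spread evenly over all sub-topics; 0.0 means it sits on one.
    """
    total = sum(dist)
    if total <= 0 or len(dist) < 2:
        return 0.0
    probs = [p / total for p in dist]  # renormalize defensively
    entropy = sum(-p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def rank_candidates(candidates):
    """Sort candidate sentences by smoothness score, highest first.

    `candidates` maps each sentence to its list of sub-topic probabilities.
    Returns a list of (score, sentence) pairs.
    """
    scored = [(smoothness(dist), sentence) for sentence, dist in candidates.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up distributions over four hypothetical sub-topics.
    demo = {
        "sentence A": [0.25, 0.25, 0.25, 0.25],
        "sentence B": [0.70, 0.10, 0.10, 0.10],
        "sentence C": [1.00, 0.00, 0.00, 0.00],
    }
    for score, sentence in rank_candidates(demo):
        print(f"{score:.3f}  {sentence}")
```

Under this reading, a cut-off on the score or a top-k selection would yield the extracted topic sentences; which of the two the paper uses is not stated in the abstract.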