Computer Engineering and Applications, 2011, Vol. 47, Issue (32): 19-22.

Doctoral Forum

Study on a hot topic identification system for emergency events and its key issues

CHEN Liping, DU Junping

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  Received: 1900-01-01  Revised: 1900-01-01  Published: 2011-11-11  Online: 2011-11-11

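Abstract: Targeting a hot topic identification system for emergency events, this paper establishes an overall technical framework for implementing the system and describes the key issues in each of its four components, together with strategies for solving them. Taking into account the content and structural characteristics of news report texts and the distributed nature of report sources, it proposes, on the basis of the VSM (vector space model) text representation and the TF-IDF formula, a method for clipping the body text of reports and an improved model for computing feature weights. The models are evaluated using news reports on earthquake emergencies as the data source. Experimental results show that, once the body of a news report is clipped, extracting only the headline, the lead, and related feature parameters is sufficient to form the sample set for hot topic identification. In addition, compared with the classical model, the improved feature-weighting model runs more efficiently and yields a more adaptable text representation.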

Key words: emergency event, news report, hot topic identification, text clipping, text representation model
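The abstract gives neither the exact clipping rule nor the modified weighting formula. The following is therefore only a minimal Python sketch of the classical baseline those contributions build on. It clips each report to its headline and lead, then represents the clipped texts as VSM vectors weighted with one common form of TF-IDF, w(t, d) = tf(t, d) · log(N / df(t)). The helper names (`clip_report`, `tokenize`, `tfidf_vectors`, `cosine`) are hypothetical, as are the "first paragraph is the lead" rule, the regex tokenizer, and the use of cosine similarity. The authors' improved weighting, which also draws on the distribution of report sources, is not reproduced.

```python
import math
import re
from collections import Counter


def clip_report(headline: str, body: str) -> str:
    """Clip a news report down to its headline and lead.

    Hypothetical rule: the lead is taken to be the first non-empty
    paragraph of the body (paragraphs separated by blank lines).
    The paper's additional feature parameters are not modelled here.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    lead = paragraphs[0] if paragraphs else ""
    return f"{headline}\n{lead}"


def tokenize(text: str) -> list[str]:
    # Assumption: lower-cased runs of word characters. Chinese reports would
    # first need a real word segmenter; this is only a placeholder.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Classical TF-IDF weights, w(t, d) = tf(t, d) * log(N / df(t)),
    L2-normalized per document to give VSM vectors."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term at most once per document
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # A modified weighting scheme would replace this line.
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Both vectors are already L2-normalized (or all-zero), so the dot
    # product is their cosine similarity (0.0 when either is all-zero).
    return sum(w * v.get(t, 0.0) for t, w in u.items())


if __name__ == "__main__":
    reports = [
        clip_report("Earthquake hits county",
                    "A magnitude 6 earthquake struck the county.\n\nRescue teams arrived later."),
        clip_report("Rescue teams reach earthquake zone",
                    "Rescue teams reached the earthquake zone on Monday.\n\nMore details follow."),
        clip_report("Stock market closes higher",
                    "Shares rose on Monday.\n\nAnalysts commented."),
    ]
    vecs = tfidf_vectors(reports)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

On these three toy reports, the two earthquake reports share weighted terms and score above zero, while the unrelated market report scores zero against the first one. Clipping also drops the second paragraph of each body, so "rescue" never enters the first report's vector. The weight line inside `tfidf_vectors` is where a modified scheme, such as the source-aware weighting the paper proposes, would plug in.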
