Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (19): 153-155.

• Database, Signal and Information Processing •

Recognition of complex named-entities in user queries of search engine

HU Xue-ying (胡学营), LIU Hui (刘慧), LU Ru-zhan (陆汝占)

  1. Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240, China
  • Received: 2007-09-26 Revised: 2007-12-17 Online: 2008-07-01 Published: 2008-07-01
  • Contact: HU Xue-ying

Abstract: Named-entity recognition (NER) is a fundamental task in natural language processing and information retrieval. Much of the existing literature concentrates on person, location, and organization names, and rarely addresses more complex named entities such as book and movie titles. This paper focuses on recognizing such complex named entities in the user query logs of a search engine. Each query is roughly segmented according to its context data on the Web, and the same Web data serves as the training corpus for a complex named-entity classifier. Experiments with three different classifiers confirm that the method achieves fairly good results.
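The abstract does not say how the rough segmentation is computed. As an illustration only, the sketch below shows one way Web context could drive a coarse split of a query: a greedy longest-match that accepts a span when enough context snippets contain it verbatim. The function name `rough_segment`, the `min_support` threshold, and the sample snippets are all assumptions made for this example and are not taken from the paper.

```python
from typing import List


def rough_segment(query: str, snippets: List[str], min_support: int = 2) -> List[str]:
    """Greedy longest-match segmentation of `query` backed by snippet counts.

    Hypothetical illustration, not the authors' algorithm: a span is accepted
    as a segment when at least `min_support` snippets contain it verbatim;
    if no longer span qualifies, a single character is emitted.
    """
    segments: List[str] = []
    i = 0
    while i < len(query):
        # Try the longest candidate starting at position i first, then shrink.
        for j in range(len(query), i, -1):
            span = query[i:j]
            support = sum(1 for s in snippets if span in s)
            if j - i == 1 or support >= min_support:
                segments.append(span)
                i = j
                break
    return segments


# Made-up Web context snippets for a made-up query.
context = [
    "哈利波特与魔法石 电影",
    "小说 哈利波特 全集",
    "电影下载 免费",
    "下载 哈利波特",
]
print(rough_segment("哈利波特下载", context))  # ['哈利波特', '下载']
```

In a pipeline like the one the abstract outlines, the resulting segments would then be passed to a classifier trained on the same Web data to decide whether each one is a complex named entity such as a book or movie title. That step is omitted here because the abstract does not specify the features or the three classifiers used.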