Computer Engineering and Applications, 2007, Vol. 43, Issue (15): 173-175.

• Database and Information Processing •

Auto-indexing based on Chinese character coding on a word platform

JIAO Hui, LIU Qian, JIA Hui-bo

  1. Department of Precision Instruments and Mechanology, Tsinghua University, Beijing 100084, China
  2. State Key Laboratory of Precision Measurement Technology and Instruments, Beijing 100084, China
  • Online: 2007-05-21; Published: 2007-05-21
  • Corresponding author: JIAO Hui


Abstract: Auto-indexing is one of the key techniques of content-based information retrieval. At present, Chinese auto-indexing research in China focuses mainly on automatic word segmentation, which is a preprocessing problem. This paper presents a Chinese character coding method based on a word platform and establishes a new format for representing Chinese text in the computer, in which the word is the smallest unit of information. With this format, Chinese text analysis no longer requires automatic segmentation, and auto-indexing can be performed directly, thereby improving the efficiency and quality of auto-indexing.
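The abstract does not specify the coding scheme itself, so the following Python sketch only illustrates the general idea under stated assumptions. A hypothetical dictionary `WORD_CODES` assigns each word an integer code; text is assumed to be entered word by word, so the stored document is already a sequence of word codes; and index terms are chosen by simple term frequency after dropping hypothetical stopword codes. For contrast, `fmm_segment` implements forward maximum matching, one common segmentation method that plain character-coded text would need before indexing. None of these names, codes, or the frequency-based term selection come from the paper.

```python
from collections import Counter

# Hypothetical word dictionary (assumption): each word gets an integer code.
# The paper's actual code assignment is not described in the abstract.
WORD_CODES = {"自动": 1, "标引": 2, "是": 3, "信息": 4,
              "检索": 5, "的": 6, "关键": 7, "技术": 8}
CODE_TO_WORD = {code: word for word, code in WORD_CODES.items()}

# Hypothetical stopword codes (function words) excluded from the index.
STOP_CODES = {WORD_CODES["是"], WORD_CODES["的"]}


def encode(words):
    """Store a document as word codes; word boundaries are kept at input time."""
    return [WORD_CODES[w] for w in words]


def auto_index(codes, top_n=3):
    """Select index terms by frequency directly from word codes (no segmentation)."""
    counts = Counter(c for c in codes if c not in STOP_CODES)
    # Ties keep first-occurrence order, as documented for Counter.most_common.
    return [CODE_TO_WORD[c] for c, _ in counts.most_common(top_n)]


def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: the step character-coded text needs first."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                words.append(piece)
                i += n
                break
    return words


if __name__ == "__main__":
    doc = encode(["自动", "标引", "是", "信息", "检索", "的", "关键", "技术", "自动", "标引"])
    print(auto_index(doc))  # word-coded text: index directly
    print(fmm_segment("自动标引是信息检索的关键技术", WORD_CODES))  # character text: segment first
```

Because word boundaries are fixed when the text is entered, `auto_index` can count candidate terms immediately; with character-coded text, `fmm_segment` (or another segmenter) must run first, and any segmentation errors carry over into the index.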

Key words: auto-indexing, words platform, Chinese characters coding, automatic segmentation
