Computer Engineering and Applications, 2012, Vol. 48, Issue (28): 168-173.

• Database, Signal and Information Processing •

Extracting Kazakh common words using improved word field general usage

WANG Yali, Gulila·Altenbek

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Online: 2012-10-01 Published: 2012-09-29

Abstract: Aiming at the automatic extraction of Kazakh common words, this paper computes the general usage of Kazakh words with an improved word field general usage formula, built on the traditional word field usage measure, so that the formula has greater influence on the ranking position of Kazakh common words. Based on the three properties of common words (field generality, regional generality, and time generality), statistical methods are used to examine how general Kazakh words are, and general usage statistics for Kazakh words are implemented on top of Kazakh word frequency statistics. Experimental results show that, when extracting Kazakh common words, the improved word field general usage formula influences word ranking positions more strongly than the traditional word field usage formula.

Key words: common words, Kazakh, lexical general usage, field general usage, time general usage