Hybrid model for overlapping ambiguities resolution

doi:10.3778/j.issn.1002-8331.2008.21.002

Computer Engineering and Applications, 2008, Vol. 44, Issue (21): 5-8.

LI Tian-xia, DAI Xin-yu, CHEN Jia-jun

National Laboratory of Novel Software Technology, Nanjing University, Nanjing 210093, China
Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China

Received: 2008-04-30  Revised: 2008-06-02  Online: 2008-07-21  Published: 2008-07-21
Contact: LI Tian-xia

Abstract

Overlapping ambiguity is one of the key difficulties in Chinese word segmentation. This paper presents a hybrid disambiguation model that combines rule-based and statistical methods. First, a rule set for overlapping ambiguities is acquired automatically from an annotated corpus through error-driven learning, and an N-gram statistical language model is built with statistical tools. Second, overlapping ambiguous strings are detected using forward and backward maximum matching (FMM and BMM) together with the rule set. Finally, the detected ambiguities are resolved by the rule set and a score function based on the N-gram language model. The hybrid approach can detect more overlapping ambiguous strings and combines the strengths of rule-based and statistical methods in handling them. Experiments show that it improves the accuracy of overlapping ambiguity resolution and offers a new way to approach the problem.

Key words: overlapping ambiguity, disambiguation rules, statistical language model, score function, full segmentation
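
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fmm`, `bmm`, `bigram_score`, `segment`), the toy lexicon and counts, the add-one smoothing, and the rule lookup keyed by the whole input string are all assumptions made for illustration. The sketch flags a string as ambiguous when forward and backward maximum matching disagree, consults a (hypothetical) rule table first, and otherwise keeps the segmentation with the higher bigram score.

```python
import math

def fmm(text, lexicon, max_len=3):
    """Forward maximum matching: take the longest lexicon word from the left."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            if size == 1 or cand in lexicon:  # single characters always match
                words.append(cand)
                i += size
                break
    return words

def bmm(text, lexicon, max_len=3):
    """Backward maximum matching: take the longest lexicon word from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            cand = text[j - size:j]
            if size == 1 or cand in lexicon:
                words.append(cand)
                j -= size
                break
    return words[::-1]

def bigram_score(words, unigram, bigram):
    """Log-probability under a bigram model with add-one smoothing (an assumption)."""
    vocab = len(unigram)
    score, prev = 0.0, "<s>"
    for w in words:
        score += math.log((bigram.get((prev, w), 0) + 1)
                          / (unigram.get(prev, 0) + vocab))
        prev = w
    return score

def segment(text, lexicon, unigram, bigram, rules=None):
    """Hypothetical hybrid: rules first, bigram score as the fallback."""
    forward, backward = fmm(text, lexicon), bmm(text, lexicon)
    if forward == backward:          # no ambiguity detected by FMM/BMM
        return forward
    if rules and text in rules:      # a learned rule decides, if one applies
        return rules[text]
    return max((forward, backward),
               key=lambda ws: bigram_score(ws, unigram, bigram))

# Toy data, invented for this example only.
LEXICON = {"研究", "研究生", "生命", "起源"}
UNIGRAM = {"<s>": 10, "研究": 8, "研究生": 4, "生命": 6, "命": 1, "起源": 5}
BIGRAM = {("<s>", "研究"): 5, ("<s>", "研究生"): 2,
          ("研究", "生命"): 3, ("生命", "起源"): 4}

print(segment("研究生命起源", LEXICON, UNIGRAM, BIGRAM))
```

Here FMM and BMM disagree on the sample string, so the decision falls to the score function. A real system would detect ambiguity per overlapping span rather than per whole string, and would learn its rules from corpus errors rather than take them as a dictionary.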

LI Tian-xia, DAI Xin-yu, CHEN Jia-jun. Hybrid model for overlapping ambiguities resolution[J]. Computer Engineering and Applications, 2008, 44(21): 5-8.
