Chinese personal name recognition based on multilevel threshold

Computer Engineering and Applications, 2007, Vol. 43, Issue 33: 1-3.

YU Zu-bo (余祖波)¹, GAO Qing-shi (高庆狮)¹,², MA Jian-jun (马建军)¹

1. Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116023, China
2. Institute of Intelligence, Linguistics and Computer Science, University of Science and Technology Beijing, Beijing 100083, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-11-21 Published: 2007-11-21
Contact: YU Zu-bo

Abstract

Abstract: Based on statistics from a large-scale sample base of personal names, this paper studies the character-usage patterns of surnames and given names across the various kinds of Chinese personal names. Through statistical analysis of a large-scale corpus, it also obtains, for each surname character, the probability that the character is used as a real surname in real text, together with its contextual patterns. For Han Chinese personal names, the paper proposes multilevel surname thresholds; for personal names of Chinese ethnic minorities and transliterated personal names, it proposes multilevel first-character thresholds. The thresholds are determined by the 3σ rule. Experimental results show that the multilevel-threshold model is effective for recognizing Chinese personal names.

Key words: natural language processing, unknown word recognition, Chinese personal name recognition, multilevel threshold, 3σ rule
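
The abstract does not spell out how the multilevel thresholds are computed, so the following Python sketch is only one plausible reading, not the authors' method. It assumes that surname characters are grouped into levels by their probability of being used as a surname, and that each level's acceptance threshold is set at μ − 3σ of the scores of known true names in that level. The function names, the level boundaries (0.5 and 0.1) and all sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def three_sigma_threshold(scores):
    """Return the lower 3-sigma bound (mu - 3*sigma) of a list of scores."""
    return mean(scores) - 3 * pstdev(scores)


def level_of(surname_prob, boundaries=(0.5, 0.1)):
    """Map a surname-usage probability to a level index (0 = most reliable).

    The boundaries are hypothetical; they are not taken from the paper.
    """
    for level, bound in enumerate(boundaries):
        if surname_prob >= bound:
            return level
    return len(boundaries)


def build_thresholds(training, boundaries=(0.5, 0.1)):
    """Compute one 3-sigma threshold per level.

    training: iterable of (surname_prob, score) pairs for known true names.
    """
    by_level = {}
    for surname_prob, score in training:
        by_level.setdefault(level_of(surname_prob, boundaries), []).append(score)
    return {lvl: three_sigma_threshold(s) for lvl, s in by_level.items()}


def is_name(surname_prob, score, thresholds, boundaries=(0.5, 0.1)):
    """Accept a candidate if its score reaches the threshold of its level."""
    lvl = level_of(surname_prob, boundaries)
    return lvl in thresholds and score >= thresholds[lvl]


# Made-up training data: (surname-usage probability, candidate score).
training = [(0.9, 2.0), (0.8, 4.0), (0.3, 5.0), (0.2, 7.0), (0.05, 9.0), (0.02, 9.0)]
thresholds = build_thresholds(training)
```

With these made-up numbers, characters that are almost always surnames end up with a lower threshold than characters that rarely are, so a candidate starting with a rarely-used surname character needs a higher score to be accepted. Using a separate threshold per level, rather than a single global one, is the design point this sketch is meant to illustrate.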

YU Zu-bo, GAO Qing-shi, MA Jian-jun. Chinese personal name recognition based on multilevel threshold[J]. Computer Engineering and Applications, 2007, 43(33): 1-3.
