Research of Technology on Building China English New Words Corpus

doi:10.3778/j.issn.1002-8331.1906-0128

Computer Engineering and Applications, 2020, Vol. 56, Issue (16): 165-168.

LIU Yongfang, HAO Xiaoyan, LIU Rong

1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030000, China
2. Foreign Language College, Taiyuan University of Technology, Taiyuan 030000, China

Online: 2020-08-15  Published: 2020-08-11

Abstract:

Specialized corpora of new words are too scarce to support systematic study of the growing number of China English new words, and new-word identification is the main technical problem in building such a corpus. Existing new-word recognition algorithms based on Pointwise Mutual Information (PMI) and Branch Entropy (BE) have two weaknesses: the words they extract have low internal cohesion, and a single PMI threshold admits many invalid phrases above the threshold while missing genuine new phrases that fall below it. To address these problems, a recognition algorithm for China English new words based on improved multi-word PMI and BE is proposed, which identifies new words using multi-word PMI together with a double PMI threshold. Experimental results show that, on the same data and in the same experimental environment, the proposed method improves precision, recall, and the F value, and is effective and feasible for corpus construction.

Key words: China English, corpus of China English new words, identification of new words, Pointwise Mutual Information (PMI), double threshold
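
The abstract names the ingredients of the method (multi-word PMI, branch entropy, a double PMI threshold) but not their exact formulas. The Python sketch below shows one plausible way these pieces could fit together; it is not the authors' implementation. It assumes that multi-word PMI is the minimum PMI over all binary splits of a phrase, that probabilities are approximated as an n-gram count divided by the total token count, and that the double threshold works as "accept at or above `high`, reject below `low`, fall back on branch entropy in between". All function names, parameters, and threshold values are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every n-gram (as a tuple) for n = 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def multiword_pmi(phrase, counts, total):
    """Assumed multi-word PMI: the minimum PMI over all binary splits.

    `phrase` is a tuple of at least two tokens; `total` is the token count
    used as a common denominator (a simplification).
    """
    if len(phrase) < 2:
        raise ValueError("phrase must contain at least two tokens")
    if counts[phrase] == 0:
        return float("-inf")  # unseen phrase: no evidence of cohesion
    p_whole = counts[phrase] / total
    scores = []
    for k in range(1, len(phrase)):
        p_left = counts[phrase[:k]] / total
        p_right = counts[phrase[k:]] / total
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return min(scores)


def branch_entropy(phrase, tokens, side):
    """Entropy (in bits) of the tokens adjacent to `phrase` on one side."""
    n = len(phrase)
    neighbours = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            j = i - 1 if side == "left" else i + n
            if 0 <= j < len(tokens):
                neighbours[tokens[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbours.values())


def double_threshold_decision(pmi, entropy, low, high, be_min):
    """Illustrative double-threshold rule (an assumption, not the paper's).

    pmi >= high        -> accept on cohesion alone
    pmi <  low         -> reject
    low <= pmi < high  -> accept only if branch entropy is at least be_min
    """
    if pmi >= high:
        return True
    if pmi < low:
        return False
    return entropy >= be_min


if __name__ == "__main__":
    corpus = "the belt and road plan and the belt and road forum".split()
    counts = ngram_counts(corpus, max_n=3)
    phrase = ("belt", "and", "road")
    pmi = multiword_pmi(phrase, counts, len(corpus))
    be = min(branch_entropy(phrase, corpus, "left"),
             branch_entropy(phrase, corpus, "right"))
    print(phrase, pmi, be,
          double_threshold_decision(pmi, be, low=1.0, high=3.0, be_min=0.5))
```

Taking the minimum over splits means a phrase counts as cohesive only if every way of cutting it in two is cohesive, which is one way to target the low internal cohesion the abstract mentions. The middle band between `low` and `high` is where a single threshold would have to choose between admitting invalid phrases and missing genuine ones.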

LIU Yongfang, HAO Xiaoyan, LIU Rong. Research of Technology on Building China English New Words Corpus[J]. Computer Engineering and Applications, 2020, 56(16): 165-168.
