Convolutional Neural Networks Without Pooling Layer for Chinese Word Segmentation

Computer Engineering and Applications, 2020, Vol. 56, Issue (2): 120-126. DOI: 10.3778/j.issn.1002-8331.1809-0177


Convolutional Neural Networks Without Pooling Layer for Chinese Word Segmentation

TU Wenbo, YUAN Zhenming, YU Kai

1. College of Information Engineering, Hangzhou Normal University, Hangzhou 311121, China
2. Engineering Research Center of Mobile Health Management System, Ministry of Education, Hangzhou 311121, China

Online: 2020-01-15 Published: 2020-01-14

Abstract: Word segmentation is a common and critical task in Chinese information processing. Many Chinese Natural Language Processing (NLP) tasks first segment the text and then complete the downstream task on the resulting words. Recently, a growing number of Chinese word segmentation methods have adopted machine learning and deep learning. However, most models suffer to varying degrees from being overly complex, relying heavily on hand-crafted features, or performing poorly on Out of Vocabulary (OOV) words. This paper proposes PCNN (Pure CNN), a Chinese word segmentation model based on Convolutional Neural Networks (CNN). The model assigns a label to each character by classifying a context window of character vectors. It is structurally simple, does not rely on hand-crafted features, and offers good stability and high accuracy. Given the characteristics of distributed character vectors, the PCNN model needs no pooling operation, so the features extracted by the convolution layers are preserved and the training speed improves considerably. Experiments on public datasets show that the model's accuracy reaches the level of current mainstream neural network models. Comparison experiments also confirm that the network without a pooling layer outperforms the network with a pooling layer.

Key words: Natural Language Processing (NLP), Chinese word segmentation, Convolutional Neural Networks (CNN), character vector
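
The abstract describes three ideas: a context window of character vectors, convolution with no pooling step, and per-character label classification. The following is a minimal pure-Python sketch of those ideas, not the authors' implementation. The padding symbol, the window half-width, the ReLU activation, and the B/M/E/S tag set are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding symbol for sentence boundaries


def context_windows(sentence: str, half_width: int = 2) -> List[List[str]]:
    """Return, for each character, the window of characters centred on it."""
    padded = [PAD] * half_width + list(sentence) + [PAD] * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sentence))]


def conv1d_same(rows: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide one convolution filter over a window of character vectors.

    rows   : one vector per window position (window_len x dim)
    kernel : filter weights (kernel_len x dim), kernel_len assumed odd
    Zero padding keeps one output per window position. No pooling follows,
    so the position-specific features are not collapsed into one value.
    """
    k = len(kernel)
    dim = len(rows[0])
    pad = k // 2
    zero = [0.0] * dim
    padded = [zero] * pad + rows + [zero] * pad
    out = []
    for i in range(len(rows)):
        s = 0.0
        for j in range(k):
            s += sum(w * x for w, x in zip(kernel[j], padded[i + j]))
        out.append(max(0.0, s))  # ReLU activation (an assumption)
    return out


def decode_bmes(sentence: str, tags: List[str]) -> List[str]:
    """Turn per-character B/M/E/S tags (assumed tag set) into words."""
    words, current = [], ""
    for ch, tag in zip(sentence, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(current)
            current = ""
    if current:  # flush an unfinished word at the end of the sentence
        words.append(current)
    return words
```

In this sketch, `context_windows("我爱北京", half_width=1)` yields one three-character window per input character, and `decode_bmes("我爱北京", ["S", "S", "B", "E"])` groups the characters into the words 我, 爱, and 北京. A full model would look up a vector for each character in the window, apply several such filters, and feed the unpooled outputs to a classifier over the tag set.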

TU Wenbo, YUAN Zhenming, YU Kai. Convolutional Neural Networks Without Pooling Layer for Chinese Word Segmentation[J]. Computer Engineering and Applications, 2020, 56(2): 120-126.

