Analysis of inaccurate style in processing Web true news text——about word segmentation and part of speech tagging

Computer Engineering and Applications, 2007, Vol. 43, Issue 15: 166-169.

• Database and Information Processing •

ZHANG Yong-kui^1,2, ZHANG Yan^1,2, AN Zeng-bo^3, LIU Rui^1,2

1. Department of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
2. Key Laboratory of Ministry of Education for Computation Intelligence and Chinese Information Processing, Taiyuan 030006, China
3. Workstation Automation of 91708 PLA, Guangzhou 510320, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-05-21 Published: 2007-05-21
Contact: ZHANG Yong-kui

Web新闻语料分词和标注错误分析

Abstract

Statistical analysis of the processing of texts from a Web emergency-news corpus yields eleven types of word segmentation and part-of-speech tagging errors, and solutions are proposed for some of them. The results not only suggest improvements to word segmentation and part-of-speech tagging methods in the early stages of corpus processing, but also provide a reference for automatic proofreading, another branch of Chinese information processing.

Key words: Chinese information processing, word segmentation, part-of-speech tagging, error types, Web emergency news corpus

ZHANG Yong-kui, ZHANG Yan, AN Zeng-bo, LIU Rui. Analysis of inaccurate style in processing Web true news text——about word segmentation and part of speech tagging[J]. Computer Engineering and Applications, 2007, 43(15): 166-169.
