Computer Engineering and Applications, 2010, Vol. 46, Issue (4): 139-141. DOI: 10.3778/j.issn.1002-8331.2010.04.045
Section: Database, Signal and Information Processing
XU Run-hua, CHEN Xiao-he, LI Bin
徐润华,陈小荷,李 斌
Abstract: Among the many types of Chinese four-character idioms, the Parallel Four-Character Idiom (PFCI) is a distinctive and numerous class. This paper first describes earlier work on extracting four-character idioms from a POS-tagged corpus with a Conditional Random Fields (CRF) model. Building on that work, it analyzes the structural characteristics of PFCIs and proposes an approach for recognizing them in word-segmented corpora. Comparative experiments on different corpora show that the approach achieves relatively high precision and a reasonable degree of adaptability.
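The abstract does not spell out the recognition procedure, so the following is only a hypothetical illustration of why a word-segmented corpus helps, not the authors' CRF-based method. It assumes each sentence is a list of tokens, collects every four-character span whose start and end coincide with token boundaries, and keeps spans that follow a repeated-character parallel pattern: ABAC (e.g. 一心一意) or ABCB (e.g. 将心比心). The function names are illustrative. The heuristic covers only a narrow subset of PFCIs; an idiom such as 山清水秀, which contains no repeated character, would need structural or POS cues instead.

```python
from typing import List, Optional, Tuple


def four_char_spans(tokens: List[str]) -> List[str]:
    """Return every 4-character string built from consecutive tokens,
    starting and ending on token boundaries (hypothetical helper)."""
    spans = []
    for start in range(len(tokens)):
        text = ""
        for end in range(start, len(tokens)):
            text += tokens[end]
            if len(text) == 4:
                spans.append(text)
            if len(text) >= 4:
                # Longer spans cannot shrink back to 4 characters.
                break
    return spans


def repeated_char_pattern(span: str) -> Optional[str]:
    """Classify a 4-character span as ABAC or ABCB; otherwise None.

    Requiring exactly 3 distinct characters rules out reduplicated
    forms such as ABAB (研究研究) and AAAA.
    """
    if len(span) != 4 or len(set(span)) != 3:
        return None
    if span[0] == span[2]:
        return "ABAC"
    if span[1] == span[3]:
        return "ABCB"
    return None


def find_candidates(sentences: List[List[str]]) -> List[Tuple[str, str]]:
    """Scan segmented sentences and return (span, pattern) candidates."""
    results = []
    for tokens in sentences:
        for span in four_char_spans(tokens):
            pattern = repeated_char_pattern(span)
            if pattern is not None:
                results.append((span, pattern))
    return results
```

For example, with the segmented sentences `["他", "一心一意", "地", "工作"]`, `["我们", "要", "将", "心", "比", "心"]` and `["研究", "研究", "这个", "问题"]`, the sketch returns `[("一心一意", "ABAC"), ("将心比心", "ABCB")]`. The reduplicated verb 研究研究 is rejected because it has only two distinct characters, and boundary alignment prevents spans that cut through a token.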
CLC Number: TP391.1
XU Run-hua, CHEN Xiao-he, LI Bin. Recognition of parallel four-character idioms in word-segmented corpora[J]. Computer Engineering and Applications, 2010, 46(4): 139-141.
徐润华,陈小荷,李 斌. 分词语料库中的并列式四字格识别[J]. 计算机工程与应用, 2010, 46(4): 139-141.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2010.04.045
http://cea.ceaj.org/EN/Y2010/V46/I4/139