Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (6): 118-122.

• Database, Data Mining, Machine Learning •

HSK自动作文评分的特征选取研究

黄志娥,谢佳莉,荀恩东   

  1. 北京语言大学 汉语国际教育技术研发中心,北京 100083
  • 出版日期:2014-03-15 发布日期:2015-05-12

Study of feature selection in HSK automated essay scoring

HUANG Zhi’e, XIE Jiali, XUN Endong   

  1. International R&D Center for Chinese Education, Beijing Language and Culture University, Beijing 100083, China
  • Published: 2014-03-15 Online: 2015-05-12

摘要: 作文特征选取是研究汉语作为第二语言的水平测试自动作文评分的关键问题之一,以中国汉语水平考试作文为研究对象,从字、词、语法、成段表达、庄雅度等多个层面上,选取107个作文特征,经相关度计算得到19个与作文分数较为相关的作文特征。基于选取的作文特征,采用多元线性回归方法进行回归实验和稳定性交叉实验。实验表明,作文长度、词汇使用和成段表达方面的作文特征对作文得分具有较好的解释能力,多元线性回归方法应用于中国汉语水平考试自动作文评分具有较好的稳定性。

关键词: 中国汉语水平考试, 自动作文评分, 特征选取, 多元线性回归

Abstract: Feature selection is one of the key problems in automated essay scoring for proficiency tests of Chinese as a second language. Taking essays from the HSK (Chinese Proficiency Test) as the object of study, 107 essay features are extracted at several levels, including Chinese character use, word use, grammar, paragraph-level expression, and formality. Correlation analysis identifies 19 of these features as relatively strongly correlated with essay scores. Based on the selected features, multiple linear regression is used in regression experiments and stability cross experiments. The results show that features describing essay length, vocabulary use, and paragraph-level expression explain essay scores well, and that multiple linear regression is reasonably stable when applied to HSK automated essay scoring.

Key words: HSK, automated essay scoring (AES), feature selection, multiple linear regression
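
The pipeline summarized in the abstract (screening features by their correlation with essay scores, then fitting a multiple linear regression and checking its stability) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, all function names are hypothetical, Pearson correlation is assumed as the correlation measure, and a simple two-way split is assumed as a stand-in for the stability cross experiment.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(rows, scores, k):
    """Return indices of the k features with the largest |r| against the scores."""
    n_feat = len(rows[0])
    r = [abs(pearson([row[j] for row in rows], scores)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: r[j], reverse=True)[:k]


def fit_linear(rows, scores):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, ..., wp]. Assumes X^T X is non-singular.
    """
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, scores))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, row):
    """Predicted score for one essay's feature vector."""
    return coef[0] + sum(w * x for w, x in zip(coef[1:], row))


random.seed(0)
# Synthetic stand-in data (assumption): 200 essays, 6 features,
# where only features 0 and 3 actually drive the score.
rows = [[random.gauss(0, 1) for _ in range(6)] for _ in range(200)]
scores = [60 + 8 * r[0] + 5 * r[3] + random.gauss(0, 1) for r in rows]

# Step 1: keep the features most correlated with the score.
selected = select_features(rows, scores, k=2)
reduced = [[r[j] for j in selected] for r in rows]

# Step 2: fit on one half, evaluate on the other, then swap the halves.
half = len(rows) // 2
splits = ((slice(0, half), slice(half, None)),
          (slice(half, None), slice(0, half)))
for train, test in splits:
    coef = fit_linear(reduced[train], scores[train])
    preds = [predict(coef, r) for r in reduced[test]]
    print("held-out correlation:", round(pearson(preds, scores[test]), 3))
```

If the held-out correlations are close to each other across the two splits, the fitted model is behaving stably on this toy data; the paper's own features, corpus, and evaluation protocol would replace the synthetic pieces here.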