Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (6): 145-148. DOI: 10.3778/j.issn.1002-8331.2009.06.041

• Database, Signal and Information Processing •

Cluster analysis of college English writing in automated essay scoring

GE Shi-li1, CHEN Xiao-xiao2

  1. School of Foreign Languages, South China University of Technology, Guangzhou 510640, China
  2. Department of Foreign Languages, Guangdong University of Finance, Guangzhou 510520, China
  • Received: 2008-08-27 Revised: 2008-09-24 Online: 2009-02-21 Published: 2009-02-21
  • Contact: GE Shi-li

Abstract: Automated essay scoring for college English writing instruction requires a scoring method that is general, that is, not tied to any specific essay prompt. For content evaluation, document clustering groups essays by the similarity of their content, forming a clustering tree that is dense at the core and sparse at the periphery. The few essays on the periphery differ substantially in content from most of the others. These possibly off-topic essays are returned to the teacher for manual judgment, so that essay content can be evaluated fairly accurately with little human effort. Experiments show that, with a reasonable similarity threshold, the method effectively identifies off-topic essays.
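The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the text features, the similarity measure, or the linkage rule, so the bag-of-words term frequencies, cosine similarity, average-linkage merging, the function name `flag_off_topic`, and the default threshold of 0.3 are all assumptions chosen for illustration. Essays are merged bottom-up until no pair of clusters reaches the threshold, and every essay left outside the largest cluster is flagged for the teacher.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed feature representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_link(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def flag_off_topic(essays, threshold=0.3):
    """Return indices of essays left outside the largest cluster.

    The threshold is a hypothetical value; in practice it would be tuned.
    """
    if not essays:
        return []
    vectors = [tf_vector(e) for e in essays]
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the most similar pair of clusters under average linkage.
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], sim)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    core = max(clusters, key=len)
    return sorted(i for i in range(n) if i not in core)


essays = [
    "My favorite hobby is reading books because reading books teaches me new things",
    "Reading books is my favorite hobby and books teach me many new things",
    "I like reading books as a hobby since books teach new things",
    "The stock market fell sharply yesterday after the bank raised interest rates",
]
print(flag_off_topic(essays))  # [3]
```

In this toy example the fourth essay shares no vocabulary with the other three, so it never joins the core cluster and is the only one flagged. A lower threshold flags fewer essays and a higher one flags more, which mirrors the trade-off between teacher workload and detection coverage that the abstract describes.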