Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (15): 169-172. DOI: 10.3778/j.issn.1002-8331.2009.15.049

• Database, Signal and Information Processing •

Multiple imputation method for missing values by gray relation analysis

SU Yi-juan   

  1. College of Computer and Information, Guangxi Teachers Education University, Nanning 530023, China
  • Received: 2008-03-24 Revised: 2008-06-30 Online: 2009-05-21 Published: 2009-05-21
  • Contact: SU Yi-juan

Abstract: Imputing missing values is one of the most challenging tasks in machine learning and data mining. Missing values in a data source can substantially degrade both the performance of a learning algorithm and the quality of what it learns, and existing imputation methods do not yet fully meet users' needs. This paper proposes a missing-value imputation method based on gray system theory. The method combines instance-based nonparametric fitting with gray theory techniques and imputes the missing values repeatedly until the imputed results converge or satisfy the user's requirements. Experiments with UCI data show that the method outperforms existing methods such as KNN imputation and ordinary mean substitution in both imputation quality and efficiency.
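The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an iterative, instance-based imputation driven by gray relational analysis might look, not the author's implementation. It assumes numeric attributes on comparable scales, at least one observed value per column, the standard gray relational coefficient with distinguishing coefficient `rho = 0.5`, and a mean-substitution starting point. The function names and the parameters `k`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np


def gray_relational_grades(reference, candidates, rho=0.5):
    """Gray relational grade of each candidate row with respect to the reference row."""
    diff = np.abs(candidates - reference)              # shape (n_candidates, n_attributes)
    d_min, d_max = diff.min(), diff.max()
    if d_max == 0.0:                                   # every candidate equals the reference
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (diff + rho * d_max)
    return coeff.mean(axis=1)                          # average coefficient over attributes


def iterative_gra_impute(data, k=3, rho=0.5, tol=1e-4, max_iter=20):
    """Refill missing entries (NaN) repeatedly until the filled values stop changing."""
    x = np.asarray(data, dtype=float)
    missing = np.isnan(x)
    # Assumed starting point: replace each NaN with its column mean.
    filled = np.where(missing, np.nanmean(x, axis=0), x)
    for _ in range(max_iter):
        previous = filled.copy()
        for i in np.flatnonzero(missing.any(axis=1)):
            observed = ~missing[i]
            if not observed.any():                     # nothing to compare on; keep the mean
                continue
            others = np.delete(np.arange(len(x)), i)
            grades = gray_relational_grades(
                previous[i, observed], previous[others][:, observed], rho
            )
            nearest = others[np.argsort(grades)[-k:]]  # k rows with the highest grade
            filled[i, missing[i]] = previous[nearest][:, missing[i]].mean(axis=0)
        if np.max(np.abs(filled - previous)) < tol:    # convergence check
            break
    return filled
```

Calling `iterative_gra_impute(rows, k=2)` on a small array containing `np.nan` entries returns a copy with those entries filled and the observed values unchanged. Averaging the `k` most related rows is one simple nonparametric choice; a grade-weighted average would be an equally plausible variant.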