Computer Engineering and Applications, 2008, Vol. 44, Issue (31): 156-158. DOI: 10.3778/j.issn.1002-8331.2008.31.045

• Database, Signal and Information Processing •

A new kernel-based multiple imputation algorithm for data mining

SU Yi-juan (苏毅娟)

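  1. Department of Mathematics and Computer Science, Guangxi Teachers Education University, Nanning 530023, China
  • Received: 2007-12-03  Revised: 2008-02-03  Online: 2008-11-01  Published: 2008-11-01
  • Corresponding author: SU Yi-juan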

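Abstract: Missing data is one of the most common problems in data-mining preprocessing, and usually there is no prior knowledge of the distribution of the data to be mined. In this situation, non-parametric regression offers an effective way to handle missing values. Accordingly, under the Missing at Random (MAR) and Missing Completely at Random (MCAR) mechanisms, a new method for handling missing data is proposed: a kernel-based non-parametric multiple imputation algorithm. Simulation experiments show that the algorithm performs better than the commonly used NORM algorithm in confidence-interval coverage, interval length, and relative efficiency.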

Key words: multiple imputation (MI), missing data, kernel function, non-parametric
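The abstract does not spell out the algorithm itself. Purely as an illustration of the general idea, here is a minimal sketch of one common way to build a kernel-based non-parametric multiple imputation, assuming a single fully observed covariate x, a response y with missing entries (None), a Gaussian-kernel Nadaraya-Watson estimator, a bootstrap-plus-residual-draw step to generate the m imputations, and Rubin's rules to pool the estimate of the mean of y. All function names, the bandwidth h, m = 5, and the normal (rather than Student-t) reference distribution are assumptions for this sketch, not the author's published method; relative efficiency and the NORM comparison are not reproduced.

```python
import math
import random
import statistics

# Illustrative sketch only: helper names, the Gaussian kernel, the
# bootstrap-plus-residual imputation step and the normal-based interval
# are assumptions, not the algorithm published in the paper.


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys near x0 (bandwidth h)."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:  # x0 far from every donor: fall back to the plain mean
        return statistics.fmean(ys)
    return sum(w * yi for w, yi in zip(weights, ys)) / total


def kernel_multiple_imputation(x, y, m=5, h=0.5, rng=None):
    """Return m completed copies of y, where missing entries are None.

    For each copy: resample the observed (x, y) pairs with replacement, fit
    the kernel regression to that resample, then impute m_hat(x_i) plus a
    residual drawn at random from the resample's residuals.
    """
    if rng is None:
        rng = random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    copies = []
    for _ in range(m):
        boot = [rng.choice(observed) for _ in observed]
        bx = [p[0] for p in boot]
        by = [p[1] for p in boot]
        residuals = [yi - nadaraya_watson(xi, bx, by, h) for xi, yi in boot]
        copies.append([
            yi if yi is not None
            else nadaraya_watson(xi, bx, by, h) + rng.choice(residuals)
            for xi, yi in zip(x, y)
        ])
    return copies


def rubin_pool(estimates, variances, level=0.95):
    """Combine m point estimates and their variances with Rubin's rules.

    Uses a normal reference distribution for simplicity; Rubin's rules
    proper use a Student-t with estimated degrees of freedom.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b       # total variance
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(t)
    return q_bar, t, (q_bar - half, q_bar + half)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 200
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y_full = [math.sin(xi) + 0.5 * xi + rng.gauss(0.0, 0.3) for xi in x]
    # MCAR: each response is deleted independently with probability 0.3.
    y = [yi if rng.random() > 0.3 else None for yi in y_full]
    copies = kernel_multiple_imputation(x, y, m=5, h=0.4, rng=rng)
    estimates = [statistics.fmean(c) for c in copies]         # mean of Y
    variances = [statistics.variance(c) / n for c in copies]  # its variance
    q, t, (lo, hi) = rubin_pool(estimates, variances)
    print(f"pooled mean {q:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

The bootstrap step lets the kernel fit itself change from one imputation to the next, so the between-imputation variance B reflects uncertainty in the fitted curve and not only residual noise; adding a drawn residual keeps imputed values from collapsing onto the regression curve. A MAR scenario could be simulated by making the deletion probability depend on x instead of being constant, and the bandwidth h, which controls the usual bias-variance trade-off, would normally be tuned (for example by cross-validation).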
