Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (6): 74-79.

• Big Data and Cloud Computing •

Algorithm study on missing data imputation based on attribute relevancy

MAO Meijing1, E Xu1,2, TAN Yan1, YANG Mingjing1

  1. School of Information Science and Technology, Bohai University, Jinzhou, Liaoning 121000, China
  2. Food Science Research Institution, Bohai University, Jinzhou, Liaoning 121000, China
  • Online: 2016-03-15 Published: 2016-03-17

Abstract: To address the insufficient accuracy of missing data imputation in incomplete information systems, a missing data imputation algorithm based on attribute relevancy is proposed, with an aquaculture early-warning information system as the application background. While preserving the determinacy of the warning information system, the limited tolerance relation and decision rules are studied, and the limited compatibility class of each object with missing values is obtained from a newly defined limited compatibility relation. The relevancy between conditional attributes is then introduced to construct a new extended matrix, which is used to impute the data and make the system complete. Missing data imputation for perch farming is taken as a case study, and the algorithm is validated on data sets. The results show that, compared with other methods, the algorithm clearly improves imputation accuracy and time performance.

Key words: incomplete information system, limited compatibility relation, relevancy, extended matrix, data set
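
The abstract does not give the definition of the redefined limited compatibility relation or the layout of the extended matrix, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the classical limited tolerance relation from rough set theory (two objects are related if they share at least one known attribute and agree on every attribute known in both) and uses a simple majority vote over the resulting class as a stand-in for the relevancy-weighted extended matrix. The names `MISSING`, `limited_tolerant`, `tolerance_class` and `impute`, and the sample attribute values, are all hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing attribute value


def known(obj):
    """Return the indices of attributes whose value is present."""
    return {i for i, v in enumerate(obj) if v is not MISSING}


def limited_tolerant(x, y):
    """Classical limited tolerance relation (assumed, not the paper's variant)."""
    kx, ky = known(x), known(y)
    if not kx and not ky:
        return True  # both objects are missing on every attribute
    shared = kx & ky
    # Must share at least one known attribute and agree on all shared ones.
    return bool(shared) and all(x[i] == y[i] for i in shared)


def tolerance_class(table, idx):
    """Indices of the other objects related to table[idx]."""
    return [j for j, y in enumerate(table)
            if j != idx and limited_tolerant(table[idx], y)]


def impute(table):
    """Fill each missing value with the most frequent known value in the class.

    Majority voting is a placeholder for the extended-matrix step.
    Values that no related object can supply are left missing.
    """
    filled = [list(row) for row in table]
    for i, row in enumerate(table):
        neighbours = tolerance_class(table, i)
        for a, v in enumerate(row):
            if v is MISSING:
                votes = Counter(table[j][a] for j in neighbours
                                if table[j][a] is not MISSING)
                if votes:
                    filled[i][a] = votes.most_common(1)[0][0]
    return filled


# Made-up toy table: two condition attributes and one decision attribute.
table = [
    ("high", "low", None),
    ("high", "low", "safe"),
    ("high", None, "safe"),
    ("low", "low", "risk"),
]
result = impute(table)
```

In this toy table the first and third objects each fall into a class containing the other two "high" objects, so their gaps are filled from those neighbours, while the fourth object disagrees on the first attribute and contributes nothing.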