Computer Engineering and Applications, 2019, Vol. 55, Issue (15): 89-95. DOI: 10.3778/j.issn.1002-8331.1804-0107

• Big Data and Cloud Computing •

Application of Maximum Dependency Set in Inconsistent Data Detection

DAI Chaofan, LI Pei, WANG Wenqian

  1. Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China
  • Publication date: 2019-08-01  Online release date: 2019-07-26

Abstract: Conditional functional dependencies (CFDs) detect inconsistent data only incompletely. To address this, this paper proposes a Dependency Lifting Algorithm (DLA) based on the Maximum Dependency Set (MDS), which detects inconsistent data in a data set by acquiring the Recessive Conditional Functional Dependencies (RCFDs) implied by the given dependencies. A dynamic domain adjustment, which sets forward and backward pointers on changes in numerical values, improves the enumeration process of the original algorithm and makes the algorithm more applicable to continuous attributes. The paper gives the algorithm flow and pseudocode of the dynamic domain adjustment and of the DLA, and analyzes their convergence and time complexity. Finally, comparative experiments that measure the detection accuracy and time cost of the DLA against a CFD-based detection method verify the effectiveness of the algorithm.

Key words: conditional functional dependencies (CFDs), inconsistent data, Maximum Dependency Set (MDS), dynamic domain adjustment
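The abstract evaluates the DLA against CFD-based detection. For readers unfamiliar with that baseline, the following is a minimal sketch of standard CFD violation checking. It assumes each CFD has a single right-hand-side attribute and a pattern tableau whose entries are either constants or the wildcard `_`. It is not the paper's DLA or its dynamic domain adjustment, which the abstract does not specify. The names `cfd_violations` and `matches` and the sample customer records are hypothetical.

```python
from collections import defaultdict

WILDCARD = "_"  # unnamed variable in a pattern tuple (assumed never to occur as a data value)


def matches(value, pattern):
    """A data value matches a pattern entry if the entry is the wildcard or the same constant."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows, lhs, rhs, tableau):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs, tableau).

    tableau is a list of (lhs_pattern_tuple, rhs_pattern) pairs.
    """
    bad = set()
    for lhs_pat, rhs_pat in tableau:
        # Rows whose LHS values match this pattern tuple are the ones the CFD constrains.
        covered = [i for i, r in enumerate(rows)
                   if all(matches(r[a], p) for a, p in zip(lhs, lhs_pat))]
        if rhs_pat != WILDCARD:
            # Constant RHS: a single row violates it by carrying a different value.
            bad.update(i for i in covered if rows[i][rhs] != rhs_pat)
        else:
            # Variable RHS: covered rows that agree on the LHS must agree on the RHS.
            groups = defaultdict(list)
            for i in covered:
                groups[tuple(rows[i][a] for a in lhs)].append(i)
            for idxs in groups.values():
                if len({rows[i][rhs] for i in idxs}) > 1:
                    bad.update(idxs)
    return sorted(bad)


# Hypothetical sample data: country code (CC), area code (AC), city, zip, street.
rows = [
    {"CC": "44", "AC": "131", "city": "EDI", "zip": "EH4 1DT", "street": "Mayfield"},
    {"CC": "44", "AC": "131", "city": "NYC", "zip": "EH4 1DT", "street": "Mayfield"},
    {"CC": "01", "AC": "908", "city": "MH", "zip": "07974", "street": "Tree Ave."},
    {"CC": "01", "AC": "908", "city": "MH", "zip": "07974", "street": "Elm Str."},
]

if __name__ == "__main__":
    # Constant CFD: in the UK (CC 44), area code 131 means the city is EDI.
    print(cfd_violations(rows, ["CC", "AC"], "city", [(("44", "131"), "EDI")]))
    # Variable CFD: within the UK only, zip determines street.
    print(cfd_violations(rows, ["CC", "zip"], "street", [(("44", "_"), "_")]))
```

The first call flags row 1 (the UK record with city NYC). The second flags nothing, because the conflicting streets in rows 2 and 3 lie outside the pattern's condition. An all-wildcard pattern would reduce the CFD to a plain functional dependency and would flag them.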