Computer Engineering and Applications, 2012, Vol. 48, Issue (12): 24-28.

• Doctoral Forum •

Fuzzy clustering of interval data based on Wasserstein distances in data mining

LI Hong, SUN Qiubi

  1. Department of Statistics, Management College, Fuzhou University, Fuzhou 350108, China
  • Online: 2012-04-21 Published: 2012-04-20

Abstract: The interval distances used in current fuzzy clustering models for interval data have known limitations. To address them, this paper introduces the Wasserstein distance, which accounts for how values are distributed within an interval, and proposes adaptive single-index and adaptive double-index fuzzy clustering algorithms, together with their iterative models, based on that distance. Simulation experiments, evaluated with the CR index, confirm the advantages of these models. The approach is of practical value for mining massive, accumulating data sets and for empirical work in which data are unstable or missing.

Key words: fuzzy clustering, interval data, symbolic data analysis, adaptive
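
To make the distance in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes values are uniformly distributed within each interval, in which case the squared L2 Wasserstein distance between two intervals with midpoints c and half-widths r reduces to (c1 - c2)^2 + (r1 - r2)^2 / 3. The function names, the fuzzifier m = 2, and the standard fuzzy c-means membership formula are illustrative assumptions; the adaptive single-index and double-index weights described in the paper are not reproduced here.

```python
def wasserstein2_sq(a, b):
    """Squared L2 Wasserstein distance between two intervals (lo, hi).

    Assumption: values are uniformly distributed inside each interval,
    so the distance depends only on midpoints and half-widths.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    dc = (a_lo + a_hi) / 2.0 - (b_lo + b_hi) / 2.0  # midpoint difference
    dr = (a_hi - a_lo) / 2.0 - (b_hi - b_lo) / 2.0  # half-width difference
    return dc * dc + dr * dr / 3.0


def fuzzy_memberships(x, prototypes, m=2.0):
    """Standard fuzzy c-means memberships of interval x to interval prototypes.

    Hypothetical helper: uses the plain (non-adaptive) squared distance.
    """
    d = [wasserstein2_sq(x, g) for g in prototypes]
    # If x coincides with one or more prototypes, share full membership among them.
    zeros = [k for k, dk in enumerate(d) if dk == 0.0]
    if zeros:
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))]
    inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
    total = sum(inv)
    return [v / total for v in inv]


if __name__ == "__main__":
    # Same width, shifted midpoint: only the midpoint term contributes.
    print(wasserstein2_sq((0.0, 2.0), (1.0, 3.0)))
    # Memberships of one interval to two prototype intervals.
    print(fuzzy_memberships((0.0, 2.0), [(0.5, 2.5), (5.0, 7.0)]))
```

Unlike distances that compare only interval endpoints, this form separates a location term (midpoints) from a spread term (half-widths), which is what lets it reflect the distribution of values inside each interval.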