计算机工程与应用 (Computer Engineering and Applications), 2019, Vol. 55, Issue (6): 140-144. DOI: 10.3778/j.issn.1002-8331.1711-0422

• Pattern Recognition and Artificial Intelligence •

基于ε邻域的三支决策聚类分析

刘  强1,施  虹1,王平心2,3,杨习贝1   

  1. 江苏科技大学 计算机学院,江苏 镇江 212003
  2. 江苏科技大学 理学院,江苏 镇江 212003
  3. 河北师范大学 数学与信息科学学院,石家庄 050024
  • 出版日期:2019-03-15 发布日期:2019-03-14

Three-Way Clustering Analysis Based on ε Neighborhood

LIU Qiang1, SHI Hong1, WANG Pingxin2,3, YANG Xibei1   

  1. School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003, China
  2. School of Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003, China
  3. College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
  • Online: 2019-03-15 Published: 2019-03-14

Abstract: Most traditional clustering methods are two-way decisions: an element either belongs to a cluster or does not, so each cluster is represented by a set with a crisp boundary. When the information is uncertain, however, forcing every element into a cluster tends to carry a high decision risk. Three-way clustering places the elements whose membership is certain in the core region and defers the decision on uncertain elements by placing them in the fringe region, which can effectively reduce that risk. Drawing on the ideas of dilation and erosion from mathematical morphology, this paper proposes a method that converts a two-way clustering into a three-way clustering using the ε-neighborhood of each sample. Starting from the two-way clustering result, the method uses the ε-neighborhoods of the elements in each cluster to shrink the cluster into its core region and to expand it into its fringe region. Experiments on UCI data sets show that the method can lower the Davies-Bouldin index (DBI) of the clustering results and raise their average silhouette coefficient and accuracy.

Key words: three-way clustering, neighborhood, k-means clustering, k-medoid clustering, fuzzy c-means clustering
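
The shrink-and-expand idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the ones used here are assumptions made for illustration only: the core region of a cluster is taken to be the members whose whole ε-neighborhood stays inside the cluster (an erosion), and the fringe region is taken to be every sample whose ε-neighborhood touches the cluster, minus the core (a dilation minus the core). The function names, the Euclidean distance, and the toy data are likewise hypothetical and not taken from the paper.

```python
import math


def eps_neighborhood(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (i included)."""
    return {j for j, q in enumerate(points) if math.dist(points[i], q) <= eps}


def three_way_from_two_way(points, labels, eps):
    """Convert a hard (two-way) clustering into core and fringe regions.

    Assumed definitions (an illustration, not the paper's exact formulas):
      core(C)   = {x in C : N_eps(x) is a subset of C}          # shrink / erosion
      fringe(C) = {x : N_eps(x) intersects C} minus core(C)     # expand / dilation
    Returns a dict mapping each label to a (core, fringe) pair of index sets.
    """
    neighborhoods = [eps_neighborhood(points, i, eps) for i in range(len(points))]
    result = {}
    for c in sorted(set(labels)):
        members = {i for i, lab in enumerate(labels) if lab == c}
        # Keep only members whose neighborhood never leaves the cluster.
        core = {i for i in members if neighborhoods[i] <= members}
        # Any sample whose neighborhood reaches into the cluster is a candidate.
        dilated = {i for i, nb in enumerate(neighborhoods) if nb & members}
        result[c] = (core, dilated - core)
    return result


# Toy data: seven points on a line, split by a two-way clustering into
# {0, 1, 2, 3} and {4, 5, 6}. The labels could come from k-means or any
# other hard clustering method.
points = [(float(x), 0.0) for x in range(7)]
labels = [0, 0, 0, 0, 1, 1, 1]
for label, (core, fringe) in three_way_from_two_way(points, labels, eps=1.5).items():
    print(label, sorted(core), sorted(fringe))
```

In this toy run, points 3 and 4 sit next to the boundary between the two clusters, so under the assumed definitions they leave the core regions and land in the fringe region of both clusters, which is the delayed decision the abstract describes. The choice of ε controls how wide that uncertain band is.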