Cluster-center-distance maximization clustering with knowledge transfer

Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (16): 149-155.

SUN Shouwei, QIAN Pengjiang, CHEN Aiguo, JIANG Yizhang

School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China

Online: 2016-08-15 Published: 2016-08-12

Abstract

Traditional clustering algorithms risk failing outright in two situations: (1) the data are scarce or contaminated by a large amount of noise or outliers; (2) the dataset is scaled proportionally in order to control the differences among the data. To address both problems at once, this paper devises a historical knowledge transfer criterion and a maximum cluster-center-distance criterion and incorporates them into the classical Maximum Entropy Clustering (MEC) algorithm; the resulting method is called center distance maximization clustering with historical knowledge transfer (HKT-CDMC). HKT-CDMC has three main merits. First, when the current data are scarce or contaminated, it draws on historical knowledge through transfer learning and so achieves good clustering effectiveness. Second, once the data are scaled by a certain factor, the cluster centers obtained by traditional clustering algorithms tend to coincide, whereas HKT-CDMC avoids this through the maximum cluster-center-distance criterion. Third, the historical knowledge it uses does not expose the historical source data and cannot be mapped back to the raw data, so HKT-CDMC offers good privacy protection for the source domain. Experiments on both synthetic and real-world datasets verify these merits.

Key words: transfer learning, historical knowledge, maximum cluster-center-distance, privacy protection, fuzzy clustering
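The abstract does not give the HKT-CDMC objective function, so the following is only a minimal sketch of the general idea, not the authors' formulation. It takes the standard MEC updates (softmax memberships, weighted-mean centers) and adds two assumed terms: a penalty `lam` that pulls each center toward a historical center, and a reward `eta` that pushes centers away from the overall data mean as a simple stand-in for center separation. The function name `mec_with_transfer` and the parameters `gamma`, `lam`, and `eta` are hypothetical.

```python
import numpy as np

def mec_with_transfer(X, hist_centers, gamma=1.0, lam=0.5, eta=0.1, n_iter=50):
    """Maximum-entropy-style clustering with two illustrative extra terms.

    Assumed objective (NOT taken from the paper):
        sum_ij u_ij * ||x_j - v_i||^2 + gamma * sum_ij u_ij * ln(u_ij)
        + lam * sum_i ||v_i - h_i||^2   (pull toward historical centers h_i)
        - eta * sum_i ||v_i - m||^2     (push centers away from data mean m)
    subject to sum_i u_ij = 1 for every sample j.
    """
    X = np.asarray(X, dtype=float)             # samples, shape (n, d)
    H = np.asarray(hist_centers, dtype=float)  # historical centers, shape (c, d)
    m = X.mean(axis=0)                         # overall data mean, shape (d,)
    V = H.copy()                               # start from the historical centers

    for _ in range(n_iter):
        # Squared Euclidean distances between centers and samples, shape (c, n).
        d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

        # Standard MEC membership update: softmax over clusters of -d2 / gamma.
        logits = -d2 / gamma
        logits -= logits.max(axis=0, keepdims=True)  # numerical stability
        U = np.exp(logits)
        U /= U.sum(axis=0, keepdims=True)

        # Center update from setting the gradient of the assumed objective
        # with respect to v_i to zero.
        denom = U.sum(axis=1) + lam - eta
        if np.any(denom <= 0):
            raise ValueError("eta is too large relative to lam and cluster mass")
        V = (U @ X + lam * H - eta * m) / denom[:, None]

    return U, V

# Usage on a small synthetic target set; the historical centers are made up.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
U, V = mec_with_transfer(X, hist_centers=[[0.2, -0.1], [4.8, 5.1]])
labels = U.argmax(axis=0)
```

Only the centers `hist_centers` are passed in from the source domain, never the historical samples themselves, which mirrors the privacy argument made in the abstract. In this sketch `eta` must stay small enough that the denominator of the center update remains positive.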

SUN Shouwei, QIAN Pengjiang, CHEN Aiguo, JIANG Yizhang. Cluster-center-distance maximization clustering with knowledge transfer[J]. Computer Engineering and Applications, 2016, 52(16): 149-155.
