Semi-supervised clustering approach with discriminant analysis

doi:10.3778/j.issn.1002-8331.2010.06.040

Computer Engineering and Applications, 2010, Vol. 46, Issue 6: 139-143. DOI: 10.3778/j.issn.1002-8331.2010.06.040

• Database, Signal and Information Processing •


CHEN Xiao-dong¹,², YIN Xue-song², LIN Huan-xiang³

1. College of Computer Science, Zhejiang University, Hangzhou 310027, China
2. College of Information and Engineering, Zhejiang Radio & TV University, Hangzhou 310012, China
3. College of Information & Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou 310012, China

Received: 2008-09-08; Revised: 2008-12-15; Online: 2010-02-21; Published: 2010-02-21
Contact: CHEN Xiao-dong


Abstract

Compared with unsupervised clustering, semi-supervised clustering uses a small amount of supervised data to better uncover and understand the structure of unlabeled data and to conform more closely to the user's preferences. Most existing semi-supervised clustering methods are designed for low-dimensional data. This paper presents a novel Semi-supervised Clustering Approach with Discriminant Analysis (SCADA) for clustering high-dimensional data. First, the data are projected onto a low-dimensional space by principal component analysis, and a constrained spherical K-means algorithm clusters the projected data. Second, linear discriminant analysis reduces the dimensionality of the input-space data, using the clustering results as class labels. Finally, the data are clustered again in the embedded space. Experimental results on several real-world data sets show that SCADA handles high-dimensional data effectively and improves clustering performance.
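The abstract gives only the outline of the method, so the following is a minimal sketch of the three stages (PCA projection, constrained spherical K-means, LDA embedding followed by re-clustering), not the authors' implementation. It assumes NumPy, a greedy COP-KMeans-style treatment of must-link and cannot-link pairs, a small ridge term `reg` that keeps the within-class scatter matrix invertible, and reuse of the same clustering routine in the final stage. All function and parameter names are hypothetical.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def unit_rows(X):
    """Scale each row to unit length (zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

def constrained_spherical_kmeans(X, n_clusters, must_link=(), cannot_link=(),
                                 n_iter=20, seed=0):
    """Spherical K-means (cosine similarity) with greedy pairwise constraints.

    Assumption: constraints are enforced COP-KMeans style, in index order.
    """
    ml, cl = {}, {}
    for a, b in must_link:
        ml.setdefault(a, []).append(b)
        ml.setdefault(b, []).append(a)
    for a, b in cannot_link:
        cl.setdefault(a, []).append(b)
        cl.setdefault(b, []).append(a)
    rng = np.random.default_rng(seed)
    U = unit_rows(X)
    centers = U[rng.choice(len(U), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.full(len(U), -1)
        sims = U @ centers.T  # cosine similarity to every center
        for i in range(len(U)):
            ranked = np.argsort(-sims[i])
            for c in ranked:
                ml_ok = all(labels[j] in (-1, c) for j in ml.get(i, []))
                cl_ok = all(labels[j] != c for j in cl.get(i, []))
                if ml_ok and cl_ok:
                    labels[i] = c
                    break
            else:
                labels[i] = ranked[0]  # no feasible cluster: ignore constraints
        for c in range(n_clusters):
            members = U[labels == c]
            if len(members):  # an empty cluster keeps its previous center
                centers[c] = unit_rows(members.sum(axis=0, keepdims=True))[0]
    return labels

def lda_project(X, labels, k, reg=1e-3):
    """Embed X in k dimensions with LDA, treating cluster labels as classes."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve (Sw + reg*I)^-1 Sb v = lambda v; keep the k leading eigenvectors.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:k]
    return (X - mean) @ evecs[:, order].real

def scada_sketch(X, n_clusters, must_link=(), cannot_link=(), pca_dim=2):
    """PCA -> constrained clustering -> LDA on the input space -> re-cluster."""
    Z = pca_project(X, pca_dim)
    labels = constrained_spherical_kmeans(Z, n_clusters, must_link, cannot_link)
    E = lda_project(X, labels, n_clusters - 1)
    return constrained_spherical_kmeans(E, n_clusters, must_link, cannot_link)
```

Calling `scada_sketch(X, 3, must_link=[(0, 1)], cannot_link=[(0, 20)])` on an `(n, d)` array returns one cluster label per row. The LDA step is applied to the original data `X` rather than to the PCA projection, which follows the abstract's description of reducing the input-space data using the clustering results.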

Key words: semi-supervised clustering, pairwise constraint, principal component analysis, linear discriminant analysis


CLC Number: TP311

CHEN Xiao-dong, YIN Xue-song, LIN Huan-xiang. Semi-supervised clustering approach with discriminant analysis[J]. Computer Engineering and Applications, 2010, 46(6): 139-143.


