Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (13): 19-26. DOI: 10.3778/j.issn.1002-8331.1802-0186

A survey of high dimensional data visual analysis methods based on subspace clustering

TIAN Shuai, CHEN Yi   

  1. Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  • Online: 2018-07-01  Published: 2018-07-17

基于子空间聚类的高维数据可视分析方法综述

田  帅,陈  谊   

  1. 北京工商大学 计算机与信息工程学院 食品安全大数据技术北京市重点实验室,北京 100048

Abstract: With the rapid development of information technology and the arrival of the big data era, data increasingly exhibit complex characteristics such as high dimensionality and nonlinearity. In high-dimensional data, feature regions that reflect distribution patterns are often hard to find in the full-dimensional space, and most traditional clustering algorithms scale well only to low-dimensional data. As a result, the clusters that traditional algorithms produce on high-dimensional data may not meet current needs. Subspace clustering algorithms instead search for clusters that exist in subspaces of the high-dimensional data: they divide the original feature space into different feature subsets, which reduces the influence of irrelevant features while preserving the main features of the original data. Subspace clustering can uncover information that is not readily apparent in high-dimensional data, and visualization techniques can then display the internal structure of the data attributes and dimensions; together they provide an effective means of visual analysis for high-dimensional data. This paper summarizes recent research progress on visual analysis methods for high-dimensional data based on subspace clustering, covering three kinds of methods: those based on feature selection, on subspace exploration, and on subspace clustering. It then analyzes the associated interactive analysis methods and their applications, and discusses future development trends in visual analysis methods for high-dimensional data.

Key words: high dimensional data, visual analysis, subspace exploration, subspace clustering
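
The abstract's central claim, that clusters hidden in the full-dimensional space can become visible in a subset of features, can be illustrated with a minimal sketch. This is not a method taken from the surveyed paper. It assumes synthetic data in which two clusters exist only in dimensions 0 and 1 while the remaining dimensions are uniform noise, and it ranks every two-dimensional axis-aligned subspace by the entropy of a grid histogram (lower entropy meaning more concentrated, more cluster-like data). The function names `make_data`, `grid_entropy`, and `rank_subspaces` are hypothetical and chosen only for this illustration.

```python
import itertools
import numpy as np


def make_data(n_per_cluster=200, n_dims=6, seed=0):
    """Synthetic data (an assumption): clusters live only in dims 0 and 1."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_cluster
    # Every dimension starts as uniform noise on [0, 1].
    data = rng.uniform(0.0, 1.0, size=(n, n_dims))
    centers = np.array([[0.25, 0.25], [0.75, 0.75]])
    for k, center in enumerate(centers):
        rows = slice(k * n_per_cluster, (k + 1) * n_per_cluster)
        # Overwrite dims 0 and 1 with a tight Gaussian blob per cluster.
        blob = rng.normal(center, 0.04, size=(n_per_cluster, 2))
        data[rows, :2] = np.clip(blob, 0.0, 1.0)
    return data


def grid_entropy(points, bins=8):
    """Shannon entropy of a regular grid histogram over the unit cube."""
    ranges = [(0.0, 1.0)] * points.shape[1]
    hist, _ = np.histogramdd(points, bins=bins, range=ranges)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]  # empty cells contribute nothing to the entropy
    return float(-(p * np.log(p)).sum())


def rank_subspaces(data, size=2, bins=8):
    """Return (entropy, dims) pairs sorted from most to least clustered."""
    scores = []
    for dims in itertools.combinations(range(data.shape[1]), size):
        scores.append((grid_entropy(data[:, list(dims)], bins), dims))
    return sorted(scores)


if __name__ == "__main__":
    ranking = rank_subspaces(make_data())
    for entropy, dims in ranking[:3]:
        print(dims, round(entropy, 3))
```

Ranking candidate subspaces this way is one simple form of subspace exploration: the lowest-entropy subspaces are the ones an analyst would plot first, for example as scatterplots, before applying a clustering algorithm within them.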
