Dimension Reduction and Visualization of Mixed-Type Data Based on E-t-SNE

Computer Engineering and Applications, 2020, Vol. 56, Issue (6): 66-72. DOI: 10.3778/j.issn.1002-8331.1903-0330

WEI Shichao, LI Xin, ZHANG Yichi, ZHOU Xiaofeng, LI Shuai

1. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Key Laboratory of Network Control System, Chinese Academy of Sciences, Shenyang 110016, China
4. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China

Online: 2020-03-15 Published: 2020-03-13

Abstract:

The traditional t-distributed stochastic neighbor embedding (t-SNE) algorithm can only handle data of a single attribute type and does not handle mixed-type data well. To address this, an extended t-SNE dimensionality reduction and visualization algorithm, E-t-SNE, is proposed for mixed-type data. The method introduces information entropy to construct the distance matrix of the categorical attributes, then combines this categorical distance with the Euclidean distance of the numerical attributes to build the distance matrix of the mixed-type data. The combined matrix is fed into the t-SNE algorithm, which reduces the dimensionality of the data and displays it in two-dimensional space. To verify the effectiveness of the algorithm, the k-nearest neighbor (kNN) algorithm is used to evaluate the reduced data. Experiments on UCI datasets show that the method not only visualizes mixed-type data well, but also effectively separates data of different classes into clusters after dimension reduction and improves the classification accuracy of subsequent classifiers.

Key words: t-SNE algorithm, mixed type data, dimension reduction, visualization
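
The distance-matrix construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: the categorical distance is taken to be an entropy-weighted attribute mismatch, the numerical distances are rescaled to [0, 1], and the two parts are blended with a hypothetical mixing weight `alpha`. All function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def attribute_entropy(column):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def categorical_distance_matrix(cat_rows):
    """Entropy-weighted mismatch distance between categorical rows.

    ASSUMPTION: each attribute is weighted by its share of the total
    entropy; the paper's actual weighting may differ.
    """
    n = len(cat_rows)
    columns = list(zip(*cat_rows))
    entropies = [attribute_entropy(col) for col in columns]
    total = sum(entropies)
    # Fall back to uniform weights when every attribute is constant.
    if total > 0:
        weights = [h / total for h in entropies]
    else:
        weights = [1.0 / len(columns)] * len(columns)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(w for w, a, b in zip(weights, cat_rows[i], cat_rows[j]) if a != b)
            dist[i][j] = dist[j][i] = float(d)
    return dist


def euclidean_distance_matrix(num_rows):
    """Pairwise Euclidean distance between numerical rows."""
    n = len(num_rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(num_rows[i], num_rows[j])
    return dist


def mixed_distance_matrix(num_rows, cat_rows, alpha=0.5):
    """Blend numerical and categorical distances into one matrix.

    ASSUMPTION: numerical distances are divided by their maximum so both
    parts lie in [0, 1], then mixed with the hypothetical weight `alpha`.
    """
    d_num = euclidean_distance_matrix(num_rows)
    d_cat = categorical_distance_matrix(cat_rows)
    n = len(num_rows)
    max_num = max(max(row) for row in d_num) or 1.0  # avoid division by zero
    return [
        [alpha * d_num[i][j] / max_num + (1 - alpha) * d_cat[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy data: two numerical attributes and two categorical attributes per sample.
num = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
cat = [("a", "x"), ("b", "x"), ("a", "x")]
D = mixed_distance_matrix(num, cat)
```

A matrix built this way could then be handed to any t-SNE implementation that accepts precomputed distances; in scikit-learn, for example, `TSNE(metric="precomputed", init="random")` takes a square distance matrix in `fit_transform`. The resulting two-dimensional embedding could be scored with a kNN classifier, as the abstract describes.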

WEI Shichao, LI Xin, ZHANG Yichi, ZHOU Xiaofeng, LI Shuai. Dimension Reduction and Visualization of Mixed-Type Data Based on E-t-SNE[J]. Computer Engineering and Applications, 2020, 56(6): 66-72.
