Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (1): 99-109. DOI: 10.3778/j.issn.1002-8331.2002-0194

• Big Data and Cloud Computing •

ASExplorer: Multi-dimensional Correlation Visual Analysis System Based on Joint Entropy

ZHANG Di, YANG Pei, DENG Xinbo, ZHAO Qianchuan

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  2. Liuying Team, Beijing Shuziguanxing Science and Technology Co., Ltd., Beijing 100080, China
  3. Department of Automation, Tsinghua University, Beijing 100084, China
  • Publication date: 2021-01-01  Release date: 2020-12-31

Abstract:

Correlation analysis across data dimensions has long been a research focus in data analysis. Traditional visualization methods let analysts judge directly, from a graphical depiction, what kind of correlation exists among a few data dimensions, but they struggle with the curse of dimensionality. Some data mining methods are feasible, but they make it hard to render the analysis process concrete, and in some application scenarios they still rely on visualization to guide parameter selection. This paper presents ASExplorer, a visual analytics system for exploring correlations among the dimensions of high-dimensional data. The system first applies a joint-entropy-based algorithm for evaluating dimension importance, which helps users choose an analysis path and filter the data. It then provides an interactive exploration method centered on the sampling scale, which lets users explore the relationships among multiple data dimensions simultaneously as the sampling scale changes. The system is suited to the early analysis of datasets for which prior knowledge is lacking. A case study and a user study verify its effectiveness.

Key words: high-dimensional data, joint entropy, data visualization, visual analysis
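
The abstract does not spell out the dimension importance evaluation algorithm, so the following is only a minimal sketch of the underlying quantity: the Shannon joint entropy of discretized dimensions. The function names `joint_entropy`, `mutual_information`, and `rank_dimensions` are hypothetical, and the ranking heuristic (summing pairwise mutual information derived from joint entropy) is an assumption made for illustration, not the method used in ASExplorer.

```python
import math
from collections import Counter


def joint_entropy(*columns):
    """Shannon joint entropy H(X1, ..., Xk) in bits.

    Each column is a sequence of discrete (already binned) values;
    all columns are assumed to have the same length.
    """
    rows = list(zip(*columns))
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), computed from joint entropies."""
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)


def rank_dimensions(data):
    """Hypothetical importance ranking (an assumption, not the paper's algorithm).

    `data` maps a dimension name to its column of discrete values.
    A dimension scores higher the more information it shares with the others.
    """
    scores = {
        name: sum(
            mutual_information(col, other)
            for other_name, other in data.items()
            if other_name != name
        )
        for name, col in data.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    data = {
        "a": [0, 0, 1, 1],
        "copy_of_a": [0, 0, 1, 1],
        "b": [0, 1, 0, 1],
    }
    print(joint_entropy(data["a"], data["b"]))
    print(rank_dimensions(data))
```

In this toy dataset, `b` is independent of the other two dimensions and is ranked last. Continuous dimensions would need to be binned first, and the choice of bin width plays a role loosely analogous to the sampling scale mentioned in the abstract.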