Computer Engineering and Applications, 2019, Vol. 55, Issue (8): 175-181. DOI: 10.3778/j.issn.1002-8331.1801-0191

• Graphics and Image Processing •

Parallel Coordinate Improvement Method for Category Data Analysis

CHEN Hongqian1,2, CHENG Zhongjuan1,2, YANG Qianyu1,2, LI Hui3

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  2. Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  3. College of Management, Beijing Union University, Beijing 100101, China
  • Online: 2019-04-15 Published: 2019-04-15

Abstract: To address the overlap that occurs when categorical data are mapped onto traditional parallel coordinates, an improved parallel coordinates method based on category statistics and cumulative offset mapping is proposed. The method first counts the frequency of each category in the multidimensional data, uses histograms to show the distribution of the detection results and the number of records, and combines the histograms with the parallel coordinates to form the improved parallel coordinates. It then proposes a cumulative offset algorithm for categorical data: records that would otherwise map to a single point are distributed evenly over a region of the coordinate axis, and the extent of that region is determined by the number of records. Finally, a visual analysis system is designed and implemented. Through the improved parallel coordinates, the system supports data set filtering, conditional cross analysis, analysis of data between categories, and analysis of data between dimensions. It compares every pair of dimensions through a linked view and a chord diagram, and it shows the frequency distribution of each dimension through a word cloud. Experimental results on case data sets show that, within parallel coordinates, the method supports comparison between categories in each dimension, sorting by record count in each dimension, and analysis of filtered data sets, and that it shows the associations between dimensions of categorical data.

Key words: parallel coordinates, data coverage, data filtering, visualization
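
The cumulative offset mapping summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that records mapped to one point are spread evenly over a region whose extent depends on the record count. The function name `cumulative_offset_positions`, the proportional band rule, the descending-frequency ordering of categories, and the mid-slot placement are all assumptions made for illustration.

```python
from collections import Counter


def cumulative_offset_positions(values, axis_length=1.0):
    """Place the categorical values of one dimension on an axis without overlap.

    Hypothetical helper: returns the per-category record counts (the data
    behind a histogram) and one axis position per input record.
    """
    counts = Counter(values)
    total = len(values)

    # Assumed ordering: most frequent category first; ties keep first-seen order.
    ordered = sorted(counts, key=lambda cat: -counts[cat])

    # Assumed band rule: each category gets a band proportional to its count.
    band_start = {}
    cursor = 0.0
    for cat in ordered:
        band_start[cat] = cursor
        cursor += axis_length * counts[cat] / total

    # Spread the records of each category evenly inside that category's band.
    seen = Counter()
    positions = []
    for v in values:
        band_len = axis_length * counts[v] / total
        k = seen[v]  # how many records of this category were already placed
        positions.append(band_start[v] + (k + 0.5) * band_len / counts[v])
        seen[v] += 1
    return counts, positions
```

Calling `cumulative_offset_positions(["A", "B", "A", "A", "C", "B"], axis_length=6.0)` gives every record its own position, with the three "A" records sharing the band from 0 to 3, the two "B" records the band from 3 to 5, and the single "C" record the band from 5 to 6. A real system might cap or rescale band lengths; the abstract does not say.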