Computer Engineering and Applications, 2021, Vol. 57, Issue (21): 116-122. DOI: 10.3778/j.issn.1002-8331.2010-0269

• Big Data and Cloud Computing •

交通行业事故文本数据的可视化挖掘分析方法

程宇航,张健钦,李江川,张安   

  1. 北京建筑大学 测绘与城市空间信息学院,北京 100044
  • Publication date: 2021-11-01  Release date: 2021-11-04

Visual Mining and Analysis Method of Text Data in Traffic Accident

CHENG Yuhang, ZHANG Jianqin, LI Jiangchuan, ZHANG An   

  1. School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Online: 2021-11-01  Published: 2021-11-04

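Abstract (translated from the Chinese):

To reduce safety production risks in the transportation industry and to analyze in depth the spatiotemporal characteristics and potential causes that exist implicitly, in text form, in accident data, the text data are first segmented using a user-dictionary mode. Word2vec is then combined with the Sigmoid activation function to build a word vector model of traffic safety accidents, and the keywords of transportation-industry safety accidents are classified and extracted, yielding two classes of keywords that carry feature attributes and causal attributes respectively. Gephi and Neo4j are used to visually analyze the feature keywords and to summarize causal themes, so that the spatiotemporal characteristics and key causal factors of accidents can be mined in depth. Taking Beijing as an example, the study finds that traffic safety accidents are concentrated mainly in the third quarter, and that the six urban districts far exceed the outer-ring districts in total number of accidents, whereas the casualty ratio is higher in the outer-ring districts. The summary of causal keywords shows that human, equipment, and environmental factors are the main causes of traffic safety accidents. Based on these results, reasonable suggestions are put forward to provide information support and scientific guidance for the departments responsible for managing safety production in Beijing's transportation industry.

Keywords: text data, traffic safety accidents, word vectors, keyword classification and extraction, visual analysis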

Abstract:

In order to analyze in depth the spatiotemporal characteristics and causative factors hidden in text data on safety production in the transportation industry, feature words and causative words from relevant papers are selected as a corpus, and Word2vec is used to construct a word vector model of traffic accident terms. The keywords of Beijing's transportation-industry safety production events are then classified with a Sigmoid function, yielding two kinds of keywords: spatiotemporal characteristics and causal factors. Gephi and Neo4j are used to visually analyze the feature keywords, and the causal-factor keywords are analyzed by summarizing causative themes. The results show that traffic accidents occurred mainly in the third quarter, and that the total number of accidents in the six central urban districts is much higher than in the other districts, while the proportion of casualties is higher in the other districts. Human, equipment, and environmental factors are the main causes of traffic accidents. Based on this analysis, the paper puts forward suggestions to provide information support and scientific guidance for the departments that manage safety production in Beijing's transportation industry.

Key words: text data, safety accidents in transportation, word vector, keyword classification, visual analysis
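
The abstract does not spell out how the Sigmoid function is applied to the word vectors, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes word vectors have already been trained (for example with Word2vec) and that a few seed words per class are known. The toy 3-dimensional vectors, the seed words, the `scale` parameter, and the function names are all hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify_keyword(vec, feature_centroid, cause_centroid, scale=5.0):
    """Return (label, p_feature) for one word vector.

    The margin is positive when the word lies closer to the feature seeds
    than to the cause seeds; the sigmoid turns that margin into a score.
    `scale` is a hypothetical sharpness parameter, not taken from the paper.
    """
    margin = cosine(vec, feature_centroid) - cosine(vec, cause_centroid)
    p_feature = sigmoid(scale * margin)
    label = "feature" if p_feature >= 0.5 else "cause"
    return label, p_feature


# Toy vectors standing in for trained Word2vec embeddings (made-up values).
toy_vectors = {
    "third_quarter": [0.9, 0.1, 0.0],
    "urban_district": [0.8, 0.2, 0.1],
    "fatigue_driving": [0.1, 0.9, 0.2],
    "brake_failure": [0.0, 0.8, 0.3],
    "morning_peak": [0.85, 0.15, 0.05],
    "slippery_road": [0.1, 0.85, 0.25],
}

# Hypothetical seed words for the two keyword classes.
feature_centroid = centroid(
    [toy_vectors["third_quarter"], toy_vectors["urban_district"]]
)
cause_centroid = centroid(
    [toy_vectors["fatigue_driving"], toy_vectors["brake_failure"]]
)

for word in ("morning_peak", "slippery_road"):
    label, score = classify_keyword(
        toy_vectors[word], feature_centroid, cause_centroid
    )
    print(f"{word}: {label} (p_feature={score:.3f})")
```

In this sketch, keywords labeled `feature` would feed the spatiotemporal visualization step and those labeled `cause` would feed the causal-theme summary; with real data, the vectors would come from a model trained on the segmented accident corpus.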