Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (17): 314-324. DOI: 10.3778/j.issn.1002-8331.2012-0173

• Engineering and Applications •

Network User Analysis Based on Improved Density Peak Clustering Algorithm

LYU Yi, LIU Mandan   

  1. Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200030, China
  • Online: 2022-09-01 Published: 2022-09-01

Trajectory Behavior Analysis Based on an Improved Density Peak Clustering Algorithm

吕奕,刘漫丹   

  1. 华东理工大学 化工过程先进控制和优化技术教育部重点实验室,上海 200030

Abstract: To mine campus wireless network trajectory behavior data more deeply, a density-based clustering method is used to cluster the trajectory behavior of users on campus. Because density-based clustering algorithms usually use distance as the similarity measure, the user similarity matrix is first transformed into a distance matrix through a conversion function so that it can serve as input to this type of algorithm. In addition, an outlier detection algorithm is introduced and combined with the clustering algorithm, which reduces the number of input parameters and increases the compactness of the resulting clusters. The improved clustering algorithm can effectively detect anomalies in the trajectory data: by processing students' online records, it helps colleges and universities find students whose browsing behavior is inconsistent with that of most of their classmates, narrowing the target range so that targeted handling can follow. Through qualitative analysis and experimental comparison, two variants of fast-search density peak clustering based on outlier detection and shared nearest neighbors are determined to be suitable for processing the similarity matrix of campus wireless network behavior trajectories, and their internal clustering indicators, such as the Dunn index, are better than those of the original algorithms.

Key words: cluster analysis, density peak, outlier detection, campus wireless network
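
The abstract describes converting a user similarity matrix into a distance matrix before density-based clustering. The sketch below illustrates that step and the two quantities used by the baseline density peak clustering algorithm (local density and distance to the nearest denser point). It is a minimal illustration under stated assumptions, not the authors' improved method: the conversion d = 1 - s, the Gaussian kernel, the cutoff `dc`, the function names, and the toy matrix are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def similarity_to_distance(sim):
    """Assumed conversion d = 1 - s for similarities in [0, 1].

    The abstract only says "a conversion function"; this choice is
    an assumption made for illustration.
    """
    sim = np.asarray(sim, dtype=float)
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)  # a user is at distance 0 from itself
    return dist


def density_peak_quantities(dist, dc):
    """Local density rho and separation delta of baseline density peak clustering.

    rho_i   : Gaussian-kernel density with cutoff distance dc
    delta_i : distance to the nearest point of higher density
              (largest distance in the row for the densest point)
    """
    n = dist.shape[0]
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0  # subtract the self-term
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta


# Toy similarity matrix: users 0-2 behave alike, user 3 is unlike the rest.
sim = np.array([
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.20],
    [0.80, 0.85, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
])
dist = similarity_to_distance(sim)
rho, delta = density_peak_quantities(dist, dc=0.3)
```

In this toy example, user 3 has a very low density and a large delta, the pattern that density peak methods commonly treat as an outlier candidate; how the paper's outlier detection step actually makes that decision is not stated in the abstract.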

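Abstract (Chinese version, translated): To mine the information in campus wireless network trajectory behavior data in depth, a density-based clustering method is used to cluster the features of users' trajectory behavior on campus. Because density-based clustering algorithms usually use distance as the similarity measure, the user similarity matrix is first converted into a distance matrix through a conversion function so that it connects effectively to such algorithms. An outlier detection algorithm is introduced and combined with the clustering algorithm, reducing the number of input parameters and increasing the degree of cluster aggregation. The improved clustering algorithm can effectively detect anomalies in the data trajectories and, by processing students' online records, helps universities find people whose browsing information is inconsistent with that of most of their classmates, narrowing the target range for targeted handling. Qualitative analysis and experimental comparison confirm that two kinds of shared-nearest-neighbor fast-search density peak clustering based on outlier detection are suitable for processing the similarity matrix of campus wireless network behavior trajectories, and that internal clustering indicators such as the Dunn index, as well as overall performance, are better than those of comparable algorithms.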

Key words (Chinese version, translated): cluster analysis, density peak, outlier detection, campus wireless network
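
The abstract evaluates clusterings with internal indicators such as the Dunn index. As a rough illustration, the sketch below computes one common form of the Dunn index (smallest between-cluster distance divided by largest within-cluster diameter) directly from a precomputed distance matrix. The function name, the toy distance matrix, and the labels are hypothetical, and the paper may use a different variant of the index.

```python
import numpy as np


def dunn_index(dist, labels):
    """Dunn index from a precomputed distance matrix.

    Uses the minimum pairwise distance between clusters as separation
    and the maximum pairwise distance within a cluster as diameter.
    Higher values indicate compact, well-separated clusters.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    max_diameter = max(dist[np.ix_(c, c)].max() for c in clusters)
    if max_diameter == 0:
        raise ValueError("all clusters have zero diameter")

    min_separation = min(
        dist[np.ix_(a, b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


# Toy distance matrix: users 0-2 are close together, user 3 is far away.
toy_dist = np.array([
    [0.00, 0.10, 0.20, 0.90],
    [0.10, 0.00, 0.15, 0.80],
    [0.20, 0.15, 0.00, 0.85],
    [0.90, 0.80, 0.85, 0.00],
])
good = dunn_index(toy_dist, [0, 0, 0, 1])  # user 3 isolated
poor = dunn_index(toy_dist, [0, 0, 1, 1])  # user 2 grouped with user 3
```

Separating the dissimilar user gives a much higher Dunn index than grouping it with a similar one, which is the sense in which a higher value indicates a better clustering.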