Research on Algorithms for Mining Web Users' Access Patterns

Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (5): 61-64.


武健

Department of Computer Engineering, Beijing Information Technology College, Beijing 100018

Publication date: 2016-03-01 Release date: 2016-03-17

Methods for data mining of Internet users accessing and browsing pattern

WU Jian

Department of Computer Engineering, Beijing Information Technology College, Beijing 100018, China

Online: 2016-03-01 Published: 2016-03-17

Abstract

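Abstract: University websites are receiving growing attention from prospective students and their parents. To analyze and understand users' access patterns and how their access hotspots change over time, a hybrid clustering algorithm is designed that combines a hidden Markov model with a hierarchical clustering strategy. The hidden Markov model maps the time-series data into a likelihood space, where the likelihood is characterized by the symmetric KL (Kullback-Leibler) distance. A symmetric KL transition matrix is constructed, and hierarchical clustering is then used to cluster the user access patterns. Applying the method to web log data recording visits by prospective students and their parents to the college's official website yields three user access patterns, which demonstrates the feasibility and effectiveness of the method.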

Keywords: log data, data mining, hidden Markov model, clustering
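
The abstract does not state the distance formula. As a hedged illustration only, one form commonly used when comparing hidden Markov models estimates the KL divergence from sequence likelihoods and then symmetrizes it. The symbols below are assumptions introduced for this sketch: O_i is the i-th access sequence of length T_i, and λ_i is the model fitted to it. Some authors use the sum rather than the average in the second line.

```latex
\begin{aligned}
D_{\mathrm{KL}}(\lambda_i \,\|\, \lambda_j)
  &\approx \frac{1}{T_i}\Bigl[\log P(O_i \mid \lambda_i) - \log P(O_i \mid \lambda_j)\Bigr], \\
D_{\mathrm{SKL}}(\lambda_i, \lambda_j)
  &= \frac{1}{2}\Bigl[D_{\mathrm{KL}}(\lambda_i \,\|\, \lambda_j) + D_{\mathrm{KL}}(\lambda_j \,\|\, \lambda_i)\Bigr].
\end{aligned}
```

Under this form, D_SKL(λ_i, λ_j) = D_SKL(λ_j, λ_i) and D_SKL(λ_i, λ_i) = 0, so the pairwise values can be used directly as a distance matrix for clustering.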

Abstract: As prospective students and their parents visit the official websites of colleges and universities more and more frequently, understanding users' access purposes and browsing behavior is valuable for improving these websites. This paper combines a hidden Markov model with hierarchical clustering to mine dynamic web log data. The original data are transformed into a probabilistic space by means of an extended hidden Markov model and the symmetric Kullback-Leibler (SKL) distance. Hierarchical clustering is then applied to the SKL confusion matrix to cluster the time-series data. The method is verified on two months of dynamic log data that record users' access and browsing behavior during the period when prospective students and their parents are choosing a university. The results show two patterns of user behavior, which indicates that the method is feasible and effective.

Key words: log data, data mining, hidden Markov model, clustering
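
The pipeline summarized in the abstract (likelihoods under per-sequence models, an SKL distance matrix, then hierarchical clustering) might be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes one hidden Markov model has already been fitted per sequence and that the length-normalized log-likelihoods are available as a matrix; the values in `LOGLIK` are invented toy numbers; the function names are hypothetical; and average linkage is an assumed choice, since the abstract does not name a linkage rule.

```python
from itertools import combinations

# Toy input (invented numbers): LOGLIK[i][j] is the length-normalized
# log-likelihood of sequence j under the model fitted to sequence i.
LOGLIK = [
    [-1.0, -1.1, -3.0, -3.2],
    [-1.2, -1.0, -3.1, -2.9],
    [-3.0, -3.3, -0.9, -1.0],
    [-3.1, -2.8, -1.1, -0.9],
]


def skl_distance_matrix(loglik):
    """Build a symmetric KL-style distance matrix from log-likelihoods."""
    n = len(loglik)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Directed estimate: how much worse model j explains sequence i
        # than model i does, and the same with the roles swapped.
        d_ij = loglik[i][i] - loglik[j][i]
        d_ji = loglik[j][j] - loglik[i][j]
        dist[i][j] = dist[j][i] = 0.5 * (d_ij + d_ji)
    return dist


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage (assumed choice)."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the members of two clusters.
            total = sum(dist[p][q] for p in clusters[a] for q in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    distances = skl_distance_matrix(LOGLIK)
    print(average_linkage(distances, n_clusters=2))  # [[0, 1], [2, 3]]
```

In practice the number of clusters would be chosen by inspecting the merge distances (for example, from a dendrogram) rather than fixed in advance as it is here.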

WU Jian. Research on algorithms for mining web users' access patterns[J]. Computer Engineering and Applications, 2016, 52(5): 61-64.

WU Jian. Methods for data mining of Internet users accessing and browsing pattern[J]. Computer Engineering and Applications, 2016, 52(5): 61-64.
