Computer Engineering and Applications, 2011, Vol. 47, Issue (29): 85-89.

• Network, Communication, and Security •

Algorithm of discovering preferred browsing paths based on cloud-computing

CHENG Miao   

  1. College of Management, University of Science and Technology of China, Hefei 230026, China
  • Online: 2011-10-11 Published: 2011-10-11

基于云计算的用户浏览偏爱路径挖掘算法

程 苗   

  1. 中国科学技术大学 管理学院,合肥 230026

Abstract: Mining users’ preferred browsing paths from Web logs is an important research topic. Current mining algorithms focus on objective browsing frequency and neglect whether users are actually interested in a frequently visited path. Based on an analysis of the problems in existing algorithms for mining user browsing patterns, the Web site’s topology structure is used to revise the frequency-based measure of users’ preferred browsing paths, and the concept of useful preference is introduced. This removes the distorting influence of page placement and links on the mining results. Meanwhile, to address the insufficient computational capacity of current single-node mining systems, a data processing method based on cloud computing is presented that exploits distributed processing and virtualization, and users’ preferred browsing paths are mined on top of it. Experiments show that, when mining large volumes of Web logs, the algorithm improves on ordinary frequency-based mining in both accuracy and efficiency.

Key words: preferred browsing paths, cloud computing, Web usage mining, Web log

摘要: 从Web日志中挖掘用户浏览偏爱路径是一个重要的研究课题。目前的挖掘算法注重客观访问频度,忽略了用户对这一频繁访问路径是否感兴趣。在分析目前用户偏爱路径挖掘算法存在的问题的基础上,结合网站拓扑结构图修正基于频度的用户偏爱路径的衡量标准,提出了有用偏爱度的概念,从而剔除由于页面放置和链接等因素对挖掘的影响;针对目前基于单一节点的挖掘系统的计算能力不足的问题,利用云计算的分布式处理和虚拟化技术的优势,给出了一种基于云计算的数据处理方法,在此基础上挖掘用户浏览偏爱路径。实验表明,该算法针对大数据量的日志进行挖掘,准确率和效率比普通基于频度进行用户浏览偏爱路径挖掘的算法有所提高。

关键词: 浏览偏爱路径, 云计算, Web使用挖掘, Web日志