Weblog mining based on MapReduce

Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (22): 95-98.

LI Bin, LIU Lili

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China

Online: 2012-08-01  Published: 2012-08-06

Abstract

Abstract: Web data mining systems built on a single CPU node hit a computational bottleneck when mining massive Web data sources. To address this, the paper draws on the distributed processing and virtualization of cloud computing and on the inherent parallelism of the ant colony algorithm to design a Web log mining algorithm based on the Map/Reduce framework. To verify the algorithm's efficiency, a Hadoop platform is set up and the algorithm is used to mine users' preferred access paths from Web logs. Experimental results show that using the distributed computing capacity of a cluster to process large volumes of Web log files can greatly improve the efficiency of Web data mining.

Key words: cloud computing, Map/Reduce, Hadoop platform, Web log mining, ant colony algorithm
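
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general Map/Reduce pattern that such a log miner could build on. It counts page-to-page transitions in a toy log, which is one plausible raw input for finding preferred access paths. The log format, the sample records, and every function name here are assumptions made for illustration; the ant colony step and Hadoop itself are not modeled.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical, simplified log records: (user_id, timestamp, page).
LOG = [
    ("u1", 1, "/home"),
    ("u1", 2, "/news"),
    ("u1", 3, "/sports"),
    ("u2", 1, "/home"),
    ("u2", 2, "/news"),
    ("u3", 1, "/home"),
    ("u3", 2, "/shop"),
]


def sessions(records):
    """Group records by user and order each user's pages by timestamp."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for user, group in groupby(ordered, key=lambda r: r[0]):
        yield user, [page for _, _, page in group]


def map_phase(pages):
    """Map: emit ((from_page, to_page), 1) for each consecutive page pair."""
    for src, dst in zip(pages, pages[1:]):
        yield (src, dst), 1


def shuffle(pairs):
    """Shuffle: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: total number of traversals of one page-to-page transition."""
    return key, sum(values)


def transition_counts(records):
    """Run map, shuffle, and reduce in-process over the given records."""
    emitted = [kv for _, pages in sessions(records) for kv in map_phase(pages)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


counts = transition_counts(LOG)
for (src, dst), n in sorted(counts.items()):
    print(f"{src} -> {dst}: {n}")
```

In a real cluster the map and reduce functions would run on separate nodes over log splits; here they run in one process purely to show the data flow.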

LI Bin, LIU Lili. Weblog mining based on MapReduce[J]. Computer Engineering and Applications, 2012, 48(22): 95-98.
