Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (22): 95-98.
• Research and Development, Design, and Testing •
李 彬,刘莉莉
LI Bin, LIU Lili
Abstract: Web data mining systems that run on a single CPU node hit a computational bottleneck when mining massive Web data sources. This paper combines the distributed processing and virtualization strengths of cloud computing with the inherent parallelism of the ant colony algorithm to design a Web log mining algorithm based on the Map/Reduce framework. To verify the algorithm's efficiency, a Hadoop platform is built and the algorithm is used to mine users' preferred access paths from Web logs. Experimental results show that using the distributed computing capacity of the cluster to process large numbers of Web log files significantly improves the efficiency of Web data mining.
Key words: cloud computing, Map/Reduce, Hadoop platform, Web log mining, ant colony algorithm
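The Map/Reduce decomposition that the abstract refers to can be illustrated with a small single-process sketch. It is not the authors' algorithm: the ant colony component is replaced here by plain frequency counting of page-to-page transitions, and the log layout ("user_id referrer page"), the function names, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def map_transitions(log_line):
    """Map step: emit ((referrer, page), 1) for one log record.

    Assumed (hypothetical) record layout: "user_id referrer page".
    """
    fields = log_line.split()
    if len(fields) != 3:
        return  # skip malformed records
    _user, referrer, page = fields
    yield (referrer, page), 1


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one transition."""
    return key, sum(values)


def run_job(lines):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_transitions(line):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())


# Made-up sample records; the last one is malformed and is skipped.
SAMPLE_LOG = [
    "u1 /home /news",
    "u2 /home /news",
    "u1 /news /sports",
    "u3 /home /shop",
    "malformed line with too many fields",
]

counts = run_job(SAMPLE_LOG)
# The most frequent transition stands in for a "preferred" step on a path.
top_transition = max(counts, key=counts.get)
```

On a real cluster the map and reduce functions would run on separate nodes and the grouping step would be performed by the framework's shuffle; the in-memory dictionary above only mimics that behavior.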
LI Bin, LIU Lili. Weblog mining based on MapReduce (基于MapReduce的Web日志挖掘)[J]. Computer Engineering and Applications, 2012, 48(22): 95-98.
Link to this article: http://cea.ceaj.org/CN/
http://cea.ceaj.org/CN/Y2012/V48/I22/95