Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (20): 119-123.

• Database, Data Mining, Machine Learning •

Application of Map-Reduce for data collection in media resource management system

PENG Siwei, XU Weijing

  1. Network Database Team, College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Online: 2014-10-15 Published: 2014-10-28

Abstract: In traditional media resource management systems, data collection is usually built on a C/S or B/S architecture, which places heavy demands on the server. The conventional way to improve server performance is to buy higher-performance servers; cloud computing, which has emerged in recent years, offers a good alternative. This paper uses the Apache Hadoop Map-Reduce framework to implement data collection. Experiments compare the time the data collection task takes in three modes: the traditional single-threaded (standalone) mode, which is the traditional implementation, the Hadoop pseudo-distributed mode, and the Hadoop fully-distributed mode. The results are then analyzed. The study shows that the Map-Reduce cloud approach does shorten execution time and thereby improves server-side performance.

Key words: media resource management system, Map-Reduce, data collection, Hadoop standalone mode, Hadoop pseudo-distributed mode, Hadoop fully-distributed mode
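
The abstract does not show the paper's implementation, so the following is only a minimal sketch of how a data collection task can be phrased in Map-Reduce terms. It is plain Python rather than Hadoop code, and every name in it (the record fields, the source names, `map_record`, `shuffle`, `reduce_records`, `collect`) is a hypothetical chosen for illustration, not taken from the paper. The map step emits metadata keyed by media item, and the reduce step merges everything collected for one item.

```python
from collections import defaultdict

# Hypothetical input: metadata records arriving from several sources.
# Field names and source names are illustrative assumptions only.
RECORDS = [
    {"source": "studio_a", "media_id": "m1", "title": "News 01"},
    {"source": "archive_b", "media_id": "m1", "duration_s": 120},
    {"source": "studio_a", "media_id": "m2", "title": "Sports 07"},
]


def map_record(record):
    """Map phase: emit one (media_id, fields) pair per input record."""
    fields = {k: v for k, v in record.items() if k not in ("media_id", "source")}
    yield record["media_id"], fields


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_records(key, values):
    """Reduce phase: merge all field dicts collected for one media_id."""
    merged = {}
    for fields in values:
        merged.update(fields)
    return key, merged


def collect(records):
    """Run map, shuffle, and reduce sequentially in a single process."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_records(k, vs) for k, vs in shuffle(pairs).items())
```

Calling `collect(RECORDS)` runs all three phases in one thread, which loosely corresponds to the single-threaded baseline the abstract mentions. In a distributed setting the framework would run the map and reduce functions on different nodes, while the per-record and per-key logic could stay the same.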