Study of multi-keywords sorting method based on Hadoop

Abstract

Sorting big data by multiple keywords on a single machine takes a long time. To improve sorting efficiency, two multi-keyword sorting methods are proposed based on the MapReduce model of Hadoop. In method one, the Reduce function applies a chained radix sort algorithm to sort big data by multiple keywords in parallel, using the computing power of multiple nodes to improve efficiency. In method two, a composite key and a comparator are defined so that the multiple keywords of records are compared byte by byte, which saves the time otherwise spent deserializing byte streams into objects. The performance of both methods is tested experimentally. The results show that both methods achieve high sorting efficiency and good scalability.

Key words: Hadoop, MapReduce model, multi-keywords sort, radix sort
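As an illustration of method one, the following is a minimal sketch of a chained (bucket-queue) radix sort over records with several integer keywords. It is not the author's implementation: the function name `chain_radix_sort`, the `key_ranges` parameter, and the assumption that each keyword is a small non-negative integer are all hypothetical. In the setting the abstract describes, a routine like this would run inside the Reduce function on each partition; the sketch shows only the sort itself.

```python
from collections import deque


def chain_radix_sort(records, key_ranges):
    """Sort records by several integer keywords with an LSD radix sort.

    records    -- list of tuples; records[i][j] is keyword j of record i.
                  Fields beyond len(key_ranges) are treated as payload.
    key_ranges -- key_ranges[j] is the number of distinct values of keyword j,
                  so keyword j must lie in range(key_ranges[j]) (assumption).
    Keyword 0 is the most significant keyword.
    """
    chain = deque(records)  # the "chain" of records being reordered
    # Process keywords from least significant to most significant.
    for j in reversed(range(len(key_ranges))):
        buckets = [deque() for _ in range(key_ranges[j])]
        # Distribute: append each record to the queue for its keyword value.
        while chain:
            rec = chain.popleft()
            buckets[rec[j]].append(rec)
        # Collect: concatenate the queues in order of keyword value.
        for bucket in buckets:
            chain.extend(bucket)
    return list(chain)


records = [(2, 1, 3), (1, 3, 0), (2, 0, 1), (1, 3, 2), (0, 2, 2)]
result = chain_radix_sort(records, key_ranges=(3, 4, 4))
```

Because each distribute-and-collect pass is stable, the order established by less significant keywords survives the passes over more significant ones, which is what makes the multi-keyword result correct.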

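The idea behind method two, comparing composite keys byte by byte without deserializing them, can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the key layout `KEY_FORMAT` (an unsigned 32-bit field, an unsigned 16-bit field, and an 8-byte fixed-width ASCII field) and the names `encode_record`, `decode_key`, and `compare_raw` are hypothetical. In Hadoop this role is typically played by a raw comparator over serialized keys; the abstract does not say which API was used.

```python
import struct
from functools import cmp_to_key

# Hypothetical composite-key layout: big-endian uint32, uint16, 8-byte string.
# Big-endian unsigned integers and NUL-padded ASCII strings sort the same way
# as raw bytes as they do as values, so no decoding is needed to compare.
KEY_FORMAT = ">IH8s"
KEY_SIZE = struct.calcsize(KEY_FORMAT)


def encode_record(dept, grade, name, payload=b""):
    """Serialize the composite key followed by an opaque payload."""
    return struct.pack(KEY_FORMAT, dept, grade, name.encode("ascii")) + payload


def decode_key(raw):
    """Deserialize the key; used here only to check the result."""
    dept, grade, name = struct.unpack(KEY_FORMAT, raw[:KEY_SIZE])
    return dept, grade, name.rstrip(b"\0").decode("ascii")


def compare_raw(a, b):
    """Compare two serialized records on their key bytes only."""
    ka, kb = a[:KEY_SIZE], b[:KEY_SIZE]
    return (ka > kb) - (ka < kb)


rows = [(2, 10, "bob"), (1, 300, "al"), (1, 300, "ab"), (1, 7, "zed"), (2, 10, "bo")]
raw_records = [encode_record(d, g, n, b"payload") for d, g, n in rows]
sorted_raw = sorted(raw_records, key=cmp_to_key(compare_raw))
sorted_keys = [decode_key(r) for r in sorted_raw]
```

The sketch relies on unsigned, fixed-width fields; signed integers or variable-length strings would need an order-preserving encoding before a plain byte comparison gives the right answer.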

ZHOU Guojun. Study of multi-keywords sorting method based on Hadoop[J]. Computer Engineering and Applications, 2016, 52(17): 79-83.

