Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (2): 226-230.

Parallel implementation for Last based on Hadoop Streaming

DONG Benzhi, LI Wenhao, JING Weipeng   

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
  • Online: 2014-01-15 Published: 2014-01-26

基于Hadoop Streaming的Last比对软件并行化的研究与实现

董本志,李文浩,景维鹏   

  1. 东北林业大学 信息与计算机工程学院,哈尔滨 150040

Abstract: With the arrival of next-generation sequencing technology, the stand-alone version of the Last alignment software can no longer meet the demands of massive data processing. Hadoop Streaming addresses this problem by allowing Last to be deployed rapidly on a distributed cluster. A custom NFS-based data set segmentation method and a Partitioner-based task distribution scheme provide balanced, efficient data segmentation and keep the parallel granularity controllable. Experimental results show that, while producing the same results as stand-alone operation, this method effectively reduces running time and achieves a relatively high speedup.

Key words: Hadoop Streaming, software parallelization, Last alignment software
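
The abstract names two mechanisms: splitting the data set into balanced pieces on a shared file system, and routing each piece to a task by key. The following Python sketch illustrates one way such a split could look. It is not the authors' implementation: the function names, the greedy balancing strategy, and the `/nfs/...` paths are all assumptions made for illustration, and the FASTA input format is assumed as well. Note that this greedy split does not preserve the original record order.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_balanced(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy split (an assumed strategy): longest records first,
    each one assigned to the chunk with the fewest bases so far."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (total bases, chunk index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(heap)
        chunks[i].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), i))
    return chunks


def task_lines(chunk_paths: List[str]) -> List[str]:
    """One 'key<TAB>path' line per chunk. The integer key is what a
    partitioner could route on, so n_chunks sets the parallel granularity."""
    return [f"{i}\t{path}" for i, path in enumerate(chunk_paths)]
```

In a setup like this, each chunk would be written to a path visible to every node, and the lines from `task_lines` would serve as the job input, so that each task receives a key and the path of the chunk it should align.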

摘要: 随着下一代测序技术的到来,单机版Last比对软件已经不能满足海量数据的处理需求。使用Hadoop Streaming技术将Last比对软件快速部署到云计算环境中,解决当前单机版Last比对软件处理大数据能力差的问题。通过自定义的基于NFS 文件系统的数据集切分方法和基于Partitioner的任务分配方式能够实现均衡高效的数据切分,并保证并行化粒度可控。实验结果表明,在保证与单机运行结果一致的情况下,这种方法能有效缩减软件运行时间,具有较高的加速比。

关键词: hadoop streaming, 软件并行化, last比对软件