Computer Engineering and Applications, 2016, Vol. 52, Issue (16): 186-191.


Dense optical flow parallel computing method based on mesh many-core architecture

YU Jin¹, ZHOU Haojie², CHAI Zhilei¹

  1. School of Internet of Things, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125, China
  Online: 2016-08-15  Published: 2016-08-12

A Parallel Computing Method for Dense Optical Flow Based on Many-Core Architecture

喻津 (YU Jin)¹, 周浩杰 (ZHOU Haojie)², 柴志雷 (CHAI Zhilei)¹

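  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125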

Abstract: Optical flow computation is widely used in video- and image-based applications such as motion detection, motion estimation, and video analysis. However, high-quality optical flow algorithms are computationally intensive, and their slow speed limits their use in real-world systems. This paper proposes an efficient, scalable parallel computing method for a high-quality optical flow algorithm based on a combined brightness-gradient model. The method is verified on Tilera, a representative mesh many-core architecture. For a 640×480 image, the parallel method runs in 0.80 seconds on a 36-core Tilera processor, 2.56 times faster than a 3.40 GHz Intel Core i3-3240 CPU, while consuming less than one-sixth of the CPU's power. In embedded settings, it runs 33 times faster than an ARM9 processor while consuming half the power. Experiments show that the parallel algorithm scales well: it can be mapped onto different numbers of cores to meet a system's combined performance and power requirements.
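The abstract does not state the energy functional being minimized. A widely used model that combines brightness constancy and gradient constancy is the one of Brox, Bruhn, Papenberg and Weickert (ECCV 2004). It is shown here only as an assumed, representative form of a combined brightness-gradient model, not as the exact model used in this paper:

```latex
E(u,v) = \int_{\Omega} \Psi\!\left( \lvert I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x}) \rvert^{2}
       + \gamma \,\lvert \nabla I(\mathbf{x}+\mathbf{w}) - \nabla I(\mathbf{x}) \rvert^{2} \right) d\mathbf{x}
       + \alpha \int_{\Omega} \Psi\!\left( \lvert \nabla_{3} u \rvert^{2} + \lvert \nabla_{3} v \rvert^{2} \right) d\mathbf{x},
\qquad \Psi(s^{2}) = \sqrt{s^{2} + \varepsilon^{2}}
```

Here x = (x, y, t)ᵀ, w = (u, v, 1)ᵀ is the displacement, ∇ is the spatial gradient, ∇₃ is the spatio-temporal gradient, γ weights gradient constancy against brightness constancy, α weights the smoothness term, and ε is a small constant that keeps the robust penalty Ψ differentiable. The data term is evaluated pixel by pixel and the smoothness term couples only neighbouring pixels, which is one reason models of this kind lend themselves to partitioning the image domain Ω among cores.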

Key words: dense optical flow, scalability, parallel algorithm, mesh, many-core architecture, Tilera

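Abstract: Optical flow is a fundamental algorithm in computer vision, with wide applications in motion detection, motion estimation, video analysis, and other fields. The main drawback of high-quality optical flow methods, however, is their high computational complexity and slow speed, which limits their use in practical systems. For a high-quality optical flow method that combines a brightness model and a gradient model, this paper designs an efficient, scalable parallel computing method. The method was verified on Tilera, a representative mesh-network many-core architecture: for a 640×480 image, the proposed parallel method runs in 0.80 seconds on a 36-core Tilera processor, 2.56 times faster than a 3.40 GHz i3-3240 CPU, while consuming less than one-sixth of its power. In embedded settings, it is 33 times faster than an ARM9 processor while consuming only half the power. Experiments show that the parallel algorithm scales well, so processors with different core counts can be chosen to meet a system's combined performance and power requirements.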
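The abstract does not say how work is divided among cores. One common way to let a dense, per-pixel algorithm scale with core count is to split the image into horizontal row bands, one per core, where each core also reads a small halo of neighbouring rows for stencil operations such as gradient computation. The Python sketch below assumes this decomposition purely for illustration; the function name `row_bands` and the one-row halo are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch of row-band domain decomposition for a dense
per-pixel image algorithm: each core owns a contiguous band of rows and
reads `halo` extra rows on each side for local stencils (e.g. gradients)."""

from typing import List, Tuple


def row_bands(height: int, n_cores: int, halo: int = 1) -> List[Tuple[int, int, int, int]]:
    """Return (own_start, own_end, read_start, read_end) for each core.

    own_*  : half-open range of rows the core is responsible for writing.
    read_* : half-open range of rows it must read, i.e. its own rows plus
             `halo` rows on each side, clipped to the image bounds.
    Rows are spread as evenly as possible: the first `height % n_cores`
    cores receive one extra row.
    """
    if height <= 0 or n_cores <= 0:
        raise ValueError("height and n_cores must be positive")
    if n_cores > height:
        raise ValueError("more cores than rows")
    base, extra = divmod(height, n_cores)
    bands = []
    start = 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        end = start + size
        read_start = max(0, start - halo)      # clip halo at the top edge
        read_end = min(height, end + halo)     # clip halo at the bottom edge
        bands.append((start, end, read_start, read_end))
        start = end
    return bands


if __name__ == "__main__":
    # A 640x480 frame has 480 rows; 480 = 36 * 13 + 12, so on 36 cores
    # 12 cores own 14 rows and the other 24 own 13 rows.
    bands = row_bands(480, 36)
    sizes = [end - start for start, end, _, _ in bands]
    print(max(sizes), min(sizes), sum(sizes))  # prints: 14 13 480
```

Changing `n_cores` changes only the band sizes, which mirrors the abstract's claim that the method can be mapped onto processors with different core counts; on a real mesh processor, the halo rows would correspond to data exchanged between neighbouring cores at each solver iteration.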

Key words: dense optical flow, scalability, parallel algorithm, mesh, multi-core architecture, Tilera