Computer Engineering and Applications, 2024, Vol. 60, Issue (9): 283-291. DOI: 10.3778/j.issn.1002-8331.2212-0373

• Graphics and Image Processing •

Binocular Depth Measurement Method for Desktop Interaction Scene

YE Bin, ZHU Xingshuai, YAO Kang, DING Shangshang, FU Weiwei

  1. Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou, Jiangsu 215000, China
  2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu 215000, China
  • Published: 2024-05-01  Online: 2024-04-29

Abstract: Vision-based virtual reality interaction methods do not yet offer a dedicated solution for desktop writing scenarios. Accurately recognizing the fine movements involved in writing interaction requires a new, high-precision technique for joint three-dimensional (3D) recognition of the hand and pen, and depth accuracy is a key factor in the accuracy of that 3D recognition. To this end, this paper proposes a high-precision binocular depth measurement method. Tailored to writing interaction, the method takes high-resolution, close-range image pairs as input, and its algorithm cross-fuses important global and local information to improve speed and accuracy while reducing computational cost. A region detection module extracts the key regions of the hand and pen tip from the image pair and feeds them to the network at scales assigned according to their importance; a regional feature pyramid structure combines multi-scale semantic information; and a disparity cascade module narrows the matching range to improve the real-time performance of the network. Experiments show that the proposed method achieves high accuracy and good real-time performance in the hand and pen-tip interaction regions. It can effectively help improve the accuracy of joint hand-pen 3D recognition and thereby provide a better virtual writing interaction experience, which gives it broad application prospects.

Key words: binocular vision, deep learning, stereo matching, depth measurement, desktop interaction
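The abstract does not spell out the geometry, but any rectified binocular setup converts a disparity d (in pixels) into depth with Z = f·B/d, where f is the focal length in pixels and B is the baseline. Differentiating gives |ΔZ| ≈ Z²/(f·B)·|Δd|, which is one way to see why close-range, high-resolution image pairs suit fine writing interaction: a smaller Z and a larger f (in pixels) both shrink the depth error caused by a one-pixel matching error. The minimal sketch below assumes an ideal rectified pair; the function names and the rig values (f = 1000 px, B = 60 mm) are hypothetical and are not taken from the paper.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_mm: float) -> float:
    """Depth Z (same unit as the baseline) for an ideal rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to give a finite depth")
    return focal_px * baseline_mm / disparity_px


def depth_error_per_pixel(depth_mm: float, focal_px: float, baseline_mm: float) -> float:
    """First-order depth change caused by a 1-pixel disparity error: |dZ/dd| = Z**2 / (f * B)."""
    return depth_mm ** 2 / (focal_px * baseline_mm)


if __name__ == "__main__":
    # Hypothetical rig (illustrative values only): focal length 1000 px, baseline 60 mm.
    focal, baseline = 1000.0, 60.0
    for d in (120.0, 60.0):
        z = disparity_to_depth(d, focal, baseline)
        err = depth_error_per_pixel(z, focal, baseline)
        print(f"d = {d:.0f} px -> Z = {z:.0f} mm, about {err:.1f} mm per px of disparity error")
```

With these assumed values the script reports about 4.2 mm of depth error per pixel of disparity at 500 mm, rising to about 16.7 mm at 1000 mm; doubling f in pixels (for example by doubling image resolution) halves both figures.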
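The disparity cascade module is described only as narrowing the matching range. A common way to realize that idea is a coarse-to-fine search: estimate disparity over the full range on a downsampled pair, then search only a small window around the upscaled estimate at full resolution. The sketch below uses classical SAD block matching as a stand-in for the paper's learned network; all names and parameters (`radius`, `half`, the 2x factor) are illustrative assumptions.

```python
import numpy as np


def sad_cost(left: np.ndarray, right: np.ndarray, x: int, y: int, d: int, half: int) -> float:
    """Sum of absolute differences between the window at (y, x) in the left image
    and the window at (y, x - d) in the right image of a rectified pair."""
    lw = left[y - half:y + half + 1, x - half:x + half + 1]
    rw = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
    return float(np.abs(lw - rw).sum())


def match_pixel(left, right, x, y, d_min, d_max, half=2):
    """Winner-take-all disparity for pixel (y, x), testing only d_min..d_max."""
    best_d, best_cost = d_min, np.inf
    for d in range(d_min, d_max + 1):
        if x - d - half < 0:  # candidate window would fall off the right image
            break
        cost = sad_cost(left, right, x, y, d, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def cascade_disparity(left, right, x, y, max_disp, radius=2, half=2):
    """Two-stage coarse-to-fine disparity search for one pixel."""
    # Stage 1: full range on 2x-decimated images (plain slicing keeps the sketch short).
    d_coarse = match_pixel(left[::2, ::2], right[::2, ::2], x // 2, y // 2, 0, max_disp // 2, half)
    # Stage 2: full resolution, only 2 * radius + 1 candidates around the upscaled estimate.
    lo = max(0, 2 * d_coarse - radius)
    hi = min(max_disp, 2 * d_coarse + radius)
    return match_pixel(left, right, x, y, lo, hi, half)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.random((64, 64))
    right = np.roll(left, -8, axis=1)  # synthetic pair with a true disparity of 8 px
    print(match_pixel(left, right, 40, 30, 0, 16), cascade_disparity(left, right, 40, 30, 16))
```

For a dense map, the coarse stage runs on a quarter of the pixels and the fine stage tests 2·radius + 1 candidates per pixel instead of max_disp + 1. The synthetic example uses an even disparity so that plain decimation matches exactly; real images need smoothing before downsampling, and a wrong coarse estimate cannot be corrected outside the fine window.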
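The region detection module and importance-based scaling can be pictured as a preprocessing scheme: the whole frame is processed at reduced resolution for global context, the detected hand and pen-tip regions are cropped at full resolution, and the two disparity estimates are fused back into one map. This is a hypothetical sketch under those assumptions, not the paper's network; `Box`, `crop_stereo_roi`, `downsample` and `fuse_disparity` are illustrative names, and the detector that would supply the boxes is not shown.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Box:
    """Hypothetical ROI (e.g. hand or pen tip) in full-resolution pixels, half-open ranges."""
    x0: int
    y0: int
    x1: int
    y1: int


def crop_stereo_roi(left, right, box, max_disp):
    """Crop identical windows from both rectified images. The window is widened by
    max_disp columns to the left so every candidate match of the ROI stays inside the
    right crop; using the same offset for both images leaves disparities unchanged."""
    xs = max(0, box.x0 - max_disp)
    return left[box.y0:box.y1, xs:box.x1], right[box.y0:box.y1, xs:box.x1], xs


def downsample(img, factor):
    """Average-pool by an integer factor; trailing rows/columns that do not fit are dropped."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def fuse_disparity(global_disp_low, factor, roi_disp, box, xs, full_shape):
    """Upsample the coarse global map (nearest neighbour; disparity values scale with
    resolution, so they are multiplied by factor), then overwrite the ROI with its
    full-resolution estimate."""
    up = np.kron(global_disp_low, np.ones((factor, factor))) * factor
    fused = np.zeros(full_shape)
    fused[:up.shape[0], :up.shape[1]] = up
    fused[box.y0:box.y1, box.x0:box.x1] = roi_disp[:, box.x0 - xs:]
    return fused


if __name__ == "__main__":
    left = right = np.zeros((480, 640))
    box = Box(x0=300, y0=200, x1=364, y1=264)  # hypothetical pen-tip detection
    roi_l, roi_r, xs = crop_stereo_roi(left, right, box, max_disp=64)
    print(roi_l.shape, downsample(left, 4).shape)
```

Two details matter when cropping a rectified pair this way: both images must be cut with the same column offset so that disparities are preserved, and the crop must extend max_disp columns to the left so the true match is not cut off. Likewise, disparities estimated on the downsampled global image must be multiplied by the downsampling factor before fusion.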