计算机工程与应用 (Computer Engineering and Applications) ›› 2022, Vol. 58 ›› Issue (18): 1-15. DOI: 10.3778/j.issn.1002-8331.2202-0295
Survey on Visual Affordance Research (视觉可供性研究综述)
LI Yunlong, QING Linbo, HAN Longmei, WANG Yuchen
Online: 2022-09-15
Published: 2022-09-15
Abstract: Affordance refers to the set of interaction possibilities offered by objects in an environment, describing the process that links environmental properties to an individual. Visual affordance research uses visual data such as images and videos to explore the possible interactions between a visual agent and its environment or objects, and draws on related fields such as scene recognition, action recognition, and object detection. Visual affordance can be widely applied in robotics, scene understanding, and other areas. Based on existing research, this survey classifies visual affordance into three categories—functional affordance, behavioral affordance, and social affordance—and reviews the detection methods for each category in detail, organized into traditional machine learning methods and deep learning methods. Typical visual affordance datasets are summarized and analyzed, and application directions as well as possible future research directions of visual affordance are discussed.
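To make the detection task described in the abstract concrete, the sketch below (not taken from the paper or from any specific surveyed method) shows how deep-learning-based affordance detection is commonly framed: an encoder-decoder network maps an RGB image to per-pixel affordance logits. The class name `AffordanceSegNet`, the channel sizes, and the number of affordance labels are illustrative assumptions only.

```python
# Minimal sketch of pixel-wise affordance segmentation (illustrative only).
# Real surveyed methods use much deeper backbones and richer supervision;
# only the input-output structure is representative.
import torch
import torch.nn as nn


class AffordanceSegNet(nn.Module):
    """Toy encoder-decoder mapping an RGB image to per-pixel affordance logits."""

    def __init__(self, num_affordances: int = 7):  # e.g. grasp, cut, contain, ...
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_affordances, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns logits of shape (batch, num_affordances, H, W).
        return self.decoder(self.encoder(x))


if __name__ == "__main__":
    model = AffordanceSegNet(num_affordances=7)
    image = torch.randn(1, 3, 224, 224)        # one RGB image
    logits = model(image)
    affordance_map = logits.argmax(dim=1)      # per-pixel affordance label
    print(affordance_map.shape)                # torch.Size([1, 224, 224])
```

The same input-output structure (image in, per-pixel or per-region affordance labels out) underlies most of the deep-learning detectors discussed in the survey; they differ mainly in backbone depth, multi-task heads, and the supervision signal.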
LI Yunlong, QING Linbo, HAN Longmei, WANG Yuchen. Survey on Visual Affordance Research[J]. Computer Engineering and Applications, 2022, 58(18): 1-15.