A Visual Affordance Reasoning Network Based on Graph Attention

Author：

Indexed by：

EI Scopus

Abstract：

Visual affordance studies, from an image or video, what kinds of interaction are possible and whether an interaction is reasonable in the current environment. When inferring the affordances of objects, the semantics and relations of the objects in the environment should be considered, and a graph is usually used to model the environmental context of each object. Considering that the weight of an edge in the graph describes the amount of information contributed between objects during affordance reasoning, this paper proposes VAR-Net (Visual Affordance Reasoning Network), which models the edge weights as graph attention coefficients and learns them from the objects' semantic and visual features, which imply their affordances. VAR-Net achieves higher accuracy on the COCO-Tasks and ADE-Affordance datasets. Experiments also explain the meaning of the edge weights in VAR-Net: for a given affordance, the more strongly an object supports it, the larger the weights of the edges linking from that object to other objects, and vice versa, which makes the objects' features distinguishable for inferring affordances. © 2022 IEEE.
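
The abstract does not give VAR-Net's equations, so the following is only an illustrative sketch of how edge weights can be computed as attention coefficients in the style of a generic graph attention layer. It is not the authors' implementation. The function names, the toy object features, and the attention vector are all hypothetical, and the features are assumed to have already been projected into a common space.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU, the nonlinearity commonly used in graph attention scoring."""
    return x if x > 0 else slope * x


def attention_coefficients(features, edges, attn_vec):
    """Compute softmax-normalised edge weights (hypothetical helper).

    features: dict mapping node -> list of floats (assumed already projected)
    edges:    dict mapping target node -> list of source nodes
    attn_vec: list of floats with length 2 * feature dimension
    Returns a dict mapping (target, source) -> coefficient; for each target
    with at least one source, the coefficients sum to 1.
    """
    alphas = {}
    for target, sources in edges.items():
        if not sources:
            continue  # isolated node: nothing to normalise
        scores = []
        for source in sources:
            concat = features[target] + features[source]
            raw = sum(a * h for a, h in zip(attn_vec, concat))
            scores.append(leaky_relu(raw))
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for source, value in zip(sources, exps):
            alphas[(target, source)] = value / total
    return alphas


def aggregate(features, edges, alphas):
    """Update each node as the attention-weighted sum of its sources."""
    updated = {}
    for target, sources in edges.items():
        dim = len(features[target])
        updated[target] = [
            sum(alphas[(target, s)] * features[s][k] for s in sources)
            for k in range(dim)
        ]
    return updated


if __name__ == "__main__":
    # Toy 2-d features for three objects in a scene (made-up values).
    toy_features = {"cup": [1.0, 0.0], "table": [0.0, 1.0], "knife": [1.0, 1.0]}
    toy_edges = {"cup": ["table", "knife"], "table": ["cup"], "knife": []}
    toy_attn = [0.5, -0.5, 1.0, 0.25]  # would be learned in a real model
    weights = attention_coefficients(toy_features, toy_edges, toy_attn)
    print(weights)
    print(aggregate(toy_features, toy_edges, weights))
```

In this sketch the coefficient for the pair (target, source) plays the role of the edge weight: a larger value means the source object contributes more information to the target's updated feature.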

Keyword：

Computer vision; Value engineering; Deep learning; Semantics

Author Community：

[ 1 ] [Xin, Jianjia] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing, China
[ 2 ] [Wang, Lichun] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing, China
[ 3 ] [Wang, Shaofan] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing, China
[ 4 ] [Kong, Dehui] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing, China
[ 5 ] [Li, Jinghua] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing, China
[ 6 ] [Yin, Baocai] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing, China

Source ：

Year： 2022

Page： 283-290

Language： English
