• Complex
  • Title
  • Keyword
  • Abstract
  • Scholars
  • Journal
  • ISSN
  • Conference
搜索

Author:

Chen, Dongpan (Chen, Dongpan.) | Kong, Dehui (Kong, Dehui.) (Scholars:孔德慧) | Li, Jinghua (Li, Jinghua.) | Wang, Lichun (Wang, Lichun.) | Gao, Junna (Gao, Junna.) | Yin, Baocai (Yin, Baocai.)

Indexed by:

EI Scopus SCIE

Abstract:

Traditional affordance learning tasks aim to understand object's interactive functions in an image, such as affordance recognition and affordance detection. However, these tasks cannot determine whether the object is currently interacting, which is crucial for many follow-up tasks, including robotic manipulation and planning task. To fill this gap, this paper proposes a novel object affrodance state (OAS) recognition task, i.e., simultaneously recognizing an object's affordances and the partner objects that are interacting with it. Accordingly, to facilitate the application of deep learning technology, an OAS recognition task related dataset OAS10k is constructed by collecting and labeling over 10k images. In the dataset, a sample is defined as a set of an image and its OAS labels, each label is represented as $\left \langle{ \rm {\textit {subject, subject's affrodance, interacted object}} }\right \rangle $ . These triplet labels have rich relational semantic information, which can improve OAS recognition performance. We hence construct a directed OAS knowledge graph of affordance states, and extract an OAS matrix from it for modelling the semantic relationships of the triplets. Based on the matrix, we propose an OAS recognition network (OASNet), which utilizes GCN to capture the relational semantic embeddings, and uses a transformer to fuse them with the visual features from an image to recognize the affordance states of objects in the image. Experimental results on OAS10k dataset and other triplet label recognition datasets demonstrate that the proposed OASNet achieves the best performance compared to the state-of-the-art methods. The dataset and codes will be released on https://github.com/mxmdpc/OAS.

Keyword:

Object affordance state recognition transformer multi-label image classification relational semantic embeddings graph convolution networks

Author Community:

  • [ 1 ] [Chen, Dongpan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 2 ] [Kong, Dehui]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 3 ] [Li, Jinghua]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 4 ] [Wang, Lichun]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 5 ] [Gao, Junna]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 6 ] [Yin, Baocai]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China

Reprint Author's Address:

  • [Kong, Dehui]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China;;

Show more details

Related Keywords:

Related Article:

Source :

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

ISSN: 1051-8215

Year: 2024

Issue: 5

Volume: 34

Page: 3368-3382

8 . 4 0 0

JCR@2022

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0 Unfold All

WanFang Cited Count:

Chinese Cited Count:

30 Days PV: 1

Affiliated Colleges:

Online/Total:742/5309494
Address:BJUT Library(100 Pingleyuan,Chaoyang District,Beijing 100124, China Post Code:100124) Contact Us:010-67392185
Copyright:BJUT Library Technical Support:Beijing Aegean Software Co., Ltd.