Indexed in:
Abstract:
Visual affordance studies what kinds of interaction are possible, and whether an interaction is reasonable, in the current environment depicted by an image or video. When inferring the affordances of objects, the semantics of objects and the relations between them should be considered, and a graph is commonly used to model the environmental context of each object. Observing that the edge weights in such a graph describe how much information objects contribute to one another during affordance reasoning, this paper proposes VAR-Net (Visual Affordance Reasoning Network), which models these weights as graph attention coefficients and learns them from objects' semantic and visual features, which imply their affordances. VAR-Net achieves higher accuracy on the COCO-Tasks and ADE-Affordance datasets. Experiments also interpret the meaning of the edge weights in VAR-Net: for a given affordance, the more an object affords it, the larger the weights of the edges from that object to other objects, and vice versa, which makes objects' features distinguishable for affordance inference. © 2022 IEEE.
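The abstract describes edge weights modeled as graph attention coefficients learned from per-object features. A minimal sketch of how such coefficients are typically computed in a graph attention layer is given below; this is not the authors' code, and the names (`attention_coefficients`, `attn_vec`, `features`) and the specific scoring function (a LeakyReLU over a dot product with a concatenated feature pair, followed by a softmax) are assumptions in the style of standard graph attention networks.

```python
import math

def leaky_relu(x, slope=0.2):
    # LeakyReLU activation commonly used for attention scores
    return x if x > 0.0 else slope * x

def attention_coefficients(features, attn_vec):
    """features: list of object feature vectors (lists of floats),
    standing in for the semantic/visual features of each object;
    attn_vec: a scoring vector of length 2*dim, standing in for the
    learned attention parameters.
    Returns a matrix alpha where alpha[i][j] is the normalised weight
    of the edge from object j to object i."""
    n = len(features)
    alpha = []
    for i in range(n):
        # unnormalised scores e_ij = LeakyReLU(a . [h_i ; h_j])
        scores = []
        for j in range(n):
            concat = features[i] + features[j]
            scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vec, concat))))
        # softmax over neighbours j, so incoming edge weights sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha.append([e / z for e in exps])
    return alpha

# toy usage: two objects with 2-D features
alpha = attention_coefficients([[1.0, 0.0], [0.0, 1.0]],
                               [0.5, -0.5, 0.5, -0.5])
```

Under this reading, an object that contributes more to reasoning about an affordance receives larger scores, and hence larger weights on its outgoing edges, matching the interpretation given in the abstract.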
Keywords:
Corresponding author:
Email address:
Source:
Year: 2022
Pages: 283-290
Language: English
Affiliated department: