Authors:

Bi, Yandong | Jiang, Huajie | Hu, Yongli | Sun, Yanfeng | Yin, Baocai

Indexed in:

EI, Scopus

Abstract:

As a prevailing cross-modal reasoning task, Visual Question Answering (VQA) has made impressive progress in the last few years, and language bias has been widely studied as a route to more robust VQA models. However, visual bias, which also affects the robustness of VQA models, is seldom considered, resulting in weak inference ability. Balancing the effects of language bias and visual bias has therefore become essential in the current VQA task. In this paper, we devise a new reweighting strategy that takes both language bias and visual bias into account, and propose a Fair Attention Network for Robust Visual Question Answering (FAN-VQA). It first constructs a question bias branch and a visual bias branch to estimate the bias information from the two modalities, which is used to judge the importance of samples. Adaptive importance weights are then learned from the bias information and assigned to the candidate answers to adjust the training losses, enabling the model to shift more attention to difficult samples that need less-salient visual clues to infer the correct answer. To improve the robustness of the VQA model, we design a progressive strategy that balances the influence of the original training loss and the adjusted training loss. Extensive experiments on the VQA-CP v2, VQA v2, and VQA-CE datasets demonstrate the effectiveness of the proposed FAN-VQA method. © 1991-2012 IEEE.
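
The reweighting idea in the abstract can be illustrated with a minimal sketch. This is not the FAN-VQA implementation: the abstract gives no formulas, so the weighting rule (one minus the larger bias-branch confidence on the correct answer), the per-sample rather than per-candidate weight, the linear ramp schedule, and all function names below are assumptions chosen only to show how bias estimates from two branches could down-weight easy samples while a schedule blends the original and adjusted losses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target answer index."""
    return -math.log(probs[target])


def importance_weight(question_bias_probs, visual_bias_probs, target):
    """Hypothetical rule: a sample is 'easy' when either bias branch alone
    is already confident in the correct answer, so it gets a small weight."""
    bias_conf = max(question_bias_probs[target], visual_bias_probs[target])
    return 1.0 - bias_conf


def progressive_loss(main_logits, q_bias_logits, v_bias_logits,
                     target, step, total_steps):
    """Blend the original loss and the reweighted loss.

    Assumed schedule: alpha ramps linearly from 0 to 1, so training starts
    from the plain loss and gradually moves to the adjusted loss.
    """
    base = cross_entropy(softmax(main_logits), target)
    weight = importance_weight(softmax(q_bias_logits),
                               softmax(v_bias_logits), target)
    adjusted = weight * base
    alpha = min(1.0, step / max(total_steps, 1))
    return (1.0 - alpha) * base + alpha * adjusted


if __name__ == "__main__":
    main = [1.0, 0.5, -0.5]        # main VQA model logits over 3 answers
    easy_q = [5.0, 0.0, 0.0]       # question branch already favors answer 0
    hard_q = [0.0, 5.0, 0.0]       # question branch favors a wrong answer
    visual = [0.0, 0.0, 0.0]       # uninformative visual branch
    for name, q in (("easy", easy_q), ("hard", hard_q)):
        loss = progressive_loss(main, q, visual, target=0,
                                step=100, total_steps=100)
        print(name, round(loss, 4))
```

With the same main-model prediction, the sample that a bias branch can already answer contributes a much smaller loss at the end of the schedule than the sample the bias branches get wrong, which is the qualitative behavior the abstract describes.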

Keywords:

Visual languages; Job analysis

Author affiliations:

  • [ 1 ] [Bi, Yandong]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing; 100124, China
  • [ 2 ] [Jiang, Huajie]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing; 100124, China
  • [ 3 ] [Hu, Yongli]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing; 100124, China
  • [ 4 ] [Sun, Yanfeng]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing; 100124, China
  • [ 5 ] [Yin, Baocai]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing; 100124, China

Source:

IEEE Transactions on Circuits and Systems for Video Technology

ISSN: 1051-8215

Year: 2024

Issue: 9

Volume: 34

Pages: 7870-7881

Impact factor: 8.400 (JCR@2022)

Times cited:

Scopus citations: 6

ESI Highly Cited Papers listed: 0

Views in the last 30 days: 1
