
Authors:

Sun, Zhongfan | Hu, Yongli | Gao, Qingqing | Jiang, Huajie | Gao, Junbin | Sun, Yanfeng | Yin, Baocai

Indexed in:

CPCI-S, EI, Scopus

Abstract:

Considerable performance gains have been achieved in knowledge-based visual question answering (VQA) thanks to visual-language pre-training models under the pre-training-then-fine-tuning paradigm. However, because the objectives of the pre-training and fine-tuning stages differ, an evident barrier prevents the cross-modal comprehension ability developed during pre-training from fully benefiting the fine-tuning task. To break this barrier, this paper proposes a novel hybrid prompting model for knowledge-based VQA, which inherits and integrates the pre-training and fine-tuning tasks under a shared objective. Specifically, based on a static declaration prompt, we construct a goal consistent with fine-tuning via masked language modeling to inherit the capabilities of the pre-training task, while selecting the top-t relevant knowledge entries in a dense-retrieval manner. Additionally, a dynamic knowledge prompt is learned from the retrieved knowledge, which not only alleviates the input-length constraint of visual-language pre-trained models but also helps provide answer features during fine-tuning. Combining and unifying the aims of the two stages fully exploits the abilities of pre-training and fine-tuning to predict the answer. We evaluate the proposed model on the OKVQA dataset, and the results show that our model outperforms state-of-the-art methods based on visual-language pre-training models by a noticeable margin and even exceeds the large-scale language model GPT-3, which demonstrates the benefits of the hybrid prompts and the advantage of unifying pre-training with fine-tuning.
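The dense-retrieval step mentioned in the abstract (selecting the top-t most relevant knowledge entries) can be sketched roughly as below. This is a minimal illustration of top-t retrieval by cosine similarity over embeddings; the function name, the NumPy representation, and the toy data are assumptions for illustration, not the authors' actual implementation:

```python
import numpy as np

def retrieve_top_t(query_emb, knowledge_embs, t=5):
    """Return indices of the t knowledge entries most similar to the query
    (cosine similarity over L2-normalized embeddings)."""
    q = query_emb / np.linalg.norm(query_emb)
    k = knowledge_embs / np.linalg.norm(knowledge_embs, axis=1, keepdims=True)
    scores = k @ q                      # cosine similarity per knowledge entry
    return np.argsort(-scores)[:t]      # indices sorted by descending score

# Toy usage: 4 knowledge entries with 3-dim embeddings.
rng = np.random.default_rng(0)
kb = rng.normal(size=(4, 3))
query = kb[2] + 0.01 * rng.normal(size=3)  # query nearly identical to entry 2
top = retrieve_top_t(query, kb, t=2)
print(top[0])  # entry 2 ranks first
```

In the paper's setting, the retrieved entries would then be used to learn the dynamic knowledge prompt rather than being concatenated into the input, which is how the abstract says the input-length constraint is avoided.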

Keywords:

Knowledge Integration; Visual Question Answering; Multi-modal Fusion

Author affiliations:

  • [ 1 ] [Sun, Zhongfan]Beijing Univ Technol, Beijing, Peoples R China
  • [ 2 ] [Hu, Yongli]Beijing Univ Technol, Beijing, Peoples R China
  • [ 3 ] [Gao, Qingqing]Beijing Univ Technol, Beijing, Peoples R China
  • [ 4 ] [Jiang, Huajie]Beijing Univ Technol, Beijing, Peoples R China
  • [ 5 ] [Sun, Yanfeng]Beijing Univ Technol, Beijing, Peoples R China
  • [ 6 ] [Yin, Baocai]Beijing Univ Technol, Beijing, Peoples R China
  • [ 7 ] [Gao, Junbin]Univ Sydney, Sydney, NSW, Australia

Corresponding author:

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023

Year: 2023

Pages: 4065-4073

Citations:

WoS Core Collection citation count:

Scopus citation count: 5

ESI Highly Cited Papers listed: 0

Wanfang citation count:

Chinese citation count:

Views in last 30 days: 1

Affiliated department:
