Indexed:
Abstract:
As a popular cross-modal reasoning task, Visual Question Answering (VQA) has achieved great progress in recent years. However, language bias continues to affect the reliability of VQA models. To address this problem, counterfactual learning methods have been proposed to learn more robust features that mitigate the bias. Current counterfactual learning approaches, however, mainly focus on generating synthesized samples and assigning answers to them, neglecting the relationship between factual and original data, which hinders robust feature learning for effective reasoning. To overcome this limitation, we propose a Self-supervised Knowledge Distillation approach in Counterfactual Learning for VQA, dubbed VQA-SkdCL, which uses a self-supervised constraint to exploit the hidden knowledge in factual samples, enhancing the robustness of VQA models. We demonstrate the effectiveness of the proposed approach on the VQA v2, VQA-CP v1, and VQA-CP v2 datasets, where it achieves excellent performance.
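The abstract gives no implementation details of VQA-SkdCL, so as background only, here is a minimal generic sketch of the knowledge-distillation loss that such approaches typically build on: a temperature-softened KL divergence between a teacher distribution (here, one derived from factual samples) and the student's predictions. All function names and the temperature value are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Generic KD objective: KL(teacher || student) on temperature-softened
    # distributions, scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

When the student matches the teacher exactly, the loss is zero; it grows as the two distributions diverge, which is what drives the student toward the teacher's "hidden knowledge".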
Keywords:
Corresponding author:
Email address:
Source:
PATTERN RECOGNITION LETTERS
ISSN: 0167-8655
年份: 2023
卷: 177
页码: 33-39
5.100 (JCR@2022)
Affiliated department: