Cross-modal alignment with graph reasoning for image-text retrieval

Author:

Cui, Zheng | Hu, Yongli | Sun, Yanfeng | Gao, Junbin | Yin, Baocai (尹宝才)

Indexed by:

EI Scopus SCIE

Abstract:

Image-text retrieval has received considerable attention in modern artificial intelligence research. It remains challenging because images and texts are heterogeneous cross-modal data. The key issue in image-text retrieval is how to learn a common feature space while preserving the semantic correspondence between image and text. Existing works cannot obtain fine-grained cross-modal feature representations because the semantic relations between local features are not effectively exploited and noise information is not suppressed. To address these issues, we propose a Cross-modal Alignment with Graph Reasoning (CAGR) model, in which refined cross-modal features are learned in the common feature space and a fine-grained cross-modal alignment method is then applied. Specifically, we introduce a graph reasoning module that explores the semantic connections among local elements within each modality and measures their importance with a self-attention mechanism. Through multi-step reasoning, the visual semantic graph and the textual semantic graph are effectively learned, and refined visual and textual features are obtained. Finally, to measure the similarity between image and text, a novel alignment approach named cross-modal attentional fine-grained alignment computes a similarity score between the two sets of features. Our model achieves competitive performance compared with state-of-the-art methods on the Flickr30K and MS-COCO datasets, and extensive experiments demonstrate its effectiveness.
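The abstract describes two mechanisms only at a high level: multi-step graph reasoning over local features with self-attention weights, and an attentional fine-grained alignment that scores an image-text pair from two sets of local features. The following is a minimal, dependency-free Python sketch of what such a pipeline can look like under stated assumptions; it is not the CAGR implementation. It assumes a fully connected graph whose edge weights are scaled dot-product self-attention scores, a residual update per reasoning step, and word-to-region attention in the spirit of stacked cross attention (SCAN), with per-word cosine similarities averaged into one score. All function names and the `temperature` parameter are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch, NOT the CAGR authors' code: local features are plain
# lists of floats (image regions or words), all with the same dimension.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0  # leave zero vectors unchanged
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return dot(l2_normalize(a), l2_normalize(b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def graph_reasoning_step(nodes: List[Vector]) -> List[Vector]:
    # Assumption: the semantic graph is fully connected and its edge weights
    # are self-attention scores (row-wise softmax of scaled dot products).
    scale = math.sqrt(len(nodes[0]))
    updated = []
    for node in nodes:
        weights = softmax([dot(node, other) / scale for other in nodes])
        message = weighted_sum(weights, nodes)
        # Residual update: keep the node's own feature, add its neighbours' message.
        updated.append([x + m for x, m in zip(node, message)])
    return updated


def multi_step_reasoning(nodes: List[Vector], steps: int = 2) -> List[Vector]:
    # Repeat the reasoning step, then L2-normalize the refined features.
    for _ in range(steps):
        nodes = graph_reasoning_step(nodes)
    return [l2_normalize(n) for n in nodes]


def attentional_alignment_score(regions: List[Vector], words: List[Vector],
                                temperature: float = 9.0) -> float:
    # Assumption: SCAN-style text-to-image attention. Each word attends over
    # the regions, is compared with its attended visual context, and the
    # per-word cosine similarities are averaged into one image-text score.
    per_word = []
    for word in words:
        attn = softmax([temperature * cosine(word, r) for r in regions])
        context = weighted_sum(attn, regions)
        per_word.append(cosine(word, context))
    return sum(per_word) / len(per_word)


# Toy usage with made-up 3-d features (illustrative only).
image = multi_step_reasoning([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
matching_text = multi_step_reasoning([[1.0, 0.1, 0.0], [0.1, 1.0, 0.0]])
unrelated_text = multi_step_reasoning([[0.0, 0.0, 1.0]])
print(attentional_alignment_score(image, matching_text))   # close to 1
print(attentional_alignment_score(image, unrelated_text))  # 0.0
```

In this toy setup the text whose word features lie near the image's region features scores close to 1, while a word orthogonal to every region scores 0. A real model would learn the projections that produce these features and would typically also aggregate region-to-word attention; both are omitted here for brevity.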

Keywords:

Self-attention mechanism; Fine-grained alignment; Image-text retrieval; Multi-step graph reasoning

Author Affiliations:

[ 1 ] [Cui, Zheng]Beijing Univ Technol, Fac Informat Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing, Peoples R China
[ 2 ] [Hu, Yongli]Beijing Univ Technol, Fac Informat Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing, Peoples R China
[ 3 ] [Sun, Yanfeng]Beijing Univ Technol, Fac Informat Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing, Peoples R China
[ 4 ] [Yin, Baocai]Beijing Univ Technol, Fac Informat Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing, Peoples R China
[ 5 ] [Gao, Junbin]Univ Sydney, Univ Sydney Business Sch, Discipline Business Analyt, Sydney, NSW, Australia

Email:

CuiZ@emails.bjut.edu.cn |
huyongli@bjut.edu.cn |
yfsun@bjut.edu.cn |
junbin.gao@sydney.edu.au |
ybc@bjut.edu.cn


Related Papers:

SAGSleepNet: A deep learning model for sleep staging based on self-attention graph of polysomnography
2023, BIOMEDICAL SIGNAL PROCESSING AND CONTROL
Context-aware relation enhancement and similarity reasoning for image-text retrieval
2024, IET COMPUTER VISION
CGNN: Caption-assisted graph neural network for image-text retrieval
2022, PATTERN RECOGNITION LETTERS
Common-Memory Bridged Cross-Modal Adaptive Graph Embedding for Image-Text Retrieval
2024, 2024 IEEE International Conference on Multimedia and Expo, ICME 2024

Source:

MULTIMEDIA TOOLS AND APPLICATIONS

ISSN: 1380-7501

Year: 2022

Issue: 17

Volume: 81

Pages: 23615-23632

Impact factor (JCR@2022): 3.600

ESI Discipline: COMPUTER SCIENCE

ESI HC Threshold: 46

JCR Journal Grade: 2

CAS Journal Grade: 4

Cited Count:

WoS CC Cited Count: 2

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0
