CGNN: Caption-assisted graph neural network for image-text retrieval - Details

Author：

Hu, Yongli (Hu, Yongli.) | Zhang, Hanfu (Zhang, Hanfu.) | Jiang, Huajie (Jiang, Huajie.) | Bi, Yandong (Bi, Yandong.) | Yin, Baocai (Yin, Baocai.) (Scholars：尹宝才)

Indexed by：

EI Scopus SCIE

Abstract：

Image-text　retrieval　has　drawn　much　attention　in　recent　years,　where　similarity　measure　between　im-age　and　text　plays　an　important　role.　Most　existing　works　focus　on　learning　global　coarse-grained　or　local　fine-grained　features　for　similarity　computation.　However,　the　large　domain　gap　between　different　modalities　is　often　neglected,　which　makes　it　difficult　to　match　the　images　and　texts　effectively.　In　order　to　deal　with　this　problem,　we　propose　to　use　auxiliary　information　to　release　the　domain　gap,　where　the　image　captions　are　generated.　Then,　a　Caption-Assisted　Graph　Neural　Network(CGNN)　is　designed　to　learn　the　structured　relationships　among　images,　captions,　and　texts.　Since　the　captions　and　the　texts　are　from　the　same　domain,　the　domain　gap　between　images　and　texts　can　be　effectively　released.　With　the　help　of　caption　information,　our　model　achieves　excellent　performance　on　two　cross-modal　retrieval　datasets,　Flickr30K　and　MS-COCO,　which　shows　the　effectiveness　of　our　framework.(c)　2022　Elsevier　B.V.　All　rights　reserved.

Keyword：

Cross -modal retrieval Image captioning Image -text retrieval Graph convolution

Author Community：

[ 1 ] [Hu, Yongli]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 2 ] [Zhang, Hanfu]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 3 ] [Jiang, Huajie]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 4 ] [Bi, Yandong]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 5 ] [Yin, Baocai]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 6 ] [Jiang, Huajie]Beijing Univ Technol, Beijing 100124, Peoples R China

Reprint Author's Address：

Email：

huyongli@bjut.edu.cn |
zhanghf@emails.bjut.edu.cn |
jianghj@bjut.edu.cn |
biyandong@emails.bjut.edu.cn |
ybc@bjut.edu.cn

Show more details

Related Keywords：

Spatio-Temporal Memory Attention for Image Captioning
2020，IEEE TRANSACTIONS ON IMAGE PROCESSING
Cross-modal alignment with graph reasoning for image-text retrieval
2022，MULTIMEDIA TOOLS AND APPLICATIONS
Common-Memory Bridged Cross-Modal Adaptive Graph Embedding for Image-Text Retrieval
2024，2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Context-aware relation enhancement and similarity reasoning for image-text retrieval
2024，IET COMPUTER VISION

Source ：

PATTERN RECOGNITION LETTERS

ISSN： 0167-8655

Year： 2022

Volume： 161

Page： 137-142

5 . 1

JCR@2022

5 . 1 0 0

JCR@2022

ESI Discipline： ENGINEERING;

ESI HC Threshold：49

JCR Journal Grade：2

CAS Journal Grade：3

Cited Count：

WoS CC Cited Count： 3

SCOPUS Cited Count： 4

ESI Highly Cited Papers on the List： 0 Unfold All

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 1

Affiliated Colleges：

Get Fulltext

DOI Library Discovery Baidu Scholar Search Web of Science

Type
Departments

All Years Choose Year From to