
Authors:

Ji, Junzhong (Scholar: 冀俊忠) | Xu, Cheng | Zhang, Xiaodan | Wang, Boyue | Song, Xinhang

Indexed in:

SSCI, EI, Scopus, SCIE

Abstract:

Visual attention has been successfully applied in image captioning to selectively incorporate the most relevant image areas into the language generation procedure. However, the attention in current image captioning methods is guided only indirectly and implicitly by the hidden state of the language model, e.g., an LSTM (Long Short-Term Memory) network, and thus the attended areas are only weakly related across time steps. Beyond the spatial relationship among attended areas, the temporal relationship in attention is also crucial for image captioning, in line with the attention transmission mechanism of human vision. In this paper, we propose a new spatio-temporal memory attention (STMA) model to learn the spatio-temporal relationship in attention for image captioning. STMA introduces a memory mechanism into the attention model through a tailored LSTM, in which a new cell state memorizes and propagates the attention information and the output gate generates the attention weights. The attention in STMA is transmitted with memory adaptively and dependently, which builds strong temporal connections between attentions and learns the spatio-temporal relationship of attended areas simultaneously. Moreover, the proposed STMA can be flexibly combined with attention-based image captioning frameworks. Experiments on the MS COCO dataset demonstrate the superiority of the proposed STMA model in exploring the spatio-temporal relationship in attention and in improving current attention-based image captioning.
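The abstract describes the core mechanism of STMA: a tailored LSTM whose cell state memorizes and propagates attention information across decoding steps, and whose output gate produces the attention weights over image regions. The PyTorch snippet below is a minimal, hypothetical sketch of such a memory-attention cell based only on this description; the module name MemoryAttentionCell, the gate layout, and the scoring layer are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryAttentionCell(nn.Module):
    """Illustrative LSTM-style attention cell (assumption, not the paper's code):
    the cell state carries attention memory across time steps, and the output
    gate modulates the scores used to compute the attention weights."""
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Input, forget, output, and candidate gates computed from the
        # language-LSTM hidden state and each region feature.
        self.gates = nn.Linear(hidden_dim + feat_dim, 4 * feat_dim)
        self.score = nn.Linear(feat_dim, 1)  # scalar attention score per region

    def forward(self, feats, lang_h, c_prev):
        # feats:  (B, R, feat_dim)  region features from the image encoder
        # lang_h: (B, hidden_dim)   hidden state of the language LSTM at this step
        # c_prev: (B, R, feat_dim)  attention memory from the previous step
        B, R, _ = feats.shape
        h_rep = lang_h.unsqueeze(1).expand(B, R, lang_h.size(-1))
        i, f, o, g = self.gates(torch.cat([h_rep, feats], dim=-1)).chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c_prev + i * torch.tanh(g)                   # propagate attention memory
        scores = self.score(o * torch.tanh(c)).squeeze(-1)   # (B, R)
        alpha = F.softmax(scores, dim=-1)                    # attention weights over regions
        context = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # attended feature (B, feat_dim)
        return context, alpha, c

# Example usage (hypothetical dimensions): call the cell once per decoding step,
# feeding back the attention memory and passing the context to the word predictor.
cell = MemoryAttentionCell(feat_dim=2048, hidden_dim=512)
feats = torch.randn(4, 36, 2048)       # e.g. 36 detected regions per image
c = torch.zeros(4, 36, 2048)           # initial attention memory
lang_h = torch.randn(4, 512)
context, alpha, c = cell(feats, lang_h, c)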

Keywords:

LSTM; memory attention; attention transmission; image captioning; spatio-temporal relationship

Author Affiliations:

  • [ 1 ] [Ji, Junzhong]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
  • [ 2 ] [Xu, Cheng]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
  • [ 3 ] [Zhang, Xiaodan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
  • [ 4 ] [Wang, Boyue]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
  • [ 5 ] [Ji, Junzhong]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 6 ] [Xu, Cheng]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 7 ] [Zhang, Xiaodan]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 8 ] [Wang, Boyue]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 9 ] [Song, Xinhang]Chinese Acad Sci, Inst Comp Technol, CAS, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

Corresponding Author:

  • [Zhang, Xiaodan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China



Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING

ISSN: 1057-7149

Year: 2020

Volume: 29

Pages: 7615-7628

Impact Factor: 10.600 (JCR@2022)

ESI Discipline: ENGINEERING

ESI Highly Cited Threshold: 115

Citations:

Web of Science Core Collection citations: 60

Scopus citations: 73

ESI highly cited papers currently listed: 0

Wanfang citations:

Chinese citations:

