Indexed by:
Abstract:
The objective of image captioning is to enable computers to autonomously generate human-like sentences that describe a given image. To address the issues of insufficient accuracy in image feature extraction and underutilization of visual information, we propose a Swin Transformer-based image captioning model with feature enhancement and multi-stage fusion. First, the Swin Transformer is employed as the encoder to extract image features, and feature enhancement is adopted to capture richer image feature information. Then, a multi-stage image and semantic fusion module is constructed to exploit the semantic information from past time steps. Finally, an LSTM decodes the semantic and image information to generate captions. The proposed model achieves better results than the baselines on the public Flickr8K and Flickr30K datasets. © 2023 IEEE.
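To make the encoder-decoder pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of a Swin Transformer encoder feeding an LSTM caption decoder. The backbone name (swin_tiny_patch4_window7_224 from timm), the simple fusion strategy (prepending the pooled image feature to the token sequence), and all dimensions are illustrative assumptions; the paper's feature enhancement and multi-stage fusion modules are not reproduced here.

import torch
import torch.nn as nn
import timm

class SwinLSTMCaptioner(nn.Module):
    """Sketch: Swin Transformer encoder + LSTM caption decoder (assumed details)."""

    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512):
        super().__init__()
        # Swin Transformer backbone as the visual encoder; num_classes=0
        # makes timm return pooled features instead of class logits.
        self.encoder = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False, num_classes=0
        )
        feat_dim = self.encoder.num_features  # 768 for swin_tiny
        # Project the pooled visual feature into the decoder's embedding space.
        self.visual_proj = nn.Linear(feat_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Single-layer LSTM decoder that generates the caption token by token.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Global image representation from the Swin encoder.
        feats = self.visual_proj(self.encoder(images))            # (B, embed_dim)
        tokens = self.embed(captions)                             # (B, T, embed_dim)
        # Prepend the image feature as the first "token" seen by the LSTM.
        inputs = torch.cat([feats.unsqueeze(1), tokens], dim=1)   # (B, T+1, embed_dim)
        out, _ = self.lstm(inputs)
        return self.fc(out)                                       # (B, T+1, vocab_size)

if __name__ == "__main__":
    model = SwinLSTMCaptioner(vocab_size=10000)
    imgs = torch.randn(2, 3, 224, 224)
    caps = torch.randint(0, 10000, (2, 15))
    print(model(imgs, caps).shape)  # torch.Size([2, 16, 10000])

In the actual model, the multi-stage fusion module would combine the enhanced image features with semantic information from previous time steps at each decoding stage rather than only once at the start of the sequence.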
Keywords:
Corresponding author information:
Email address:
Source:
Year: 2023
Language: English
Affiliated department: