Indexed by:
Abstract:
Referring image segmentation identifies object masks in images under the guidance of input natural language expressions. Many remarkable cross-modal decoders have been devoted to this task, but these models face two key challenges. First, they usually fail to extract fine-grained boundary information and gradient information from images. Second, they usually fail to explore language associations among image pixels. In this work, a Multi-scale Gradient balanced Central Difference Convolution (MG-CDC) and a Graph convolutional network-based Language and Image Fusion (GLIF) for the cross-modal encoder, together called Graph-RefSeg, are designed. Specifically, in the shallow layers of the encoder, the MG-CDC captures comprehensive fine-grained image features. It enhances the perception of target boundaries and provides effective guidance for the deeper encoding layers. In each encoder layer, the GLIF is used for cross-modal fusion. It explores the correlation between every pixel and its corresponding language vectors through a graph neural network. Since the encoder achieves robust cross-modal alignment and context mining, a lightweight decoder suffices for segmentation prediction. Extensive experiments show that the proposed Graph-RefSeg outperforms state-of-the-art methods on three public datasets. Code and models will be made publicly available at .
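The central difference convolution that MG-CDC builds on blends a vanilla convolution with a gradient term that subtracts the centre pixel from each neighbour, which is what sharpens boundary responses. A minimal single-channel NumPy sketch of that standard CDC formulation follows; the function name, the `theta` balance default, and the "valid"-padding setup are illustrative assumptions, not the paper's actual multi-scale implementation:

```python
import numpy as np

def central_difference_conv2d(x, w, theta=0.7):
    """Sketch of central difference convolution (CDC).

    Combines a vanilla 3x3 convolution with a central-difference
    (gradient) term:  y = theta * grad + (1 - theta) * vanilla.
    x: (H, W) input array, w: (3, 3) kernel, "valid" padding.
    """
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = x[i:i + 3, j:j + 3]
            vanilla = np.sum(w * patch)            # ordinary convolution response
            grad = np.sum(w * (patch - patch[1, 1]))  # neighbours minus the centre pixel
            out[i, j] = theta * grad + (1 - theta) * vanilla
    return out
```

On a flat region the gradient term vanishes, so only the attenuated vanilla response remains; near edges the gradient term dominates, which is the boundary-sensitivity the abstract attributes to MG-CDC.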
Keywords:
Corresponding author information:
Email address:
Source:
IET IMAGE PROCESSING
ISSN: 1751-9659
Year: 2024
Issue: 4
Volume: 18
Pages: 1083-1095
2.300
JCR@2022
Affiliated department: