

Duan, Li-Juan (段立娟) | Sun, Qi-Chao | Qiao, Yuan-Hua (乔元华) | Chen, Jun-Cheng | Cui, Guo-Qin


EI Scopus CSCD


Semantic segmentation, a research hotspot in computer vision, assigns every pixel of an image to a semantic class. As a fundamental problem in scene understanding, it is widely used in intelligent tasks. In recent years, following the success of convolutional neural networks (CNNs) in many computer vision applications, fully convolutional networks (FCNs) have shown great potential for RGB semantic segmentation. However, semantic segmentation remains challenging because of the complexity of scene types, severe object occlusion and varying illumination. With the availability of consumer RGB-D sensors such as the RealSense 3D Camera and Microsoft Kinect, RGB images and depth information can now be captured at the same time. Depth information describes 3D geometric information that may be missing from RGB-only images, and it can significantly reduce classification errors and improve segmentation accuracy. To make effective use of both RGB and depth information, an efficient multi-modal fusion method is crucial. According to the stage at which fusion takes place, current RGB-D feature fusion methods fall into three types: early fusion, middle fusion and late fusion. However, most previous studies fail to exploit the complementary information between RGB and depth. They simply fuse RGB features and depth features by equal-weight concatenation or summation, which fails to extract the complementary information between the two modalities and suppresses modality-specific information. In addition, they do not take into account the semantic information in high-level features across modalities, which is very important for fine-grained semantic segmentation.
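To make the equal-weight fusion criticized above concrete, the following minimal Python sketch shows summation and concatenation of two feature vectors, next to a simple per-channel gated alternative. It is an illustration only: the function names and the sigmoid gate are assumptions made here and do not reproduce the fusion blocks proposed in the paper.

```python
import math

def fuse_sum(rgb, depth):
    """Equal-weight element-wise sum of two same-length feature vectors."""
    return [r + d for r, d in zip(rgb, depth)]

def fuse_concat(rgb, depth):
    """Equal-weight concatenation along the channel axis."""
    return list(rgb) + list(depth)

def fuse_gated(rgb, depth):
    """Hypothetical per-channel gate (an assumption, not the paper's block).

    The weight w = sigmoid(r - d) is a two-way softmax over the activations,
    so the modality with the stronger response dominates that channel.
    """
    fused = []
    for r, d in zip(rgb, depth):
        w = 1.0 / (1.0 + math.exp(-(r - d)))
        fused.append(w * r + (1.0 - w) * d)
    return fused

rgb_feat = [1.0, 2.0]
depth_feat = [3.0, 4.0]
summed = fuse_sum(rgb_feat, depth_feat)
stacked = fuse_concat(rgb_feat, depth_feat)
gated = fuse_gated([10.0, 2.0], [0.0, 2.0])
```

With equal-weight fusion, every channel of both modalities contributes identically regardless of content; the gated variant lets the weights depend on the inputs, which is the general idea that attention-based fusion builds on.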
To solve these problems, this paper presents a novel Attention-aware and Semantic-aware Multi-modal Fusion Network (ASNet) for RGB-D semantic segmentation. The network fuses multi-level RGB-D features through Attention-aware Multi-modal Fusion (AMF) blocks and Semantic-aware Multi-modal Fusion (SMF) blocks. Specifically, in the AMF blocks, a cross-modal attention mechanism lets RGB features and depth features guide and optimize each other through their complementary characteristics, yielding a feature representation with rich spatial location information. The SMF blocks model the semantic interdependencies between multi-modal features by integrating semantically associated feature channels across the RGB and depth features, and extract a more precise semantic feature representation. The two blocks are integrated into a two-branch encoder-decoder architecture, which restores image resolution gradually through consecutive up-sampling operations and combines low-level and high-level features through skip connections to achieve high-resolution prediction. To optimize the training process, we use deeply supervised learning over the multi-level decoding features. The network effectively learns the complementary characteristics of the two modalities and models the semantic context interdependencies between RGB features and depth features. Experimental results on two challenging public RGB-D indoor semantic segmentation datasets, SUN RGB-D and NYU Depth v2, show that our network outperforms existing RGB-D semantic segmentation methods and improves segmentation performance by 1.9% in mean accuracy and 1.2% in mean IoU. © 2021, Science Press. All rights reserved.
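The two metrics reported in the abstract, mean accuracy and mean IoU, are conventionally computed from a per-class confusion matrix. The sketch below shows that standard computation in plain Python; the function names and the toy labels are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def mean_accuracy(cm):
    """Average of per-class recall, skipping classes with no true pixels."""
    accs = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total > 0:
            accs.append(row[c] / total)
    return sum(accs) / len(accs) if accs else 0.0

def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping empty classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example: six "pixels" flattened into label lists, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
macc = mean_accuracy(cm)
miou = mean_iou(cm)
```

Mean IoU penalizes false positives as well as false negatives, which is why it is usually lower than mean accuracy on the same predictions.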


Semantics; Convolutional neural networks; Image resolution; Convolution; Computer vision; Decoding; Cameras; Semantic Web


  • [ 1 ] [Duan, Li-Juan]Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 2 ] [Duan, Li-Juan]Beijing Key Laboratory of Trusted Computing, Beijing; 100124, China
  • [ 3 ] [Duan, Li-Juan]National Engineering Laboratory for Key Technologies of Information Security Level Protection, Beijing; 100124, China
  • [ 4 ] [Sun, Qi-Chao]Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 5 ] [Sun, Qi-Chao]Beijing Key Laboratory of Trusted Computing, Beijing; 100124, China
  • [ 6 ] [Sun, Qi-Chao]Advanced Institute of Information Technology, Peking University, Hangzhou; 311200, China
  • [ 7 ] [Qiao, Yuan-Hua]College of Applied Sciences, Beijing University of Technology, Beijing; 100124, China
  • [ 8 ] [Chen, Jun-Cheng]Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 9 ] [Cui, Guo-Qin]State Key Laboratory of Digital Multi-media Chip Technology, Vimicro Corporation, Beijing; 100191, China


  • [Chen, Jun-Cheng]Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China





Source:

Chinese Journal of Computers

ISSN: 0254-4164

Year: 2021

Issue: 2

Volume: 44

Pages: 275-291



Scopus citation count: 4

ESI Highly Cited Papers on the list: 0



Views in the last 30 days: 0
