EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images

Author：

Xie, Bochen | Deng, Yongjian | Shao, Zhanpeng | Li, Youfu

Indexed by：

EI Scopus SCIE

Abstract：

Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets.
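
Illustrative note (not part of the published abstract): the two ideas named above, turning an event stream into a frame-based representation and merging two modalities under a mask, can be sketched in a few lines. This is only a minimal sketch under stated assumptions. It uses plain per-pixel event counting rather than the paper's AEIM, and a fixed hand-supplied mask rather than the learned masks of MRFM. The names `Event`, `events_to_count_frame`, and `fuse_with_mask` are hypothetical and do not come from the paper.

```python
from typing import Iterable, List, Tuple

# Assumed event layout: (x, y, timestamp, polarity), polarity is +1 or -1.
Event = Tuple[int, int, float, int]


def events_to_count_frame(
    events: Iterable[Event], width: int, height: int
) -> List[List[List[int]]]:
    """Accumulate events into a 2-channel count image.

    Channel 0 counts positive-polarity events and channel 1 counts
    negative-polarity events. This is simple counting, not AEIM.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, polarity in events:
        # Ignore events that fall outside the sensor area.
        if 0 <= x < width and 0 <= y < height:
            channel = 0 if polarity > 0 else 1
            frame[channel][y][x] += 1
    return frame


def fuse_with_mask(
    event_feat: List[float], image_feat: List[float], mask: List[float]
) -> List[float]:
    """Blend two feature vectors element-wise.

    Each output is mask * event + (1 - mask) * image, with mask in [0, 1].
    In a learned model the mask would be predicted; here it is given.
    """
    return [m * e + (1.0 - m) * i for e, i, m in zip(event_feat, image_feat, mask)]
```

A mask value of 1 keeps only the event feature at that position, 0 keeps only the image feature, and values in between mix the two.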

Keyword：

Noise measurement | Multi-modal fusion | Attention mechanism | Visualization | Semantic segmentation | Cameras | Event camera | Standards | Semantics | Task analysis

Author Affiliations：

[ 1 ] [Xie, Bochen]City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[ 2 ] [Li, Youfu]City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[ 3 ] [Deng, Yongjian]Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[ 4 ] [Shao, Zhanpeng]Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China

Reprint Author's Address：

Email：

boxie4-c@my.cityu.edu.hk |
yjdeng@bjut.edu.cn |
zpshao@hunnu.edu.cn |
meyfli@cityu.edu.hk


Related Articles：

A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network
2020, IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
UTSN-net: Medical Image Semantic Segmentation Model Based On Skip Non-local Attention Module
2023, 8th International Conference on Electronic Technology and Information Science, ICETIS 2023
SAM-Event-Adapter: Adapting Segment Anything Model for Event-RGB Semantic Segmentation
2024, 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Road Scene Segmentation Based on Multi-scale Attention Mechanism
2022, 5th IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, IMCEC 2022

Source ：

IEEE TRANSACTIONS ON MULTIMEDIA

ISSN： 1520-9210

Year： 2024

Volume： 26

Page： 8639-8650

Impact Factor： 7.300 (JCR@2022)

Cited Count：

WoS CC Cited Count： 1

SCOPUS Cited Count： 7

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 4

Affiliated Colleges：
