
Authors:

Yao, Bowen | Deng, Yongjian | Liu, Yuhan | Chen, Hao | Li, Youfu | Yang, Zhen

Indexed in:

EI, Scopus

Abstract:

Semantic segmentation, a fundamental visual task ubiquitously employed in sectors ranging from transportation and robotics to healthcare, has long captivated the research community. In the wake of rapid advances in large-model research, a foundation model for segmentation tasks, the Segment Anything Model (SAM), has been introduced. SAM largely resolves the poor generalizability of earlier segmentation models and removes the need to retrain the whole model for each new dataset. Nonetheless, segmentation models built on SAM remain constrained by the inherent limitations of RGB sensors, particularly under complex lighting conditions and high-speed motion. Motivated by these observations, we adapt SAM to additional visual modalities without compromising its robust generalizability. To achieve this, we introduce a lightweight SAM-Event-Adapter (SE-Adapter) module, which incorporates event camera data into a SAM-based cross-modal learning architecture with only a small increment in tunable parameters. Capitalizing on the high dynamic range and temporal resolution afforded by event cameras, our proposed multi-modal Event-RGB learning architecture effectively augments semantic segmentation performance. In addition, we propose a novel paradigm for representing event data in a patch format compatible with transformer-based models, employing multi-spatiotemporal-scale encoding to efficiently extract motion and semantic correlations from event representations. Extensive empirical evaluations on the DSEC-Semantic and DDD17 datasets validate the effectiveness and soundness of our proposed approach. © 2024 IEEE.
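The abstract mentions representing asynchronous event-camera data in a dense, patch-compatible format, but the record does not specify the encoding. As a rough illustration of the general idea only (not the paper's actual method), a common voxel-grid encoding bins events over time into an image-like tensor that can then be split into transformer patches; all names and parameters below are illustrative assumptions:

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events into a spatiotemporal voxel grid.

    events: (N, 4) array with columns (timestamp, x, y, polarity in {-1, +1}).
    Returns a (num_bins, height, width) float32 tensor that can be patchified
    like an ordinary multi-channel image.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    # Normalize timestamps into bin indices 0 .. num_bins - 1.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    b = t_norm.astype(int)
    # Unbuffered in-place accumulation: repeated (b, y, x) indices all count.
    np.add.at(grid, (b, y, x), p)
    return grid
```

Encoding at several `num_bins` settings (coarse to fine) would give the multiple temporal scales the abstract alludes to; the paper's actual multi-spatiotemporal-scale encoding may differ substantially.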

Keywords:

Semantic segmentation; Image coding

Author affiliations:

  • [ 1 ] [Yao, Bowen]Beijing University of Technology, College of Computer Science, Beijing, China
  • [ 2 ] [Deng, Yongjian]Beijing University of Technology, College of Computer Science, Beijing, China
  • [ 3 ] [Liu, Yuhan]Beijing University of Technology, College of Computer Science, Beijing, China
  • [ 4 ] [Chen, Hao]Key Lab of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China
  • [ 5 ] [Li, Youfu]City University of Hong Kong, Department of Mechanical Engineering, Kowloon, Hong Kong
  • [ 6 ] [Yang, Zhen]Beijing University of Technology, College of Computer Science, Beijing, China


Source:

ISSN: 1050-4729

Year: 2024

Pages: 9093-9100

Language: English

Scopus citations: 3

