Author:

Guo, Kan | Tian, Daxin | Hu, Yongli | Lin, Chunmian | Sun, Yanfeng | Zhou, Jianshan | Duan, Xuting | Gao, Junbin | Yin, Baocai

Indexed by:

EI Scopus

Abstract:

Traffic video question answering (TrafficVQA) constitutes a specialized VideoQA task designed to enhance the basic comprehension and intricate reasoning capacities of videos, specifically focusing on traffic events. Recent VideoQA models employ pretrained visual and textual encoder models to bridge the feature space gap between visual and textual data. However, in addressing the unique challenges inherent to the TrafficVQA task, three pivotal issues must be addressed: (i) Dimension Gap: Between the pretrained image (appearance feature) and video (motion feature) models, there exists a conspicuous dimension difference in static and dynamic visual data; (ii) Scene Gap: The common real-world datasets and the traffic event datasets differ in visual scene content; (iii) Modality Gap: A pronounced feature distribution discrepancy emerges between traffic video and text data. To alleviate these challenges, we introduce the coarse-fine multimodal contrastive alignment network (CFMMC-Align). This model leverages sequence-level and token-level multimodal features, grounded in an unsupervised visual multimodal contrastive loss to mitigate dimension and scene gaps and a supervised visual-textual contrastive loss to alleviate modality discrepancies. Finally, the model is validated on the challenging public TrafficVQA dataset SUTD-TrafficQA and outperforms the state-of-the-art method by a substantial margin (50.2% compared to 46.0%). The code is available at https://github.com/guokan987/CFMMC-Align. © 2024 IEEE.
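As a rough illustration of the kind of contrastive objective the abstract refers to, the following is a minimal sketch of a generic symmetric InfoNCE loss between paired video and text embeddings. It is not the authors' implementation (that is in the repository linked above); the function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    Row i of `anchors` is paired with row i of `positives`; every other
    row of `positives` serves as a negative for anchor i.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_info_nce(video_emb, text_emb, temperature=0.07):
    """Average of the video-to-text and text-to-video InfoNCE losses."""
    return 0.5 * (info_nce(video_emb, text_emb, temperature)
                  + info_nce(text_emb, video_emb, temperature))


# Hypothetical toy embeddings: two videos and their two matching texts.
video = [[1.0, 0.0], [0.0, 1.0]]
text_aligned = [[1.0, 0.0], [0.0, 1.0]]
text_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(video, text_aligned)
loss_shuffled = symmetric_info_nce(video, text_shuffled)
```

In this sketch the loss is small when each video embedding is closest to its own text embedding and large when the pairing is shuffled, which is the behavior a visual-textual alignment objective relies on.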

Keyword:

Job analysis | Semantics

Author Community:

  • [ 1 ] [Guo, Kan] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
  • [ 2 ] [Tian, Daxin] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
  • [ 3 ] [Hu, Yongli] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing; 100124, China
  • [ 4 ] [Lin, Chunmian] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
  • [ 5 ] [Sun, Yanfeng] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing; 100124, China
  • [ 6 ] [Zhou, Jianshan] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
  • [ 7 ] [Duan, Xuting] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
  • [ 8 ] [Gao, Junbin] University of Sydney Business School, University of Sydney, Discipline of Business Analytics, Sydney; NSW; 2006, Australia
  • [ 9 ] [Yin, Baocai] Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing; 100124, China

Source:

IEEE Transactions on Circuits and Systems for Video Technology

ISSN: 1051-8215

Year: 2024

Issue: 11

Volume: 34

Page: 10538-10550

JCR@2022 impact factor: 8.400

SCOPUS Cited Count: 6

ESI Highly Cited Papers on the List: 0
