CFMMC-Align: Coarse-Fine Multi-Modal Contrastive Alignment Network for Traffic Event Video Question Answering

Indexed by： EI, Scopus

Abstract：

Traffic video question answering (TrafficVQA) is a specialized VideoQA task designed to enhance models' basic comprehension of, and complex reasoning about, videos of traffic events. Recent VideoQA models employ pretrained visual and textual encoders to bridge the feature-space gap between visual and textual data. However, the TrafficVQA task poses three unique challenges that must be addressed: (i) Dimension Gap: pretrained image models (appearance features) and video models (motion features) show a conspicuous dimensional difference between static and dynamic visual data; (ii) Scene Gap: common real-world datasets and traffic event datasets differ in visual scene content; (iii) Modality Gap: a pronounced discrepancy in feature distribution exists between traffic video and text data. To alleviate these challenges, we introduce the coarse-fine multimodal contrastive alignment network (CFMMC-Align). This model leverages sequence-level and token-level multimodal features, using an unsupervised visual multimodal contrastive loss to mitigate the dimension and scene gaps and a supervised visual-textual contrastive loss to alleviate modality discrepancies. Finally, the model is validated on the challenging public TrafficVQA dataset SUTD-TrafficQA, where it outperforms the state-of-the-art method by a substantial margin (50.2% compared to 46.0%). The code is available at https://github.com/guokan987/CFMMC-Align. © 2024 IEEE.
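
The abstract names two contrastive objectives but does not give their formulas. As a minimal sketch, assuming the common InfoNCE formulation with cosine similarity and a temperature `tau` (an assumption, not taken from the paper), the two terms might look like the following. All function names are hypothetical, the sketch works only on pooled sequence-level embeddings (a token-level variant is not shown), and the authors' actual implementation is the one in the linked repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax of the target logit, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             tau: float = 0.07) -> float:
    """Hypothetical unsupervised term (e.g. appearance vs. motion features).

    Symmetric InfoNCE: the i-th anchor should match the i-th positive,
    and every other item in the batch serves as a negative. No labels needed.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    sim = [[dot(a[i], p[j]) / tau for j in range(n)] for i in range(n)]
    a_to_p = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    p_to_a = sum(cross_entropy([sim[i][j] for i in range(n)], j)
                 for j in range(n)) / n
    return 0.5 * (a_to_p + p_to_a)


def supervised_vt_loss(video: Vector, answers: Sequence[Vector], label: int,
                       tau: float = 0.07) -> float:
    """Hypothetical supervised video-text term.

    Pulls the video embedding toward the ground-truth answer embedding
    (index `label`) and pushes it away from the other candidate answers.
    """
    v = l2_normalize(video)
    logits = [dot(v, l2_normalize(t)) / tau for t in answers]
    return cross_entropy(logits, label)


if __name__ == "__main__":
    appearance = [[1.0, 0.0], [0.0, 1.0]]
    motion = [[0.9, 0.1], [0.1, 0.9]]
    print("aligned pairs:", info_nce(appearance, motion))
    print("mismatched pairs:", info_nce(appearance, motion[::-1]))
```

In this sketch the unsupervised term needs only paired views of the same clip, which is why it could plausibly address the dimension and scene gaps without answer labels, while the supervised term relies on the ground-truth answer index. The temperature value 0.07 is a conventional default, not a setting reported in the paper.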

Keyword：

Job analysis; Semantics

Author Community：

[ 1 ] [Guo, Kan]Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
[ 2 ] [Tian, Daxin]Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
[ 3 ] [Hu, Yongli]Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing; 100124, China
[ 4 ] [Lin, Chunmian]Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
[ 5 ] [Sun, Yanfeng]Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing; 100124, China
[ 6 ] [Zhou, Jianshan]Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
[ 7 ] [Duan, Xuting]Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems and Safety Control, School of Transportation Science and Engineering, Beijing; 100191, China
[ 8 ] [Gao, Junbin]University of Sydney Business School, University of Sydney, Discipline of Business Analytics, Sydney; NSW; 2006, Australia
[ 9 ] [Yin, Baocai]Beijing University of Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing; 100124, China

Related Keywords：

Location-sensitive resource optimization for profit maximization in distributed data centers
2019，2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
Experimental Research on Submerged Fuel Jet in Common Rail Injector
2019，Journal of Engineering Thermophysics
In cloud, do MTC or HTC service providers benefit from the economies of scale?
2009，2nd ACM Workshop on Many-Task Computing on Grids and Supercomputers 2009, MTAGS '09
Identifying factors for employee retention using computational techniques: an approach to assist the decision-making process
2020，SN Applied Sciences

Source ：

IEEE Transactions on Circuits and Systems for Video Technology

ISSN： 1051-8215

Year： 2024

Issue： 11

Volume： 34

Page： 10538-10550

Impact Factor (JCR@2022)： 8.400

SCOPUS Cited Count： 6

ESI Highly Cited Papers on the List： 0
