Indexed in:
Abstract:
In the context of long document classification (LDC), effectively utilizing multi-modal information encompassing texts and images within these documents has not received adequate attention. This task showcases several notable characteristics. Firstly, the text possesses an implicit or explicit hierarchical structure consisting of sections, sentences, and words. Secondly, the distribution of images is dispersed, encompassing various types such as highly relevant topic images and loosely related reference images. Lastly, intricate and diverse relationships exist between images and text at different levels. To address these challenges, we propose a novel approach called Hierarchical Multi-modal Prompting Transformer (HMPT). Our proposed method constructs the uni-modal and multi-modal transformers at both the section and sentence levels, facilitating effective interaction between features. Notably, we design an adaptive multi-scale multi-modal transformer tailored to capture the multi-granularity correlations between sentences and images. Additionally, we introduce three different types of shared prompts, i.e., shared section, sentence, and image prompts, as bridges connecting the isolated transformers, enabling seamless information interaction across different levels and modalities. To validate the model performance, we conducted experiments on two newly created and two publicly available multi-modal long document datasets. The obtained results show that our method outperforms state-of-the-art single-modality and multi-modality classification methods.
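Illustrative sketch (not the authors' implementation): based only on the abstract's description, the code below shows one way shared, learnable "section", "sentence", and "image" prompt tokens could be prepended to otherwise isolated transformer encoders so that information flows across levels and modalities through those tokens. All module names, dimensions, class counts, and the fusion strategy are assumptions for illustration.

```python
# Minimal sketch of the shared-prompt bridging idea described in the abstract.
# Everything here (names, sizes, fusion order) is assumed, not taken from the paper.
import torch
import torch.nn as nn


class PromptedEncoder(nn.Module):
    """Transformer encoder that prepends externally supplied prompt tokens."""

    def __init__(self, dim: int = 256, depth: int = 2, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, prompts: torch.Tensor, tokens: torch.Tensor):
        # prompts: (batch, n_prompt, dim), tokens: (batch, seq, dim)
        x = self.encoder(torch.cat([prompts, tokens], dim=1))
        # Return the updated prompts and the content tokens separately.
        return x[:, : prompts.size(1)], x[:, prompts.size(1):]


class SharedPromptSketch(nn.Module):
    """Hypothetical wiring of sentence-, image-, and section-level encoders
    bridged by shared prompt tokens."""

    def __init__(self, dim: int = 256, n_prompt: int = 4, n_classes: int = 10):
        super().__init__()
        self.sentence_prompts = nn.Parameter(torch.randn(1, n_prompt, dim))
        self.image_prompts = nn.Parameter(torch.randn(1, n_prompt, dim))
        self.section_prompts = nn.Parameter(torch.randn(1, n_prompt, dim))
        self.sentence_enc = PromptedEncoder(dim)
        self.image_enc = PromptedEncoder(dim)
        self.section_enc = PromptedEncoder(dim)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, sent_feats: torch.Tensor, img_feats: torch.Tensor):
        b = sent_feats.size(0)
        sp = self.sentence_prompts.expand(b, -1, -1)
        ip = self.image_prompts.expand(b, -1, -1)
        cp = self.section_prompts.expand(b, -1, -1)

        # Sentence- and image-level encoding, each with its shared prompts.
        sp, sent_out = self.sentence_enc(sp, sent_feats)
        ip, img_out = self.image_enc(ip, img_feats)

        # The section-level encoder attends to the updated prompts as bridging
        # tokens, so cross-level / cross-modal information flows through them.
        tokens = torch.cat([sp, ip, sent_out, img_out], dim=1)
        cp, _ = self.section_enc(cp, tokens)

        # Pool the section prompts for document-level classification.
        return self.classifier(cp.mean(dim=1))


if __name__ == "__main__":
    model = SharedPromptSketch()
    sent = torch.randn(2, 12, 256)   # 12 pre-encoded sentence features per document
    imgs = torch.randn(2, 3, 256)    # 3 pre-encoded image features per document
    print(model(sent, imgs).shape)   # torch.Size([2, 10])
```

This sketch omits the paper's adaptive multi-scale multi-modal transformer and the hierarchical section/sentence decomposition; it is intended only to make the "shared prompts as bridges" mechanism concrete.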
Keywords:
Corresponding author information:
Email address:
Source:
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
ISSN: 1051-8215
Year: 2024
Issue: 7
Volume: 34
Pages: 6376-6390
Impact factor: 8.400 (JCR@2022)
Affiliated department: