Learning to compose diversified prompts for image emotion classification

Indexed by:

EI Scopus SCIE

Abstract:

Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous gap between the training objectives of pretraining and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are automatically composed by conditioning them on the categories and image content of instances, which diversifies the prompts and thus avoids the suboptimal-prompt problem. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods on IEC tasks (with an accuracy gain of up to 9.29% on the EmotionROI dataset) while training only a few parameters. The code is publicly available at https://github.com/dsn0w/PT-DPC/ for research purposes.
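The abstract describes the mechanism only in outline. As a rough illustration, the sketch below shows generic CLIP-style prompt-based classification with instance-conditioned prompts: learnable context tokens are shifted by a function of the image features, combined with a class token, encoded, compared with the image by cosine similarity, and scored with a softmax cross-entropy (the image-to-text half of a CLIP-style contrastive loss). This is not the authors' PT-DPC code; the repository linked above is the reference for that. Every name here (`toy_text_encoder`, `meta_shift`, `compose_prompt`, `class_probabilities`, `contrastive_loss`), the mean-pooling "text encoder", the single linear meta-network, and the temperature value are hypothetical stand-ins chosen so the example runs with the standard library only.

```python
import math

# HYPOTHETICAL toy components: real CLIP uses Transformer text and image
# encoders; these stand-ins only illustrate the data flow.

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def toy_text_encoder(token_vectors):
    # Hypothetical: mean-pool the prompt token vectors instead of a Transformer.
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / len(token_vectors) for i in range(dim)]

def meta_shift(image_feat, weight):
    # Hypothetical meta-network: one linear map from image features to a shift
    # that is added to every learnable context token (instance conditioning).
    return [sum(w * x for w, x in zip(row, image_feat)) for row in weight]

def compose_prompt(context, class_emb, shift):
    # Prompt = [ctx_1 + shift, ..., ctx_M + shift, CLASS]
    conditioned = [[c + s for c, s in zip(ctx, shift)] for ctx in context]
    return conditioned + [class_emb]

def class_probabilities(image_feat, context, class_embs, weight, temperature=0.01):
    # Score each class by cosine similarity between the image feature and the
    # encoded class prompt, scaled by a (hypothetical) temperature.
    shift = meta_shift(image_feat, weight)
    logits = []
    for class_emb in class_embs:
        text_feat = toy_text_encoder(compose_prompt(context, class_emb, shift))
        logits.append(cosine(image_feat, text_feat) / temperature)
    # Numerically stabilised softmax over classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_loss(probs, label):
    # Cross-entropy over image-to-text similarities for the true class.
    return -math.log(probs[label])

if __name__ == "__main__":
    image_feat = [0.9, 0.1, 0.0]
    context = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]      # M = 2 learnable context tokens
    class_embs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # two hypothetical emotion classes
    zero_weight = [[0.0, 0.0, 0.0] for _ in range(3)]  # shared prompt, no conditioning
    probs = class_probabilities(image_feat, context, class_embs, zero_weight, temperature=1.0)
    print(probs, contrastive_loss(probs, 0))
```

With an all-zero `weight` the shift vanishes, so every image sees the same shared prompt; a nonzero `weight` yields a different prompt for each image, which mirrors the instance-specific behaviour the abstract describes. In a real setup only the context tokens and meta-network would be trained while the pretrained encoders stay frozen, which is one plausible reading of "only a few trained parameters".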

Keywords:

multimodal learning; pretraining model; prompt tuning; image emotion analysis

Author Community:

[ 1 ] [Deng, Sinuo] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 2 ] [Wu, Lifang] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 3 ] [Shi, Ge] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 4 ] [Xing, Lehao] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 5 ] [Jian, Meng] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 6 ] [Xiang, Ye] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 7 ] [Dong, Ruihai] Univ Coll Dublin, Insight Ctr Data Analyt, Dublin D04 V1W8, Ireland

Reprint Author's Address:

[Shi, Ge] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Email:

shige@bjut.edu.cn


Related Keywords:

Expand Prompt Verbalizer by Extracting Knowledge for Chinese Text Classification
2023, 2023 International Conference on Computer, Artificial Intelligence, and Control Engineering, CAICE 2023
MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion
2024, ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024
SimEmotion: A Simple Knowledgeable Prompt Tuning Method for Image Emotion Classification
2022, DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III
CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation
2024, ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PAKDD 2024

Source:

COMPUTATIONAL VISUAL MEDIA

ISSN: 2096-0433

Year: 2024

Issue: 6

Volume: 10

Page: 1169-1183

Cited Count:

WoS CC Cited Count: 11

SCOPUS Cited Count: 24

ESI Highly Cited Papers on the List: 0
