
Author:

Deng, Sinuo | Wu, Lifang | Shi, Ge | Xing, Lehao | Jian, Meng | Xiang, Ye | Dong, Ruihai

Indexed by:

EI; Scopus; SCIE

Abstract:

Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous training objective gap between pretraining and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced, to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are automatically composed, conditioning them on the categories and image content of instances, diversifying the prompts, and thus avoiding suboptimal problems. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods (up to 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks with only a few trained parameters. The code is publicly available at https://github.com/dsn0w/PT-DPC/ for research purposes.

Keyword:

multimodal learning; pretraining model; prompt tuning; image emotion analysis

Author Community:

  • [ 1 ] [Deng, Sinuo]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 2 ] [Wu, Lifang]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 3 ] [Shi, Ge]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 4 ] [Xing, Lehao]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 5 ] [Jian, Meng]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 6 ] [Xiang, Ye]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 7 ] [Dong, Ruihai]Univ Coll Dublin, Insight Ctr Data Analyt, Dublin D04 V1W8, Ireland

Reprint Author's Address:

  • [Shi, Ge]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Source:

COMPUTATIONAL VISUAL MEDIA

ISSN: 2096-0433

Year: 2024

Issue: 6

Volume: 10

Page: 1169-1183

Cited Count:

WoS CC Cited Count: 11

SCOPUS Cited Count: 14

ESI Highly Cited Papers on the List: 0

