Abstract:
Pretrained Vision-Language Models (VLMs) such as CLIP exhibit remarkable capabilities across downstream tasks, yet their image encoders remain vulnerable to adversarial examples. A recently introduced lightweight approach, Adversarial Prompt Tuning (AdvPT), trains learnable prompts on adversarial examples, enhancing the adversarial robustness of VLMs solely through manipulation of textual inputs. However, the static prompts learned by AdvPT overfit the base classes observed during training, compromising the model's generalizability. In this paper, we propose a conditional Adversarial Prompt Tuning method that extends AdvPT by additionally learning a network that generates an input-specific prompt for each image. These dynamic prompts improve the generalizability of VLMs to unseen classes. Furthermore, since VLMs are inherently strong generalizers, we incorporate the manual prompts used by VLMs at test time to further improve generalizability. Extensive experiments on eight datasets demonstrate that our prompt-fusion-based method significantly outperforms AdvPT on unseen classes, enhancing the generalizability and adversarial robustness of VLMs simultaneously.
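To illustrate the core idea described in the abstract, below is a minimal sketch (not the authors' code) of conditional prompt learning: a lightweight meta-network maps each image feature to a bias that is added to shared learnable context tokens, producing an input-specific prompt. All module and variable names here are hypothetical, and the per-input conditioning follows the general CoCoOp-style design the abstract alludes to; during adversarial prompt tuning, the image features would come from adversarial examples.

import torch
import torch.nn as nn

class ConditionalPromptLearner(nn.Module):
    def __init__(self, n_ctx: int = 4, ctx_dim: int = 512, feat_dim: int = 512):
        super().__init__()
        # Shared learnable context vectors (the static part of the prompt).
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Lightweight meta-network: image feature -> per-input prompt shift.
        self.meta_net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 16, ctx_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, feat_dim), e.g. from CLIP's image encoder,
        # computed on adversarial examples during training.
        bias = self.meta_net(image_features)             # (batch, ctx_dim)
        ctx = self.ctx.unsqueeze(0) + bias.unsqueeze(1)  # (batch, n_ctx, ctx_dim)
        # The returned context tokens would be prepended to the class-name
        # token embeddings before the text encoder, yielding a dynamic prompt.
        return ctx

if __name__ == "__main__":
    learner = ConditionalPromptLearner()
    adv_feats = torch.randn(8, 512)  # stand-in for adversarial image features
    prompts = learner(adv_feats)
    print(prompts.shape)  # torch.Size([8, 4, 512])

Because the prompt depends on each input rather than being fixed after training, it can adapt to images of unseen classes; the prompt fusion mentioned in the abstract would additionally combine these learned prompts with hand-crafted manual prompts (e.g., "a photo of a {class}") at test time.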
Source:
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IX, ICIC 2024
ISSN: 0302-9743
Year: 2024
Volume: 14870
Pages: 328-339