Authors:

Xu, Kai | Wang, Lichun (Scholar: Wang Lichun) | Li, Shuang | Xin, Jianjia | Yin, Baocai

Indexed in:

EI, Scopus, SCIE

Abstract:

Compared with traditional knowledge distillation, self-distillation does not require a pre-trained teacher network and is therefore more concise. Among self-distillation approaches, data augmentation-based methods provide an elegant solution that requires neither modification of the network structure nor additional memory consumption. However, when data augmentation is applied in the input space, the forward propagation of augmented data incurs additional computation cost, and the augmentation method must be adapted to the modality of the input data. Meanwhile, we note that, from a generalization perspective, a dispersed intra-class feature distribution is superior to a compact one as long as the classes remain distinguishable from each other, especially for categories with larger sample differences. Based on the above considerations, this paper proposes a feature augmentation-based self-distillation method (FASD) built on the idea of feature extrapolation. For each source feature, two augmentations are generated by feature subtraction: one subtracts the temporary class center computed from samples of the same category, and the other subtracts the closest sample feature belonging to a different category. The predicted outputs of the augmented features are then constrained to be consistent with those of the source feature. The consistency constraint on the former augmented feature expands the learned class feature distribution, producing greater overlap with the unknown feature distribution of test samples and thereby improving the generalization performance of the network. The consistency constraint on the latter augmented feature increases the distance between samples of different categories, which enhances inter-class distinguishability. Experimental results on image classification tasks demonstrate the effectiveness and efficiency of the proposed method, and experiments on text and audio tasks confirm its universality for classification tasks of different modalities.
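The feature augmentation and consistency constraint described in the abstract can be sketched in code. Below is a minimal, hypothetical PyTorch illustration assuming access to the penultimate-layer features and the final linear classifier of the network; the function name fasd_consistency_loss, the batch-wise class centers, the KL-based consistency term, and the temperature value are assumptions made for illustration, not the authors' released implementation.

import torch
import torch.nn.functional as F

def fasd_consistency_loss(features, logits, labels, classifier, temperature=4.0):
    # features:   (B, D) penultimate-layer features of the current batch
    # logits:     (B, C) predictions for the source (un-augmented) features
    # labels:     (B,)   ground-truth class indices
    # classifier: the final linear layer of the same network
    with torch.no_grad():
        teacher = F.softmax(logits / temperature, dim=1)  # source outputs act as the "teacher"

    # Augmentation 1: subtract the temporary (batch-wise) class center of the same class.
    aug1 = features.clone()
    for c in labels.unique():
        mask = labels == c
        center = features[mask].mean(dim=0, keepdim=True)
        aug1[mask] = features[mask] - center

    # Augmentation 2: subtract the closest sample feature belonging to a different class.
    # (Assumes the batch contains more than one class.)
    dist = torch.cdist(features, features)            # pairwise distances (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    dist = dist.masked_fill(same, float('inf'))       # ignore same-class pairs
    nearest_other = dist.argmin(dim=1)
    aug2 = features - features[nearest_other]

    # Constrain predictions on the augmented features to stay consistent with the source.
    loss = features.new_zeros(())
    for aug in (aug1, aug2):
        student = F.log_softmax(classifier(aug) / temperature, dim=1)
        loss = loss + F.kl_div(student, teacher, reduction='batchmean') * temperature ** 2
    return loss

In training, such a consistency term would typically be added to the standard cross-entropy loss on the source logits; because the augmentation operates on features rather than inputs, no extra forward pass through the backbone is required.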

Keywords:

Knowledge distillation; classification task; Training; Predictive models; generalization performance; feature augmentation; Knowledge engineering; Feature extraction; self-distillation; Extrapolation; Data augmentation; Task analysis

Author Affiliations:

  • [ 1 ] [Xu, Kai]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 2 ] [Wang, Lichun]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 3 ] [Xin, Jianjia]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 4 ] [Yin, Baocai]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 5 ] [Li, Shuang]Beijing Informat Sci & Technol Univ, Sch Automat, Beijing 100192, Peoples R China

Corresponding Author:

  • [Wang, Lichun]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

ISSN: 1051-8215

Year: 2024

Issue: 10

Volume: 34

Pages: 9578-9590

Impact Factor: 8.400 (JCR@2022)

Citations:

WoS Core Collection citations:

Scopus citations: 1

ESI highly cited papers listed: 0

Wanfang citations:

Chinese citations:

Views in the last 30 days: 1

Affiliated department:
