Indexed in:
Abstract:
Speech emotion recognition relies mainly on the differences in acoustic characteristics between emotions. Traditional recognition methods are based on manually extracted features, such as MFCC and LPCC, and have achieved good results. However, it remains unclear which features best reflect the emotional characteristics of human speech. As Convolutional Neural Networks (CNNs) have shown strong ability in image classification, more researchers have applied CNNs to learning spectrogram features. To date, however, studies of speech emotion have relied either on traditional hand-crafted features or entirely on the speech spectrogram; the two kinds of features have not yet been combined. In this paper, we propose a fusion neural network model that combines traditional features with spectrogram features. This multimodal CNN is trained in two stages. First, two pre-trained CNN models are fine-tuned separately on the corresponding labeled audio datasets. Second, the outputs of the two CNN models are fed into a fusion network of fully-connected layers, which is trained to obtain a joint feature representation for emotion recognition. Recognition results on an emotional speech database show that the proposed algorithm achieves a higher emotion recognition rate and better robustness.
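The fusion step described in the abstract (concatenating the outputs of the two CNN branches and passing them through a fully-connected layer) can be illustrated with a minimal pure-Python sketch. The vector sizes, weight values, and the four-class emotion set are illustrative assumptions, not details from the paper:

```python
import math
import random

def fuse_and_classify(feat_a, feat_b, weights, bias):
    """Concatenate two branch outputs (e.g. a traditional-feature CNN
    and a spectrogram CNN) and apply one fully-connected layer with
    softmax to obtain emotion-class probabilities."""
    joint = feat_a + feat_b  # list concatenation -> joint representation
    logits = [sum(w * x for w, x in zip(row, joint)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)                         # numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: 3-dim output from each branch, 4 hypothetical emotion classes.
random.seed(0)
feat_a = [0.2, -0.1, 0.5]   # assumed traditional-feature branch output
feat_b = [0.3, 0.0, -0.4]   # assumed spectrogram branch output
weights = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
bias = [0.0] * 4
probs = fuse_and_classify(feat_a, feat_b, weights, bias)
print(probs)  # four probabilities summing to 1
```

In the paper's two-stage scheme, the weights of such a fusion layer would be learned in the second stage, after both branch CNNs have been fine-tuned.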
Keywords:
Corresponding author:
E-mail address:
Source:
PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON MATERIALS SCIENCE, MACHINERY AND ENERGY ENGINEERING (MSMEE 2017)
ISSN: 2352-5401
年份: 2017
卷: 123
页码: 1071-1074
Language: English
Affiliated department: