RS-CAE-Based AR-Wiener Filtering and Harmonic Recovery for Speech Enhancement

Author：

Yang, Yan | Bao, Changchun (鲍长春)

Indexed by：

EI, Scopus, SCIE

Abstract：

Taking into account the temporal correlation of speech features, this paper proposes a novel convolutional auto encoder (CAE) structure in which the historical output of the CAE is fed recurrently into a CAE stack. We name this structure the Recurrent Stack Convolutional Auto Encoder (RS-CAE). In the training stage, the input feature maps of the RS-CAE consist of the log power spectrum (LPS) of the noisy speech and an additional feature map derived from the LPS of previously enhanced speech, so that temporal correlation is exploited as fully as possible. The training target is a concatenated vector of the auto-regressive (AR) model parameters of speech and noise. In the online stage, the LPS of the noisy speech and the LPS of the previously enhanced speech together form the input feature maps, and the RS-CAE outputs the AR model parameters of speech and noise, which are used to construct the AR-Wiener filter. Because the estimated AR model parameters are not completely accurate and some harmonics may be lost in the enhanced speech, a codebook-based harmonic recovery technique is proposed to reconstruct the harmonic structure of the enhanced speech. Test results confirm that the proposed method outperforms several existing approaches.
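The AR-Wiener filtering step described in the abstract can be illustrated with a short sketch. It assumes that each set of AR parameters is an LPC vector [1, a_1, ..., a_p] with A(z) = 1 + sum_k a_k z^-k plus an excitation gain g, that the speech and noise power spectra are modeled as g / |A(e^{jw})|^2, and that the filter is the standard Wiener gain P_s / (P_s + P_n). The function names (`ar_psd`, `ar_wiener_gain`, `enhance_frame`), the FFT size, and this parameter layout are hypothetical; the RS-CAE estimator, the paper's exact parameterization, and the codebook-based harmonic recovery are not reproduced here.

```python
import numpy as np

# Hypothetical helpers: a minimal sketch of an AR-model-based Wiener filter,
# not the paper's implementation.

def ar_psd(lpc, gain, n_fft=512):
    """Return the AR-model PSD gain / |A(e^{jw})|^2 on n_fft // 2 + 1 bins.

    lpc is assumed to be [1, a_1, ..., a_p] with A(z) = 1 + sum_k a_k z^-k.
    """
    # Zero-padded FFT samples A(e^{jw}) at w = 2*pi*k / n_fft, k = 0..n_fft/2.
    a_spec = np.fft.rfft(np.asarray(lpc, dtype=float), n=n_fft)
    return gain / np.abs(a_spec) ** 2


def ar_wiener_gain(lpc_s, g_s, lpc_n, g_n, n_fft=512, eps=1e-12):
    """Wiener gain P_s / (P_s + P_n) built from speech and noise AR parameters."""
    p_s = ar_psd(lpc_s, g_s, n_fft)
    p_n = ar_psd(lpc_n, g_n, n_fft)
    return p_s / (p_s + p_n + eps)  # eps guards against a zero denominator


def enhance_frame(frame, lpc_s, g_s, lpc_n, g_n, n_fft=512):
    """Apply the AR-Wiener gain to one noisy frame (no windowing or overlap-add)."""
    spec = np.fft.rfft(frame, n=n_fft)
    h = ar_wiener_gain(lpc_s, g_s, lpc_n, g_n, n_fft)
    # Inverse FFT, then drop the zero-padding added by rfft.
    return np.fft.irfft(h * spec, n=n_fft)[: len(frame)]


if __name__ == "__main__":
    # Toy parameters: a low-pass AR(1) "speech" model and white noise, equal gains.
    h = ar_wiener_gain([1.0, -0.9], 1.0, [1.0], 1.0)
    print(round(h[0], 4), round(h[-1], 4))  # gain at DC vs. at Nyquist
```

With these toy parameters the gain reduces to 1 / (1 + |A(e^{jw})|^2), so the script prints `0.9901 0.2169`: the filter passes the low-frequency region where the AR(1) speech model dominates and attenuates high frequencies where the white noise dominates. In a real system the per-frame AR parameters would come from an estimator such as the RS-CAE, and frames would be windowed and overlap-added.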

Keyword：

Speech enhancement; codebook-based harmonic recovery; RS-CAE; temporal correlation

Author Affiliations：

[ 1 ] [Yang, Yan] Beijing University of Technology, Faculty of Information Technology, Speech and Audio Signal Processing Laboratory, Beijing 100124, People's Republic of China
[ 2 ] [Bao, Changchun] Beijing University of Technology, Faculty of Information Technology, Speech and Audio Signal Processing Laboratory, Beijing 100124, People's Republic of China

Reprint Author's Address：

[Bao, Changchun (鲍长春)] Beijing University of Technology, Faculty of Information Technology, Speech and Audio Signal Processing Laboratory, Beijing 100124, People's Republic of China

Email：

yangyan00800@emails.bjut.edu.cn | baochch@bjut.edu.cn

Related Papers：

A speech enhancement algorithm based on β-order GARCH model
2013, 2013 IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2013
A loss with mixed penalty for speech enhancement generative adversarial network
2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
A new cost function for DNN-based speech enhancement combining NMF and CASA
2018, 14th IEEE International Conference on Signal Processing, ICSP 2018
Codebook-driven speech enhancement using DNN and harmonic emphasis
2017, 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017

Source：

IEEE/ACM Transactions on Audio, Speech, and Language Processing

ISSN： 2329-9290

Year： 2019

Issue： 11

Volume： 27

Page： 1752-1762

Impact Factor (JCR@2022)： 5.400

ESI Discipline： ENGINEERING;

ESI HC Threshold：136

JCR Journal Grade：1

Cited Count：

WoS CC Cited Count： 6

SCOPUS Cited Count： 11

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

Faculty of Information Technology (信息学部)
