A Mask Estimation Method Integrating Data Field Model for Speech Enhancement

Author：

Wang, Xianyun | Bao, Changchun (Scholars：鲍长春) | Bao, Feng

Indexed by：

CPCI-S Scopus

Abstract：

In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is used for noise reduction. In practice, however, it is almost impossible to obtain the IBM exactly. Errors in IBM estimation may greatly violate the smooth evolution of speech, because energy is missing in many speech-dominated time-frequency (T-F) units. To reduce this error, the ideal ratio mask (IRM), obtained by modeling the spatial dependencies of the speech spectrum, is used as the optimal target mask, because a predicted ratio mask is less sensitive to estimation error than a predicted binary mask. In this paper, we introduce a data field (DF) to model the spatial dependencies of the cochleagram and thereby obtain the ratio mask. First, initial T-F units of noise and speech are obtained from the noisy speech. Then the forms of the potentials of noise and speech are calculated. Subsequently, their optimal potentials, which reflect their respective potential-field distributions, are obtained using the optimal influence factors of speech and noise. Finally, the potentials of speech and noise are used to obtain the ratio mask. Experimental results show that the proposed method achieves better speech quality than the reference methods.
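
The steps in the abstract can be illustrated with a minimal sketch. This record does not give the paper's potential function, its procedure for choosing the optimal influence factors, or its final mask formula, so everything below is an assumption: the potential uses the Gaussian-kernel form common in data field models (each T-F unit's energy acts as a mass, and `sigma` plays the role of the influence factor), the influence factors are fixed rather than optimized, and the mask is taken as the speech potential divided by the sum of both potentials. All function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def ideal_binary_mask(speech, noise, lc_db=0.0):
    """1.0 where the local SNR exceeds lc_db (local criterion), else 0.0."""
    return [
        [1.0 if 10.0 * math.log10((s + EPS) / (n + EPS)) > lc_db else 0.0
         for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def ideal_ratio_mask(speech, noise):
    """One common IRM variant: speech energy over total energy per T-F unit."""
    return [
        [s / (s + n + EPS) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def potential(energy, sigma):
    """Assumed Gaussian-kernel potential: every T-F unit contributes its
    energy, attenuated by its squared distance in the (channel, frame) grid."""
    rows, cols = len(energy), len(energy[0])
    phi = [[0.0] * cols for _ in range(rows)]
    for c in range(rows):
        for t in range(cols):
            phi[c][t] = sum(
                energy[c2][t2] * math.exp(-((c - c2) ** 2 + (t - t2) ** 2) / sigma ** 2)
                for c2 in range(rows)
                for t2 in range(cols)
            )
    return phi


def ratio_mask_from_potentials(speech_est, noise_est, sigma_s=1.0, sigma_n=1.0):
    """Assumed mask: speech potential over the sum of both potentials.
    sigma_s and sigma_n are fixed here; the paper selects optimal values."""
    phi_s = potential(speech_est, sigma_s)
    phi_n = potential(noise_est, sigma_n)
    return [
        [ps / (ps + pn + EPS) for ps, pn in zip(s_row, n_row)]
        for s_row, n_row in zip(phi_s, phi_n)
    ]


# Toy 2x2 "cochleagram" energies (channels x frames), purely illustrative.
speech = [[4.0, 0.0], [1.0, 1.0]]
noise = [[1.0, 1.0], [1.0, 3.0]]
ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
df_mask = ratio_mask_from_potentials(speech, noise)
```

In this toy input, the unit with zero speech energy gets a mask of zero from both the binary and the per-unit ratio computation, while the potential-based mask assigns it a nonzero value because neighboring speech units contribute to its potential. That is the kind of spatial smoothing the abstract attributes to modeling dependencies across the cochleagram.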

Keyword：

Speech enhancement | Ratio mask | CASA | Data field

Author Affiliations：

[1] [Wang, Xianyun] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
[2] [Bao, Changchun] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
[3] [Bao, Feng] Univ Auckland, Dept Elect & Comp Engn, Auckland 1142, New Zealand

Reprint Author's Address：

[Wang, Xianyun] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China

Email：

b201402001@emails.bjut.edu.cn | baochch@bjut.edu.cn | fbao026@aucklanduni.ac.nz

Related Articles：

DNN-Based Speech Enhancement via Integrating NMF and CASA (2018, 6th International Conference on Audio, Language and Image Processing, ICALIP 2018)
A New Cost Function for DNN-Based Speech Enhancement Combining NMF and CASA (2018, 14th IEEE International Conference on Signal Processing, ICSP 2018)
Beamforming-Based Speech Enhancement Based on Optimal Ratio Mask (2019, IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC)
IRM with Phase Parameterization for Speech Enhancement (2019, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA)

Source ：

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION

ISSN： 2308-457X

Year： 2017

Page： 1904-1908

Language： English

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count： 2

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

Affiliated Colleges：

Faculty of Information Technology
