Indexed by:
Abstract:
In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is often used for noise reduction. However, it is almost impossible to obtain the IBM result. The error in IBM estimation may greatly violate smooth evolution nature of speech because of the energy absence in many speech-dominated time-frequency (T-F) units. To reduce the error, the ideal ratio mask (IRM) via modeling the spatial dependencies of speech spectrum is used as an optimal target mask because the predictive ratio mask is less sensitive to the error than the predictive binary mask. In this paper, we introduce a data field (DF) to model the spatial dependencies of the cochleagram for obtaining the ratio mask. Firstly, initial T-F units of noise and speech are obtained from noisy speech. Then we can calculate the forms of the potentials of noise and speech. Subsequently, their optimal potentials which reflect their respective distribution of potential field are obtained by the optimal influence factors of speech and noise. Finally, we exploit the potentials of speech and noise to obtain the ratio mask. Experimental results show that the proposed method can obtain a better performance than the reference methods in speech quality.
Keyword:
Reprint Author's Address:
Source :
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
ISSN: 2308-457X
Year: 2017
Page: 1904-1908
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0 Unfold All
WanFang Cited Count:
Chinese Cited Count:
30 Days PV: 0
Affiliated Colleges: